mozilla / translations

The code, training pipeline, and models that power Firefox Translations
https://mozilla.github.io/translations/
Mozilla Public License 2.0

Snakemake pipeline is not in the right order #368

Open AmitMY opened 9 months ago

AmitMY commented 9 months ago

I have no idea how to fix this; any help, or at least guidance, is appreciated.

Here is my current log for a new job. It seems to run "Collecting translated mono src dataset" before training a model or running inference with it, which leads to 0 translations.

Running with config configs/config.spoken-to-signed.yml and profile local
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 40
Rules claiming more threads will be scaled down.
Provided resources: gpu=4
Singularity containers: ignored
Job stats:
job                    count    min threads    max threads
-------------------  -------  -------------  -------------
alignments                 1             40             40
all                        1              1              1
ce_filter                  1             40             40
clean_corpus               3             40             40
clean_mono                 2             40             40
collect_corpus             1              4              4
collect_mono_src           1              4              4
collect_mono_trg           1              4              4
copy_backtranslated        1              4              4
download_corpus            5              1              1
download_mono              2              1              1
eval_quantized             1              1              1
evaluate                   6              8              8
experiment                 1              1              1
export                     1              1              1
finetune_student           1              8              8
merge_corpus               1             40             40
merge_devset               1             40             40
merge_mono                 2             40             40
merge_translated           1              4              4
quantize                   1              1              1
score                      1              8              8
split_corpus               1              1              1
split_mono_src             1              1              1
split_mono_trg             1              1              1
train_backward             1              8              8
train_student              1              8              8
train_teacher              2              8              8
train_vocab                1              2              2
total                     44              1             40

Select jobs to execute...

[Fri Jan 12 20:05:43 2024]
Job 32: Saving experiment metadata
Reason: Missing output files: /data/experiments/spoken-signed/spoken_to_signed_bpe7/config.yml

[Fri Jan 12 20:05:43 2024]
Job 10: Downloading parallel corpus
Reason: Missing output files: /data/data/spoken-signed/spoken_to_signed_bpe7/original/corpus/custom-corpus_/corpora/parallel/cleaned/train.signed.gz, /data/data/spoken-signed/spoken_to_signed_bpe7/original/corpus/custom-corpus_/corpora/parallel/cleaned/train.spoken.gz

Activating conda environment: /firefox-translations-training/.snakemake/conda/0ce56c108d92fee96fb735e191df86b6

[Fri Jan 12 20:05:43 2024]
Job 34: Downloading parallel corpus
Reason: Missing output files: /data/data/spoken-signed/spoken_to_signed_bpe7/original/eval/custom-corpus_/corpora/parallel/test/all.signed.gz, /data/data/spoken-signed/spoken_to_signed_bpe7/original/eval/custom-corpus_/corpora/parallel/test/all.spoken.gz

Activating conda environment: /firefox-translations-training/.snakemake/conda/0ce56c108d92fee96fb735e191df86b6

[Fri Jan 12 20:05:43 2024]
Job 26: Downloading parallel corpus
Reason: Missing output files: /data/data/spoken-signed/spoken_to_signed_bpe7/original/devset/custom-corpus_/corpora/parallel/test/all.spoken.gz, /data/data/spoken-signed/spoken_to_signed_bpe7/original/devset/custom-corpus_/corpora/parallel/test/all.signed.gz

Activating conda environment: /firefox-translations-training/.snakemake/conda/0ce56c108d92fee96fb735e191df86b6

[Fri Jan 12 20:05:43 2024]
Job 12: Downloading parallel corpus
Reason: Missing output files: /data/data/spoken-signed/spoken_to_signed_bpe7/original/corpus/custom-corpus_/corpora/parallel/cleaned/dev.spoken.gz, /data/data/spoken-signed/spoken_to_signed_bpe7/original/corpus/custom-corpus_/corpora/parallel/cleaned/dev.signed.gz

Activating conda environment: /firefox-translations-training/.snakemake/conda/0ce56c108d92fee96fb735e191df86b6

[Fri Jan 12 20:05:43 2024]
Job 41: Downloading monolingual dataset
Reason: Missing output files: /data/data/spoken-signed/spoken_to_signed_bpe7/original/mono/custom-mono_/corpora/mono/signs/mono.signed.gz

Activating conda environment: /firefox-translations-training/.snakemake/conda/3cc0ff2979b25f47047895f30c18abba

[Fri Jan 12 20:05:43 2024]
Job 18: Downloading monolingual dataset
Reason: Missing output files: /data/data/spoken-signed/spoken_to_signed_bpe7/original/mono/custom-mono_/corpora/mono/words/mono.spoken.gz

Activating conda environment: /firefox-translations-training/.snakemake/conda/3cc0ff2979b25f47047895f30c18abba

[Fri Jan 12 20:05:43 2024]
Job 14: Downloading parallel corpus
Reason: Missing output files: /data/data/spoken-signed/spoken_to_signed_bpe7/original/corpus/custom-corpus_/corpora/parallel/more/train.signed.gz, /data/data/spoken-signed/spoken_to_signed_bpe7/original/corpus/custom-corpus_/corpora/parallel/more/train.spoken.gz

Activating conda environment: /firefox-translations-training/.snakemake/conda/0ce56c108d92fee96fb735e191df86b6
[Fri Jan 12 20:05:43 2024]
Finished job 26.
1 of 44 steps (2%) done
Select jobs to execute...
[Fri Jan 12 20:05:43 2024]
Finished job 34.
2 of 44 steps (5%) done
[Fri Jan 12 20:05:43 2024]
Finished job 14.
3 of 44 steps (7%) done
[Fri Jan 12 20:05:43 2024]
Finished job 12.
4 of 44 steps (9%) done
[Fri Jan 12 20:05:43 2024]
Finished job 10.
5 of 44 steps (11%) done
[Fri Jan 12 20:05:45 2024]
Finished job 18.
6 of 44 steps (14%) done
[Fri Jan 12 20:05:46 2024]
Finished job 32.
7 of 44 steps (16%) done
[Fri Jan 12 20:05:47 2024]
Finished job 41.
8 of 44 steps (18%) done

[Fri Jan 12 20:05:47 2024]
Job 11: Cleaning dataset
Reason: Missing output files: /data/data/spoken-signed/spoken_to_signed_bpe7/clean/corpus/custom-corpus_/corpora/parallel/cleaned/dev.signed.gz, /data/data/spoken-signed/spoken_to_signed_bpe7/clean/corpus/custom-corpus_/corpora/parallel/cleaned/dev.spoken.gz; Input files updated by another job: /data/data/spoken-signed/spoken_to_signed_bpe7/original/corpus/custom-corpus_/corpora/parallel/cleaned/dev.spoken.gz, /data/data/spoken-signed/spoken_to_signed_bpe7/original/corpus/custom-corpus_/corpora/parallel/cleaned/dev.signed.gz

Activating conda environment: /firefox-translations-training/.snakemake/conda/7eca23afcff982010cb31437f76da59b
[Fri Jan 12 20:05:48 2024]
Finished job 11.
9 of 44 steps (20%) done
Select jobs to execute...

[Fri Jan 12 20:05:48 2024]
Job 13: Cleaning dataset
Reason: Missing output files: /data/data/spoken-signed/spoken_to_signed_bpe7/clean/corpus/custom-corpus_/corpora/parallel/more/train.spoken.gz, /data/data/spoken-signed/spoken_to_signed_bpe7/clean/corpus/custom-corpus_/corpora/parallel/more/train.signed.gz; Input files updated by another job: /data/data/spoken-signed/spoken_to_signed_bpe7/original/corpus/custom-corpus_/corpora/parallel/more/train.signed.gz, /data/data/spoken-signed/spoken_to_signed_bpe7/original/corpus/custom-corpus_/corpora/parallel/more/train.spoken.gz

Activating conda environment: /firefox-translations-training/.snakemake/conda/7eca23afcff982010cb31437f76da59b
[Fri Jan 12 20:05:48 2024]
Finished job 13.
10 of 44 steps (23%) done
Select jobs to execute...

[Fri Jan 12 20:05:48 2024]
Job 9: Cleaning dataset
Reason: Missing output files: /data/data/spoken-signed/spoken_to_signed_bpe7/clean/corpus/custom-corpus_/corpora/parallel/cleaned/train.spoken.gz, /data/data/spoken-signed/spoken_to_signed_bpe7/clean/corpus/custom-corpus_/corpora/parallel/cleaned/train.signed.gz; Input files updated by another job: /data/data/spoken-signed/spoken_to_signed_bpe7/original/corpus/custom-corpus_/corpora/parallel/cleaned/train.signed.gz, /data/data/spoken-signed/spoken_to_signed_bpe7/original/corpus/custom-corpus_/corpora/parallel/cleaned/train.spoken.gz

Activating conda environment: /firefox-translations-training/.snakemake/conda/7eca23afcff982010cb31437f76da59b
[Fri Jan 12 20:05:50 2024]
Finished job 9.
11 of 44 steps (25%) done
Select jobs to execute...

[Fri Jan 12 20:05:50 2024]
Job 25: Merging devsets
Reason: Missing output files: /data/data/spoken-signed/spoken_to_signed_bpe7/original/devset.spoken.gz, /data/data/spoken-signed/spoken_to_signed_bpe7/original/devset.signed.gz; Input files updated by another job: /data/data/spoken-signed/spoken_to_signed_bpe7/original/devset/custom-corpus_/corpora/parallel/test/all.spoken.gz, /data/data/spoken-signed/spoken_to_signed_bpe7/original/devset/custom-corpus_/corpora/parallel/test/all.signed.gz

Activating conda environment: /firefox-translations-training/.snakemake/conda/3cc0ff2979b25f47047895f30c18abba
[Fri Jan 12 20:05:50 2024]
Finished job 25.
12 of 44 steps (27%) done
Select jobs to execute...

[Fri Jan 12 20:05:50 2024]
Job 17: Cleaning monolingual dataset
Reason: Missing output files: /data/data/spoken-signed/spoken_to_signed_bpe7/clean/mono/custom-mono_/corpora/mono/words/mono.spoken.gz; Input files updated by another job: /data/data/spoken-signed/spoken_to_signed_bpe7/original/mono/custom-mono_/corpora/mono/words/mono.spoken.gz

Activating conda environment: /firefox-translations-training/.snakemake/conda/3cc0ff2979b25f47047895f30c18abba
[Fri Jan 12 20:05:50 2024]
Finished job 17.
13 of 44 steps (30%) done
Select jobs to execute...

[Fri Jan 12 20:05:50 2024]
Job 16: Merging clean monolingual datasets
Reason: Missing output files: /data/data/spoken-signed/spoken_to_signed_bpe7/clean/mono.spoken.gz; Input files updated by another job: /data/data/spoken-signed/spoken_to_signed_bpe7/clean/mono/custom-mono_/corpora/mono/words/mono.spoken.gz

Activating conda environment: /firefox-translations-training/.snakemake/conda/3cc0ff2979b25f47047895f30c18abba
[Fri Jan 12 20:05:51 2024]
Finished job 16.
14 of 44 steps (32%) done
Select jobs to execute...

[Fri Jan 12 20:05:51 2024]
Job 8: Merging clean parallel datasets
Reason: Missing output files: /data/data/spoken-signed/spoken_to_signed_bpe7/clean/corpus.signed.gz, /data/data/spoken-signed/spoken_to_signed_bpe7/clean/corpus.spoken.gz; Input files updated by another job: /data/data/spoken-signed/spoken_to_signed_bpe7/clean/corpus/custom-corpus_/corpora/parallel/cleaned/dev.signed.gz, /data/data/spoken-signed/spoken_to_signed_bpe7/clean/corpus/custom-corpus_/corpora/parallel/cleaned/train.signed.gz, /data/data/spoken-signed/spoken_to_signed_bpe7/clean/corpus/custom-corpus_/corpora/parallel/more/train.signed.gz, /data/data/spoken-signed/spoken_to_signed_bpe7/clean/corpus/custom-corpus_/corpora/parallel/cleaned/train.spoken.gz, /data/data/spoken-signed/spoken_to_signed_bpe7/clean/corpus/custom-corpus_/corpora/parallel/cleaned/dev.spoken.gz, /data/data/spoken-signed/spoken_to_signed_bpe7/clean/corpus/custom-corpus_/corpora/parallel/more/train.spoken.gz

Activating conda environment: /firefox-translations-training/.snakemake/conda/3cc0ff2979b25f47047895f30c18abba
[Fri Jan 12 20:05:53 2024]
Finished job 8.
15 of 44 steps (34%) done
Select jobs to execute...

[Fri Jan 12 20:05:53 2024]
Job 40: Cleaning monolingual dataset
Reason: Missing output files: /data/data/spoken-signed/spoken_to_signed_bpe7/clean/mono/custom-mono_/corpora/mono/signs/mono.signed.gz; Input files updated by another job: /data/data/spoken-signed/spoken_to_signed_bpe7/original/mono/custom-mono_/corpora/mono/signs/mono.signed.gz

Activating conda environment: /firefox-translations-training/.snakemake/conda/3cc0ff2979b25f47047895f30c18abba
[Fri Jan 12 20:05:54 2024]
Finished job 40.
16 of 44 steps (36%) done
Select jobs to execute...

[Fri Jan 12 20:05:54 2024]
Job 39: Merging clean monolingual datasets
Reason: Missing output files: /data/data/spoken-signed/spoken_to_signed_bpe7/clean/mono.signed.gz; Input files updated by another job: /data/data/spoken-signed/spoken_to_signed_bpe7/clean/mono/custom-mono_/corpora/mono/signs/mono.signed.gz

Activating conda environment: /firefox-translations-training/.snakemake/conda/3cc0ff2979b25f47047895f30c18abba
[Fri Jan 12 20:05:56 2024]
Finished job 39.
17 of 44 steps (39%) done
Select jobs to execute...

[Fri Jan 12 20:05:56 2024]
Job 22: Splitting monolingual src dataset
Reason: Missing output files: /data/data/spoken-signed/spoken_to_signed_bpe7/translated/mono_src; Input files updated by another job: /data/data/spoken-signed/spoken_to_signed_bpe7/clean/mono.spoken.gz
Downstream jobs will be updated after completion.

Activating conda environment: /firefox-translations-training/.snakemake/conda/3cc0ff2979b25f47047895f30c18abba

[Fri Jan 12 20:05:56 2024]
Job 20: Splitting the corpus to translate
Reason: Missing output files: /data/data/spoken-signed/spoken_to_signed_bpe7/translated/corpus; Input files updated by another job: /data/data/spoken-signed/spoken_to_signed_bpe7/clean/corpus.signed.gz, /data/data/spoken-signed/spoken_to_signed_bpe7/clean/corpus.spoken.gz
Downstream jobs will be updated after completion.

Activating conda environment: /firefox-translations-training/.snakemake/conda/3cc0ff2979b25f47047895f30c18abba

[Fri Jan 12 20:05:56 2024]
Job 38: Splitting monolingual trg dataset
Reason: Missing output files: /data/data/spoken-signed/spoken_to_signed_bpe7/translated/mono_trg; Input files updated by another job: /data/data/spoken-signed/spoken_to_signed_bpe7/clean/mono.signed.gz
Downstream jobs will be updated after completion.

Activating conda environment: /firefox-translations-training/.snakemake/conda/3cc0ff2979b25f47047895f30c18abba

[Fri Jan 12 20:05:56 2024]
Job 27: Training spm vocab
Reason: Missing output files: /data/models/spoken-signed/spoken_to_signed_bpe7/vocab/vocab.spm; Input files updated by another job: /data/data/spoken-signed/spoken_to_signed_bpe7/clean/corpus.signed.gz, /data/data/spoken-signed/spoken_to_signed_bpe7/clean/corpus.spoken.gz

Activating conda environment: /firefox-translations-training/.snakemake/conda/3cc0ff2979b25f47047895f30c18abba
[Fri Jan 12 20:05:57 2024]
Finished job 22.
18 of 44 steps (41%) done
Updating job collect_mono_src.
Select jobs to execute...

[Fri Jan 12 20:05:57 2024]
Job 21: Collecting translated mono src dataset
Reason: Missing output files: /data/data/spoken-signed/spoken_to_signed_bpe7/translated/mono.signed.gz

Activating conda environment: /firefox-translations-training/.snakemake/conda/3cc0ff2979b25f47047895f30c18abba
[Fri Jan 12 20:05:58 2024]
Error in rule collect_mono_src:
    jobid: 21
    output: /data/data/spoken-signed/spoken_to_signed_bpe7/translated/mono.signed.gz
    log: /data/logs/spoken-signed/spoken_to_signed_bpe7/collect_mono_src.log (check log file(s) for error message)
    conda-env: /firefox-translations-training/.snakemake/conda/3cc0ff2979b25f47047895f30c18abba
    shell:
        bash pipeline/translate/collect.sh "/data/data/spoken-signed/spoken_to_signed_bpe7/translated/mono_src" "/data/data/spoken-signed/spoken_to_signed_bpe7/translated/mono.signed.gz" "/data/data/spoken-signed/spoken_to_signed_bpe7/clean/mono.spoken.gz" >> /data/logs/spoken-signed/spoken_to_signed_bpe7/collect_mono_src.log 2>&1
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Removing output files of failed job collect_mono_src since they might be corrupted:
/data/data/spoken-signed/spoken_to_signed_bpe7/translated/mono.signed.gz
[Fri Jan 12 20:06:00 2024]
Finished job 38.
19 of 44 steps (43%) done
Updating job collect_mono_trg.
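
One way to narrow this down, as a rough sketch rather than a confirmed diagnosis: the log shows `split_mono_src` finishing and Snakemake updating `collect_mono_src` immediately afterwards, so it may help to look at what the split step actually wrote and what the collect log says. The helper name `count_split_parts` is hypothetical, and it assumes the split parts are plain files at the top level of the output directory; the two paths in the comments are copied from the log above.

```shell
#!/usr/bin/env bash
# Sketch: report how many regular files a split output directory holds.
# Assumption: split parts are plain files directly inside the directory.
count_split_parts() {
    local split_dir="$1"
    if [ ! -d "$split_dir" ]; then
        echo "missing"
        return 1
    fi
    # Count top-level regular files only; tr strips padding some wc builds add.
    find "$split_dir" -maxdepth 1 -type f | wc -l | tr -d ' '
}

# Example invocations (paths taken from the log above):
#   count_split_parts /data/data/spoken-signed/spoken_to_signed_bpe7/translated/mono_src
#   tail -n 50 /data/logs/spoken-signed/spoken_to_signed_bpe7/collect_mono_src.log
```

If the count is non-zero but no translated outputs exist next to the parts, that would be consistent with the collect step having been scheduled before any translation ran.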

AmitMY commented 9 months ago

@gregtatum I'd love it if someone could assist or direct me on this issue.

marco-c commented 9 months ago

@AmitMY unfortunately we are no longer maintaining the Snakemake pipeline. We can accept PRs to fix it, but we are focusing on the Taskcluster pipeline.

My suggestion would be to use `git bisect` to find out what caused this problem; it should then be relatively easy to apply to the Snakemake pipeline whatever change we made on the Taskcluster side.
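
A bisect like this can be automated with `git bisect run`. The following is a minimal sketch, not a tested recipe for this repository: the function name `bisect_regression` is hypothetical, and the caller has to supply a known-good ref, a known-bad ref, and a check command that exits 0 on a good commit and non-zero on a bad one.

```shell
#!/usr/bin/env bash
# Sketch: find the first bad commit between a good and a bad ref.
# Usage: bisect_regression <good-ref> <bad-ref> <check-command> [args...]
bisect_regression() {
    local good_ref="$1" bad_ref="$2"
    shift 2
    # `git bisect start <bad> <good>` marks both endpoints in one step.
    git bisect start "$bad_ref" "$good_ref" >/dev/null || return 1
    # git checks out each candidate commit and runs the check command on it.
    git bisect run "$@"
    local status=$?
    # After a successful run, refs/bisect/bad names the first bad commit.
    git rev-parse refs/bisect/bad
    # Return the working tree to where it was before bisecting.
    git bisect reset >/dev/null 2>&1
    return "$status"
}
```

For this issue, the check command could be a small script (hypothetical, say `./check_pipeline.sh`) that runs the Snakemake pipeline on a tiny dataset and exits non-zero when `collect_mono_src` fails; the good ref would be the last commit on which the pipeline is known to have worked.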