mozilla / translations

The code, training pipeline, and models that power Firefox Translations
https://mozilla.github.io/translations/
Mozilla Public License 2.0

finetune-student died when publication failed #605

bhearsum closed this issue 5 months ago

bhearsum commented 5 months ago

From https://firefox-ci-tc.services.mozilla.com/tasks/Gd-MzP1YSTeDLva4UyLAxA/runs/0/logs/live/public/logs/live.log:

[task 2024-05-16T19:35:55.564Z] [tracking ERROR] Publication failed: finetune-student-ru-en
[task 2024-05-16T19:36:06.420Z] Traceback (most recent call last):
[task 2024-05-16T19:36:06.420Z]   File "/home/ubuntu/tasks/task_171588804202929/./checkouts/vcs/taskcluster/scripts/pipeline/train_taskcluster.py", line 126, in <module>
[task 2024-05-16T19:36:06.420Z]     main(sys.argv[1:])
[task 2024-05-16T19:36:06.420Z]   File "/home/ubuntu/tasks/task_171588804202929/./checkouts/vcs/taskcluster/scripts/pipeline/train_taskcluster.py", line 122, in main
[task 2024-05-16T19:36:06.420Z]     subprocess.run([TRAINING_SCRIPT, *script_args], check=True)
[task 2024-05-16T19:36:06.420Z]   File "/usr/lib/python3.10/subprocess.py", line 526, in run
[task 2024-05-16T19:36:06.420Z]     raise CalledProcessError(retcode, process.args,
[task 2024-05-16T19:36:06.420Z] subprocess.CalledProcessError: Command '['/home/ubuntu/tasks/task_171588804202929/./checkouts/vcs/taskcluster/scripts/pipeline/train-taskcluster.sh', 'student', 'finetune', 'ru', 'en', '/home/ubuntu/tasks/task_171588804202929/fetches/corpus', '/home/ubuntu/tasks/task_171588804202929/fetches/devset', '/home/ubuntu/tasks/task_171588804202929/artifacts', 'chrf', '/home/ubuntu/tasks/task_171588804202929/fetches/corpus.aln.zst', '0', 'None', 'None', 'None', '--pretrained-model', '/home/ubuntu/tasks/task_171588804202929/fetches/final.model.npz.best-chrf.npz', '--disp-freq', '1', '--save-freq', '25', '--valid-freq', '50', '--after', '50u', '--dim-vocabs', '1000', '1000']' returned non-zero exit status 243.

This was a task fired when I updated https://github.com/mozilla/firefox-translations-training/pull/580, fwiw.

bhearsum commented 5 months ago

Possible regression from https://github.com/mozilla/firefox-translations-training/pull/589 ?

bhearsum commented 5 months ago

Maybe https://github.com/mozilla/firefox-translations-training/pull/589/files#diff-8830c064f55d681da0fb5a1da56155d31a4c1f920ba5ee17905c68125159c2f2R95 doesn't properly handle a task whose label is not prefixed with train?

I'm also unsure if TRAIN_LABEL_REGEX handles finetune-student well? https://github.com/mozilla/firefox-translations-training/pull/589/files#diff-dd958c727a1b34ea69637b916064238bf609165948769494dad5a4c3105f0409R15

vrigal commented 5 months ago

> I'm also unsure if TRAIN_LABEL_REGEX handles finetune-student well? https://github.com/mozilla/firefox-translations-training/pull/589/files#diff-dd958c727a1b34ea69637b916064238bf609165948769494dad5a4c3105f0409R15

It clearly does not support finetune-student (it only matches finetuned-student). I published #609 with a patch.
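For anyone reading later, here is a minimal sketch of the kind of mismatch described above. The two patterns are hypothetical stand-ins written for illustration: they are not the actual TRAIN_LABEL_REGEX from the repository, nor the patch in #609, and the group names and the `parse_label` helper are assumptions.

```python
import re

# Hypothetical pattern, NOT copied from the repository: it only accepts the
# "finetuned-student" spelling, so a "finetune-student-*" label is rejected.
STRICT_LABEL_REGEX = re.compile(
    r"^(?P<kind>train-backwards|train-teacher|train-student|finetuned-student)"
    r"-(?P<src>[a-z]{2,3})-(?P<trg>[a-z]{2,3})$"
)

# Hypothetical relaxed pattern: the optional "d" accepts both spellings.
RELAXED_LABEL_REGEX = re.compile(
    r"^(?P<kind>train-backwards|train-teacher|train-student|finetuned?-student)"
    r"-(?P<src>[a-z]{2,3})-(?P<trg>[a-z]{2,3})$"
)


def parse_label(label, pattern):
    """Return the named groups for a task label, or None if it does not match."""
    match = pattern.match(label)
    return match.groupdict() if match else None


# The label from the failing task in the log above.
label = "finetune-student-ru-en"
print(parse_label(label, STRICT_LABEL_REGEX))
print(parse_label(label, RELAXED_LABEL_REGEX))
```

With these stand-in patterns, the strict one rejects the label from the failing task while the relaxed one parses it. The real fix may well look different.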