mozilla / translations

The code, training pipeline, and models that power Firefox Translations
https://mozilla.github.io/translations/
Mozilla Public License 2.0

Evaluation fails on a pre-trained backward model #628

Closed · eu9ene closed this 1 month ago

eu9ene commented 5 months ago

https://firefox-ci-tc.services.mozilla.com/tasks/CEUR_rZ1Qty22JNz3JC-mw

We shouldn't be running evals for pre-trained models, though. They didn't run in this case before, so something broke in training continuation.

This is not critical as it does not block other tasks.

gabrielBusta commented 3 months ago

Hmm, maybe it's because continuation used to be done at graph generation time rather than at run-time? Perhaps we can prune these eval tasks from the graph using its parameters. Alternatively, we could have the eval tasks exit successfully without doing anything if they detect that the model was pretrained.
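For the second option, a rough sketch of what the early exit could look like at the top of the eval task's entry point. The `USE_PRETRAINED_MODEL` environment variable is hypothetical; the real pipeline would need to expose the continuation mode to the task in some way:

```python
# Sketch of the "exit successfully without doing anything" option.
# USE_PRETRAINED_MODEL is a hypothetical environment variable, not an
# existing pipeline setting.
import os
import sys


def main() -> None:
    if os.environ.get("USE_PRETRAINED_MODEL", "").lower() in {"use", "continue", "true"}:
        print("Model was pre-trained elsewhere; skipping evaluation.")
        sys.exit(0)
    # ... otherwise run the normal evaluation ...


if __name__ == "__main__":
    main()
```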

eu9ene commented 3 months ago

We should remove any redundant tasks from the graph. We can assume the pre-trained model has already been evaluated.
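A minimal sketch of what that pruning could look like as a taskgraph transform. The `training_config`/`pretrained-models` parameter layout and the task attributes used here are assumptions for illustration, not the actual schema:

```python
# Hypothetical taskgraph transform that drops evaluation tasks for models
# supplied as pre-trained. Parameter and attribute names are assumptions.
from taskgraph.transforms.base import TransformSequence

transforms = TransformSequence()


@transforms.add
def skip_evals_for_pretrained_models(config, tasks):
    # Assumed location of the continuation settings in the parameters.
    pretrained = (
        config.params.get("training_config", {})
        .get("experiment", {})
        .get("pretrained-models", {})
    )
    for task in tasks:
        attributes = task.get("attributes", {})
        stage = attributes.get("stage", "")
        model = attributes.get("model", "")
        # Prune evaluation tasks for a model that arrives pre-trained;
        # it is assumed to have been evaluated when it was trained.
        if stage.startswith("evaluate") and f"train-{model}" in pretrained:
            continue
        yield task
```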

bhearsum commented 3 months ago

> Hmm, maybe it's because continuation used to be done at graph generation time rather than at run-time? Perhaps we can prune these eval tasks from the graph using its parameters. Alternatively, we could have the eval tasks exit successfully without doing anything if they detect that the model was pretrained.

As far as I can tell, the linked run is not using runtime continuation. I suspect this regressed with one of the recent-ish changes to train.py: https://github.com/mozilla/firefox-translations-training/commits/main/taskcluster/translations_taskgraph/actions/train.py