Closed: eu9ene closed this issue 1 month ago
Hmm, maybe it's because continuation used to be done at graph generation time rather than at run-time? Perhaps we can prune these eval tasks from the graph using its parameters. Alternatively, we could have the eval tasks exit successfully without doing anything if they detect that the model was pretrained.
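For the first option, something along these lines is what I have in mind, as a rough sketch only; the parameter path (`training_config.experiment.pretrained-models`) and the `model` attribute on the eval jobs are assumptions about the config layout, not what the repo necessarily exposes:

```python
# Rough sketch of a taskgraph transform that drops evaluate tasks for models
# that were supplied as pretrained. The parameter path and the "model"
# attribute below are assumptions, not verified against this repo.
from taskgraph.transforms.base import TransformSequence

transforms = TransformSequence()


@transforms.add
def prune_pretrained_evals(config, jobs):
    pretrained = (
        config.params.get("training_config", {})
        .get("experiment", {})
        .get("pretrained-models", {})
    )
    for job in jobs:
        model = job.get("attributes", {}).get("model")
        if model in pretrained:
            # The pretrained model was already evaluated upstream; skip its eval task.
            continue
        yield job
```

The run-time alternative would be a similar check at the top of the eval script that exits 0 when the model being evaluated comes from a pretrained continuation.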
We should remove any redundant tasks from the graph. We can assume the pre-trained model has already been evaluated.
> Hmm, maybe it's because continuation used to be done at graph generation time rather than at run-time? Perhaps we can prune these eval tasks from the graph using its parameters. Alternatively, we could have the eval tasks exit successfully without doing anything if they detect that the model was pretrained.
As far as I can tell, the run that was linked to is not using runtime continuation. I suspect this regressed with one of the recent-ish changes to train.py: https://github.com/mozilla/firefox-translations-training/commits/main/taskcluster/translations_taskgraph/actions/train.py
https://firefox-ci-tc.services.mozilla.com/tasks/CEUR_rZ1Qty22JNz3JC-mw
We shouldn't run evals for the pre-trained models, though. This wasn't happening before, so something broke in training continuation.
This is not critical as it does not block other tasks.