Closed gregtatum closed 1 month ago
I'll have a look. Do I understand correctly that we need this working to use preemptible instances? @bhearsum
Yeah, we should make sure continuation works if we're using preemptible instances. The other options are: don't use preemptible instances or change it to not try to continue training (which really isn't a good option...).
It looks like https://github.com/mozilla/firefox-translations-training/blob/b0b5f25d0289a90619a12e645683cfd671332a85/taskcluster/scripts/pipeline/train_taskcluster.py#L35-L38 just needs a bump. Sorry for not catching that in review.
It looks like #881 broke training continuation for students. There is some misdirection around the train taskcluster script that I think needs updating. I don't think we have `run_task` tests for training continuation. I'm guessing it's some of the argument manipulation, which I don't understand, in `taskcluster/scripts/pipeline/train_taskcluster.py`.