eu9ene closed this 4 months ago
It almost looks as if run #0 didn't manage to upload all of its artifacts. Although `model.npz.optimizer.npz` is listed there, it is most certainly a 404.
We should probably avoid failing on such a 404, or on any other issue retrieving artifacts from a previous run. We can always try again from even earlier runs, and even if all of those are missing artifacts (or we fail to fetch them for some reason), we can always start from scratch.
At the moment, any 404 causes the task to fail with no way to even unhork the graph :(
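A rough sketch of the fallback described above, not the actual pipeline code. `find_resumable_run` and its `fetch` callable are hypothetical names used only for illustration. The idea is to walk the previous runs from newest to oldest, treat any fetch failure (a 404 or anything else) as "this run is unusable", and return `None` when nothing can be recovered so the caller starts training from scratch.

```python
from __future__ import annotations

from typing import Callable, Iterable, Optional


def find_resumable_run(
    run_ids: Iterable[int],
    artifact_names: list[str],
    fetch: Callable[[int, str], bytes],
) -> Optional[tuple[int, dict[str, bytes]]]:
    """Return (run_id, artifacts) for the newest run whose artifacts all download.

    `fetch` is a hypothetical callable that downloads one artifact from one
    run and raises an exception on any failure (for example a 404).
    Returns None when no run is usable, meaning: start from scratch.
    """
    for run_id in sorted(run_ids, reverse=True):
        artifacts: dict[str, bytes] = {}
        try:
            for name in artifact_names:
                artifacts[name] = fetch(run_id, name)
        except Exception as error:
            # Any retrieval problem disqualifies this run; fall back to an earlier one.
            print(f"Run {run_id} is unusable ({error}); trying an earlier run")
            continue
        return run_id, artifacts
    # Every previous run was missing something, so nothing can be resumed.
    return None
```

All artifacts are collected for a run before it is accepted, so a run that listed `model.npz.optimizer.npz` but could not serve it would be skipped as a whole rather than leaving a half-restored checkpoint.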
https://github.com/mozilla/firefox-translations-training/pull/671 should fix this, and applies cleanly to `release`.
Note that any fix won't fix the existing task. I think you'll need to start a new `train` action with `previous_group_ids` and `start-stage` to get this moving again.
https://firefox-ci-tc.services.mozilla.com/tasks/EdZxplswTGWoT5WE6er2CA/runs/10/logs/public/logs/live.log