eu9ene closed this issue 1 month ago.
It's likely related to the workaround for the deadline issue: https://github.com/mozilla/firefox-translations-training/issues/691
I do not think this is related to the deadline issue: we wouldn't see 404s for tasks that are past their deadline, only for tasks that have expired. It looks to me like this is a new edge case in how we're handling previous_group_ids. We walk up the graph of any tasks in the given group(s) to find all ancestors. The stack trace shows we are a few levels deep into that walk when we hit this 404, so we're presumably getting it when trying to fetch the task definition of a transitive dependency of one of the tasks from previous_group_ids.
The fix here is most likely to ignore 404s when fetching ancestors, which I'll need to do upstream in taskgraph.
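A minimal sketch of the approach described above: walk the dependency graph breadth-first from a set of starting tasks and treat a 404 as a dead end instead of a fatal error. This is not taskgraph's actual code. The names find_ancestors, fetch_definition and TaskNotFound are hypothetical, and the "dependencies" key is modelled on the list of task IDs in a Taskcluster task definition. The sketch also starts from task IDs rather than group IDs; listing the tasks in each previous group is left out.

```python
from collections import deque
from typing import Callable, Dict, Iterable, Set


class TaskNotFound(Exception):
    """Hypothetical stand-in for a 404 when fetching an expired task."""


def find_ancestors(
    task_ids: Iterable[str],
    fetch_definition: Callable[[str], Dict],
) -> Set[str]:
    """Return all transitive dependencies of task_ids, tolerating 404s."""
    ancestors: Set[str] = set()
    seen: Set[str] = set()
    queue = deque(task_ids)
    while queue:
        task_id = queue.popleft()
        if task_id in seen:
            continue
        seen.add(task_id)
        try:
            definition = fetch_definition(task_id)
        except TaskNotFound:
            # The task has expired, so its own dependencies can't be
            # discovered. Treat it as a leaf rather than failing the walk.
            continue
        for dep in definition.get("dependencies", []):
            ancestors.add(dep)
            if dep not in seen:
                queue.append(dep)
    return ancestors


# Usage with a hypothetical in-memory graph in place of real API calls.
graph = {
    "train": {"dependencies": ["clean", "vocab"]},
    "vocab": {"dependencies": ["clean"]},
    "clean": {"dependencies": ["fetch"]},
    # "fetch" has expired, so looking it up raises TaskNotFound.
}


def fake_fetch(task_id: str) -> Dict:
    try:
        return graph[task_id]
    except KeyError:
        raise TaskNotFound(task_id) from None


print(sorted(find_ancestors(["train"], fake_fetch)))
# prints ['clean', 'fetch', 'vocab']
```

The expired task's ID is still recorded, because its child listed it as a dependency, but the walk stops there instead of raising. Whether an expired ancestor should be reported at all is a design choice the real fix may handle differently.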
Upstream fix is being worked on in https://github.com/taskcluster/taskgraph/pull/569
@bhearsum I see, thanks for the investigation and for prioritizing this! I'd like to point out that this issue has been blocking us for two weeks, and all the training in the big batch is currently stopped. I didn't want to ping other people since you have the most context on all this. I guess we'll need to figure out how to cherry-pick the required fixes into the release branch, because we haven't upgraded taskgraph there yet.
I'll see what I can do about the release branch. I can probably have something up for that today or tomorrow.
The action worked with the fix: https://firefox-ci-tc.services.mozilla.com/tasks/O7cfmFR_SuaZNg8d8b0EWQ
https://github.com/mozilla/firefox-translations-training/pull/834 fixed this on main by picking up the upstream fix.
I'm trying to restart some training runs that failed with "deadline exceeded", and they all fail with a 404 in the logs. Apparently it can't find some task. For example: https://firefox-ci-tc.services.mozilla.com/tasks/Jyi8Pmf8TZ-ve9Q8JE8RcA/runs/0
It fails for all the languages I'm trying to restart.