eu9ene closed this issue 1 week ago.
Copying discussion from Slack related to the sudden rebuilding of the toolchains.
@bhearsum:
We have a guard in taskgraph that will not use a cached task if that task expires before the deadline of a task that depends on it. This ensures that any tasks upstream of the ones we're about to create remain available at every point when those tasks could run. In this case, this task is the one we should've reused; it expires at 2024-08-14T11:28:43.692Z. One of the tasks that depended on it was created with a deadline of 2024-08-18T18:26:25.727Z, 4 days after the cached task expires.

For the short term, I suggest reducing the default task deadline so that it falls before 2024-08-14. If you do that and try again with the same training config, I expect you'll pick up the cached tasks you want. For the medium term, we should lengthen the expiry time of the toolchain tasks, and perhaps of other tasks as well. It may also be a good idea to force rebuilds at the start of large training runs so that we don't depend on any cached tasks that might expire soon.
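The guard described above boils down to a date comparison between the cached task's expiry and the dependent task's deadline. Below is a minimal Python sketch of that rule, not taskgraph's actual code: `parse_tc_timestamp` and `can_reuse_cached_task` are hypothetical names, and it assumes a cached task counts as usable when it expires at or after the dependent task's deadline.

```python
from datetime import datetime


def parse_tc_timestamp(value: str) -> datetime:
    # Timestamps in this thread are ISO 8601 UTC strings ending in "Z".
    # datetime.fromisoformat only accepts "Z" from Python 3.11 onwards,
    # so replace it with an explicit UTC offset for older versions.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def can_reuse_cached_task(cached_expires: datetime, dependent_deadline: datetime) -> bool:
    # Hypothetical guard: reuse the cached task only if it is guaranteed to
    # still exist for the whole window in which the dependent task may run,
    # i.e. it must not expire before the dependent task's deadline.
    return cached_expires >= dependent_deadline


if __name__ == "__main__":
    # Timestamps quoted in the discussion above.
    cached_expires = parse_tc_timestamp("2024-08-14T11:28:43.692Z")
    dependent_deadline = parse_tc_timestamp("2024-08-18T18:26:25.727Z")

    print(can_reuse_cached_task(cached_expires, dependent_deadline))  # False
    print(dependent_deadline - cached_expires)  # 4 days, 6:57:42.035000
```

With the timestamps from this thread, the check rejects the cached task because the deadline lands more than four days after its expiry. Shortening the deadline until it falls on or before the expiry is what makes the comparison pass, which is why reducing the default deadline works as a short-term fix.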
Reducing the deadline to 10 days helped.
Closing this since we've landed a workaround, though it may not solve the problem in every case. Let's reopen if we see it again.
It seems the deadline issue is even more serious than we thought. My train action with caching now fails:
https://firefox-ci-tc.services.mozilla.com/tasks/DQkkrHmNQP6dSjg5p_YYfQ/runs/0
I'm running it from the push task of this PR: https://github.com/mozilla/firefox-translations-training/pull/690. It uses this config, which is the usual way to reuse cached tasks: