The only workaround we have for this is adding tasks between merge-corpus and its upstreams to avoid hitting this limit. We've done this before for the "all" tasks, but this will be a little different because we need to pull artifacts from the upstream. It's tractable, though.
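For illustration, the intermediate-task approach amounts to splitting the upstream list into groups under the limit and fronting each group with a pass-through task that re-publishes its group's artifacts. A minimal sketch, assuming taskgraph-style task dicts; the label scheme and dict shape here are hypothetical, not our actual transform code:

```python
MAX_DEPS = 99  # Taskcluster's historical per-task dependency limit

def insert_passthrough_tasks(upstream_labels, max_deps=MAX_DEPS):
    """Split a too-long upstream list into groups, each fronted by a
    dummy task that pulls and re-publishes its group's artifacts so
    merge-corpus only needs to depend on the dummy tasks."""
    groups = [
        upstream_labels[i : i + max_deps]
        for i in range(0, len(upstream_labels), max_deps)
    ]
    chunk_tasks = []
    for n, group in enumerate(groups):
        chunk_tasks.append({
            # hypothetical naming scheme for the intermediate tasks
            "label": f"merge-corpus-chunk-{n}",
            "dependencies": {label: label for label in group},
            # the task body would fetch each upstream artifact and
            # publish it again under this task's own artifacts
        })
    return chunk_tasks
```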
To this point, I'm just going to republish artifacts in the dummy tasks. I looked at the bicleaner tasks on one of the large recent training runs, and the artifacts totaled ~25GB at rest. That costs ~$0.40/month to store in GCP, so even if we had 100 runs of that size in a year, we're looking at ~$500/year to store them. We can revisit this decision at some point, but I don't think it's worth fussing with an alternate solution for finding artifacts of an indirect upstream at this time.
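The back-of-the-envelope arithmetic behind that estimate, assuming roughly $0.016/GB-month for GCP standard storage (the exact rate is an approximation):

```python
# Storage cost estimate for republished artifacts.
gb_per_run = 25
rate_per_gb_month = 0.016          # assumed GCP standard storage rate
per_run_monthly = gb_per_run * rate_per_gb_month   # ~$0.40/month
runs_per_year = 100
annual = per_run_monthly * runs_per_year * 12      # ~$480/year, upper bound
print(f"${per_run_monthly:.2f}/month per run, ~${annual:.0f}/year for {runs_per_year} runs")
```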
Another solution for this could be to chunk dataset tasks together. If we managed to chunk them by size, we could avoid increasing the end-to-end runtime as well (see the sketch below). This might not be great for caching purposes, but I wanted to mention it here for completeness.
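If we went this route, one simple way to keep chunks balanced is longest-processing-time-first scheduling: sort datasets by size descending and always assign the next one to the currently smallest chunk. A minimal sketch, where the dataset names and size metadata are placeholders:

```python
import heapq

def balance_datasets(datasets, num_chunks):
    """Group (name, size_bytes) pairs into num_chunks chunks of roughly
    equal total size, so the slowest chunk doesn't dominate runtime."""
    # heap entries: (total size so far, chunk index, member names)
    heap = [(0, i, []) for i in range(num_chunks)]
    heapq.heapify(heap)
    for name, size in sorted(datasets, key=lambda d: d[1], reverse=True):
        total, i, members = heapq.heappop(heap)
        members.append(name)
        heapq.heappush(heap, (total + size, i, members))
    return [members for _, _, members in sorted(heap, key=lambda c: c[1])]

# Example with made-up dataset sizes:
# balance_datasets([("opus-a", 900), ("opus-b", 500), ("nllb", 400)], 2)
# -> [["opus-a"], ["opus-b", "nllb"]]
```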
It turns out that the current limit is likely not a hard limit these days. We're working on removing or greatly increasing this limit in Taskcluster in https://github.com/taskcluster/taskcluster/issues/7151.
The Firefox CI cluster now supports up to 10,000 dependencies :partying_face:. We'll still need a taskgraph change to allow for it.
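For context, the error in the log excerpt below comes from a guard along these lines during graph generation; the constant name and function here are illustrative, not taskgraph's exact code, and the change would amount to raising (or removing) the ceiling:

```python
MAX_DEPENDENCIES = 99  # historical ceiling; could now be raised toward 10,000

def verify_dependency_count(task: dict) -> None:
    # Refuse to generate a task whose dependency list exceeds the
    # ceiling, producing the kind of error seen in the log below.
    deps = task.get("dependencies", [])
    if len(deps) > MAX_DEPENDENCIES:
        raise Exception(
            f"task {task['label']} has too many dependencies "
            f"({len(deps)} > {MAX_DEPENDENCIES})"
        )
```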
https://firefox-ci-tc.services.mozilla.com/tasks/ZqlokLMTQG-pZPJtK9UnOw/runs/0/logs/public/logs/live.log
Exception: task merge-corpus/merge-corpus-da-en has too many dependencies (105 > 99)