mozilla / translations

The code, training pipeline, and models that power Firefox Translations
https://mozilla.github.io/translations/
Mozilla Public License 2.0

CI run is slowed down by failing to download fetches #855

Open gregtatum opened 1 month ago

gregtatum commented 1 month ago

I'm working on some performance optimizations to bring down our CI times, and I found that some tasks take 12 minutes to resolve because of problems downloading fetches and artifacts. This compounds when tasks depend on each other, so a few tasks failing in this way can add 20-30 minutes to a CI run. Once the fixes I'm working on now are merged, this will be the slowest part of the CI pipeline.
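To illustrate why the delays compound: a slow task only lengthens the whole run if it sits on the critical path, and delays on a chain of dependent tasks add up instead of overlapping. This is a minimal sketch using a hypothetical task graph and hypothetical durations in minutes. None of the task names or numbers come from the actual pipeline.

```python
# Hypothetical task graph: each task maps to the tasks it depends on.
DEPS = {
    "fetch-toolchain": [],
    "build-marian": ["fetch-toolchain"],
    "train-model": ["build-marian"],
    "evaluate": ["train-model"],
    "lint": [],
}

# Hypothetical durations in minutes, without any download delays.
BASE = {
    "fetch-toolchain": 2,
    "build-marian": 10,
    "train-model": 5,
    "evaluate": 3,
    "lint": 4,
}


def finish_time(task, durations, deps, memo):
    """A task starts when its slowest dependency finishes."""
    if task not in memo:
        start = max(
            (finish_time(dep, durations, deps, memo) for dep in deps[task]),
            default=0,
        )
        memo[task] = start + durations[task]
    return memo[task]


def total_runtime(durations, deps):
    """Wall-clock time of the whole graph, i.e. the critical-path length."""
    memo = {}
    return max(finish_time(task, durations, deps, memo) for task in deps)


def with_delay(durations, delayed, minutes=12):
    """Add a fixed delay (e.g. failed download retries) to the given tasks."""
    return {t: d + (minutes if t in delayed else 0) for t, d in durations.items()}
```

With these made-up numbers, delaying `build-marian` and `evaluate` by 12 minutes each pushes the run from 20 to 44 minutes, because both sit on the same dependency chain. The same delay on the independent `lint` task is absorbed by the longer chain and leaves the total at 20 minutes.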

And here is a profile of the task:

Here you can see that it spent 12 minutes attempting to download fetches and artifacts.

Downloading https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/SVZMV_7KS3mGqDMrMgj9_A/artifacts/public/build/marian.tar.zst
Download failed: <urlopen error [Errno -3] Temporary failure in name resolution>
sleeping for 91.00s (attempt 2/5)
Download failed: <urlopen error [Errno -3] Temporary failure in name resolution>
sleeping for 90.00s (attempt 2/5)
Download failed: <urlopen error [Errno -3] Temporary failure in name resolution>
sleeping for 89.00s (attempt 2/5)
Download failed: <urlopen error [Errno -3] Temporary failure in name resolution>
sleeping for 90.00s (attempt 2/5)
attempt 3/5
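The log lines above look like a fixed-sleep retry loop. This is a minimal sketch of such a loop, assuming 5 attempts and a sleep of about 90 s with roughly ±1 s of jitter, inferred only from the log. It is not the actual fetch script used by the CI, and `download_with_retries` is a hypothetical name. The `opener` and `sleep` parameters are injectable so the loop can be exercised without network access.

```python
import random
import time
import urllib.error
import urllib.request
from typing import Callable


def download_with_retries(
    url: str,
    attempts: int = 5,          # assumed, from "attempt N/5" in the log
    base_sleep: float = 90.0,   # assumed, from "sleeping for ~90s" in the log
    jitter: float = 1.0,        # assumed, to mimic 89/90/91 s sleeps
    opener: Callable = urllib.request.urlopen,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Hypothetical helper: retry a download with a roughly fixed sleep."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            with opener(url) as response:
                return response.read()
        except urllib.error.URLError as exc:
            # str(URLError) renders as "<urlopen error ...>", as in the log.
            print(f"Download failed: {exc}")
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            delay = base_sleep + random.uniform(-jitter, jitter)
            print(f"sleeping for {delay:.2f}s (attempt {attempt + 1}/{attempts})")
            sleep(delay)
```

Under these assumed values, a download that never succeeds sleeps 4 times for about 90 s each, so it waits roughly 6 minutes before failing, on top of the time spent on the attempts themselves.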
bhearsum commented 1 month ago

https://github.com/mozilla/firefox-translations-training/issues/549 is related, possibly the same?