ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License

zig package manager should retry fetching on temporary errors #17472

Open mitchellh opened 11 months ago

mitchellh commented 11 months ago

Zig Version

0.12.0-dev.790+ad6f8e3a5

Steps to Reproduce and Observed Behavior

My project (Ghostty) has ~20 dependencies it has to fetch from the internet given an uncached build. These are fetched from multiple sources (github, gitlab, self-hosted, etc.). Each push results in ~7 separate CI runs which run uncached. As a result, each Git push results in ~140 new HTTP fetches for these large files.

Even with this relatively small number, it's big enough that I'm regularly seeing temporary errors fail CI. These are the sorts of errors where an immediate retry fixes the issue 90% of the time, and a very short backoff fixes it the remaining 10% of the time. So far, none of the failures have been real persistent outages.
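To illustrate why ~140 fetches per push is enough to see regular CI failures, here is a rough back-of-the-envelope sketch. The per-fetch failure rates are made-up assumptions (no measured rate is given above), the function name `p_any_failure` is hypothetical, and the fetches are treated as independent, which is a simplification.

```python
def p_any_failure(fetches: int, per_fetch_failure: float) -> float:
    """Chance that at least one of `fetches` independent fetches fails."""
    return 1.0 - (1.0 - per_fetch_failure) ** fetches


# ~20 dependencies x ~7 uncached CI runs per push, as described above.
fetches_per_push = 20 * 7

# Assumed (not measured) per-fetch transient failure rates.
for rate in (0.001, 0.005, 0.01):
    chance = p_any_failure(fetches_per_push, rate)
    print(f"per-fetch failure {rate:.1%} -> push hits a failure {chance:.1%} of the time")
```

Even a small per-fetch failure rate compounds quickly across that many fetches, which is why a retry inside the fetch step helps more than the raw numbers suggest.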

Expected Behavior

Zig should retry on fetch errors.

The ones I've seen but this likely isn't exhaustive:

My recommendation would be to do a short series of backoffs. Even backing off a few times with a maximum total wait time of 10 seconds would probably fix 90% of my issues, but perhaps this can be configurable because for CI I'd happily configure this to upwards of a few minutes.

The backoff should probably increase, whether it be exponential, Fibonacci, whatever.
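The behavior requested above could look roughly like the following. This is a language-neutral sketch written in Python, not Zig's implementation: `TransientFetchError`, `backoff_delays`, `fetch_with_retry`, and all default values are hypothetical names and numbers chosen for illustration. It retries once immediately, then backs off exponentially, and gives up once a configurable total wait budget (10 seconds here) is spent.

```python
import time
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


class TransientFetchError(Exception):
    """Hypothetical marker for errors worth retrying (timeouts, resets, 5xx)."""


def backoff_delays(base: float = 0.5, factor: float = 2.0,
                   max_total: float = 10.0) -> Iterator[float]:
    """Yield retry delays whose sum never exceeds max_total seconds."""
    yield 0.0  # first retry is immediate
    total = 0.0
    delay = base
    while total < max_total:
        # Clip the last delay so the total wait stays within the budget.
        step = min(delay, max_total - total)
        yield step
        total += step
        delay *= factor


def fetch_with_retry(fetch: Callable[[], T], *, base: float = 0.5,
                     factor: float = 2.0, max_total: float = 10.0,
                     sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fetch(), retrying transient errors until the wait budget runs out."""
    delays = backoff_delays(base, factor, max_total)
    while True:
        try:
            return fetch()
        except TransientFetchError:
            delay = next(delays, None)
            if delay is None:
                raise  # budget exhausted: surface the last error
            sleep(delay)
```

With the defaults, the waits are 0, 0.5, 1, 2, 4, and 2.5 seconds. A CI setup that tolerates a longer wait could pass a larger `max_total`, for example `fetch_with_retry(download, max_total=180.0)`, where `download` stands in for whatever performs the fetch. Only errors classified as transient are retried, so genuine failures still fail fast.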

As a workaround, I can make my whole CI job retry, but the CI job does a lot of other setup and no other steps in the job are flaky so this would probably result in retrying genuine failures and cost me more money. It'd be much more convenient and faster if Zig retried.

andrewrk commented 1 day ago

Still interested in solving this issue, but wanted to suggest an alternative workaround for the time being. Could you try persisting your p directory in the global zig cache across CI invocations? Or maybe even your entire global zig cache directory since you're using the same zig version.

I think it's a nice workaround because it's something valuable to do regardless - will make your CI run faster and use less network resources. For example, mlugg's Setup Zig Action does this by default.

mitchellh commented 1 day ago

> Still interested in solving this issue, but wanted to suggest an alternative workaround for the time being. Could you try persisting your p directory in the global zig cache across CI invocations? Or maybe even your entire global zig cache directory since you're using the same zig version.

We are doing this now! This, plus mirroring all my dependencies on a single web host I use, has made this a non-issue for me. I think it's generally still useful to solve, but I have successfully worked around it.