harvard-lil / perma

Indelible links

Retry WACZ download #3656

Closed: rebeccacremona closed this 1 week ago

rebeccacremona commented 1 week ago

See LIL-2877.

Since we integrated with the Scoop API in 2023, we have sporadically seen save_scoop_capture fail with a ProtocolError when attempting to download the WARC/WACZ file from the API. Early on, the underlying error was ConnectionResetError; more recently, it has more commonly been IncompleteRead. Sometimes there is a flurry of errors for a few minutes, and then it resolves; other times there is a single standalone error. The incidents happen at different times of day and on different days: they might happen several days in a row, and then not again for weeks or months. The incidence picked up sharply in late September 2024.

From reading around, I believe this is due to transient network problems; the only suggested fixes I have come across are "check your internet connection" or "try again."

So, this PR... tries again.

It reuses our standard retry utility, which retries with exponential backoff starting from a 100 ms delay. I arbitrarily set the number of retries to 3. That doesn't add much delay, but I think that's okay for a first pass: each attempt also spends time on the API call itself, which builds in some additional spacing between attempts.
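For readers less familiar with the pattern, here is a minimal sketch of retry with exponential backoff using the settings described above (3 retries, starting from a 100 ms delay), retrying only on the two underlying errors from the incident reports. It is not perma's actual retry utility: the name `retry_with_backoff`, the doubling `factor`, and the injectable `sleep` parameter are assumptions for illustration.

```python
import http.client
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    retryable: Tuple[Type[BaseException], ...],
    retries: int = 3,
    initial_delay: float = 0.1,  # 100 ms, as in the PR description
    factor: float = 2.0,  # assumed doubling; the real utility may differ
    sleep: Callable[[float], None] = time.sleep,  # injectable so tests need not wait
) -> T:
    """Call func(); on a retryable error, wait and try again.

    Hypothetical helper, not perma's retry utility. Makes at most
    retries + 1 attempts in total; any other exception propagates at once.
    """
    delay = initial_delay
    for attempt in range(retries + 1):
        try:
            return func()
        except retryable:
            if attempt == retries:
                raise  # out of retries: surface the last error
            sleep(delay)
            delay *= factor
    # Only reachable if retries is negative (no attempts made).
    raise ValueError("retries must be >= 0")


if __name__ == "__main__":
    attempts = 0

    def flaky_download() -> bytes:
        # Stand-in for the WACZ download: fails twice, then succeeds.
        global attempts
        attempts += 1
        if attempts < 3:
            raise http.client.IncompleteRead(b"partial")
        return b"wacz bytes"

    data = retry_with_backoff(
        flaky_download, (http.client.IncompleteRead, ConnectionResetError)
    )
    print(attempts, data)  # 3 b'wacz bytes', after sleeping 0.1 s and then 0.2 s
```

With these assumed defaults, the backoff sleeps at most 0.1 + 0.2 + 0.4 = 0.7 s across all three retries, which illustrates why three retries add little delay beyond the time spent on the download attempts themselves.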

I have not found a good way to simulate or reproduce the error locally, so it is merely a hypothesis that this will help.
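One possible way to provoke IncompleteRead on demand, offered as an assumption rather than a diagnosis, is a throwaway local server that advertises a longer Content-Length than it actually sends. The sketch uses only the standard library (`http.server`, `http.client`); `TruncatingHandler` and `fetch_truncated` are hypothetical names. It only simulates a truncated response and says nothing about what actually cuts the Scoop API downloads short in production, but it could give a retry path something concrete to exercise in tests.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class TruncatingHandler(BaseHTTPRequestHandler):
    """Promises a 100-byte body but sends only 10 bytes, then closes (HTTP/1.0)."""

    def do_GET(self) -> None:
        self.send_response(200)
        self.send_header("Content-Length", "100")
        self.end_headers()
        self.wfile.write(b"x" * 10)  # short body; the connection closes after this

    def log_message(self, format: str, *args) -> None:
        pass  # keep test output quiet


def fetch_truncated() -> BaseException:
    """Start a local server, fetch from it, and return the error the client raised."""
    server = HTTPServer(("127.0.0.1", 0), TruncatingHandler)  # port 0: pick a free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        port = server.server_address[1]
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        conn.request("GET", "/")
        response = conn.getresponse()
        try:
            response.read()  # reads until EOF, then sees fewer bytes than promised
        except http.client.IncompleteRead as err:
            return err
        finally:
            conn.close()
        raise AssertionError("expected IncompleteRead")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    err = fetch_truncated()
    print(type(err).__name__, len(err.partial), err.expected)  # IncompleteRead 10 90
```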

If we decide to merge and deploy this, the follow-up would be to watch how things go for a few weeks. If we see occasional single failures (say, one every few days) or any longer incidents, we could consider bumping up the number of retries. If we don't see any longer incidents for several weeks (let's say 2 months), then I would be convinced that this mechanism is working, rather than that the problem simply hasn't recurred.

codecov[bot] commented 1 week ago

Codecov Report

All modified and coverable lines are covered by tests. ✅

Project coverage is 69.62%. Comparing base (d4747b8) to head (6df01a5). Report is 8 commits behind head on develop.

Additional details and impacted files

```diff
@@             Coverage Diff             @@
##           develop    #3656      +/-   ##
===========================================
+ Coverage    69.61%   69.62%   +0.01%
===========================================
  Files           54       54
  Lines         7350     7336      -14
===========================================
- Hits          5117     5108       -9
+ Misses        2233     2228       -5
```
