Closed by Marigold 8 months ago
@Marigold would the next step on this be to try and move images to R2 instead of having them in DO Spaces?
See for example this failing build on staging: https://buildkite.com/our-world-in-data/grapher-automated-staging-environment/builds/2743#018c39a2-5e92-47dd-bf56-c4b0fb152144
Had another failure here, and Saloni was surprised that the build took much longer than usual because of the retry: https://buildkite.com/our-world-in-data/owid-deploy-content-master/builds/1650#018c683b-2b4b-439f-b4fd-c79dc2aecf39
I put a DigitalOcean uptime check on a single image to see if it catches the failures: https://cloud.digitalocean.com/monitors/uptime/checks/e0e8895d-0fc0-40e7-86be-a3171a70c1e7
This has been happening more often lately. I'm assigning myself and will try to look into it if I have spare time this cycle.
We fixed this with a proper retry mechanism.
Our content builds have started failing recently with 404s when fetching images from DigitalOcean Spaces. It happens about once per day now. Here's an example
The image is there, and it's not a race condition since it was last modified a month ago. So I guess it's just the reliability of DigitalOcean Spaces. We already use retries, but the builds still fail.
We could add more retries to the existing retry mechanism, add retries at the Buildkite level, or move those images to R2, which seems to be more stable.
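For the first option, here is a minimal sketch of what a more patient retry could look like, assuming exponential backoff between attempts. The names `backoffDelayMs` and `retryWithBackoff` and the default values are hypothetical and not taken from our codebase; the existing retry mechanism may be structured differently.

```typescript
// Hypothetical helpers for illustration only; not the existing retry code.

/** Delay before the retry that follows attempt `attempt` (0-based):
 *  baseMs * 2^attempt, capped at maxMs. */
function backoffDelayMs(attempt: number, baseMs: number, maxMs: number): number {
    return Math.min(maxMs, baseMs * Math.pow(2, attempt))
}

const sleep = (ms: number): Promise<void> =>
    new Promise<void>((resolve) => setTimeout(resolve, ms))

/** Runs `operation` up to `maxAttempts` times, sleeping with exponential
 *  backoff between failed attempts. Rethrows the last error if all fail. */
async function retryWithBackoff<T>(
    operation: () => Promise<T>,
    maxAttempts: number = 5, // assumed default, tune as needed
    baseMs: number = 500, // assumed default
    maxMs: number = 10000 // assumed default
): Promise<T> {
    let lastError: unknown = new Error("maxAttempts must be at least 1")
    for (let attempt = 0; attempt < maxAttempts; attempt++) {
        try {
            return await operation()
        } catch (error) {
            lastError = error
            // No point sleeping after the final attempt.
            if (attempt < maxAttempts - 1) {
                await sleep(backoffDelayMs(attempt, baseMs, maxMs))
            }
        }
    }
    throw lastError
}
```

The image download would then be wrapped as `retryWithBackoff(() => downloadImage(url))`, where `downloadImage` is a hypothetical function that throws on any non-2xx response, so a spurious 404 from Spaces is treated as retryable. Spacing the attempts out gives a transient outage time to clear, at the cost of a slower build when retries kick in.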