owid / owid-grapher

A platform for creating interactive data visualizations
https://ourworldindata.org
MIT License

Occasional 404s when fetching GDoc Images #2953

Closed: Marigold closed this issue 6 months ago

Marigold commented 10 months ago

Our content builds have recently started failing with 404s when fetching images from DigitalOcean Spaces. It now happens about once per day. Here's an example:


    BakeAll [] 18/20 686.4s ✅ baked google doc posts | 58s
    buildLocalBake.js [baseUrl] [dir]

    Bake the site to a local folder

    Positionals:
      baseUrl  Base URL of the site     [string] [default: "http://localhost:3000/"]
      dir      Directory to save the baked site      [string] [default: "localBake"]

    Options:
      --version   Show version number                                   [boolean]
      -h, --help  Show help                                             [boolean]
      --steps     Steps to perform during the baking process
        [array] [choices: "assets", "blogIndex", "embeds", "googleScholar",
        "redirects", "rss", "wordpressPosts", "specialPages", "countries",
        "countryProfiles", "explorers", "charts", "gdocPosts", "gdriveImages", "dods",
        "removeDeletedPosts"]

    Error: Fetching image failed: 404 Not Found https://owid-image-upload.nyc3.cdn.digitaloceanspaces.com/production/FEATURED-IMAGE-Chance-of-having-a-nuclear-war-if-the-annual-probability-is-X.png
        at concurrency (/home/owid/owid-grapher/baker/GDriveImagesBaker.tsx:67:23)
        at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
        at async /home/owid/owid-grapher/node_modules/p-map/index.js:57:22
    🚨 Error: The command exited with status 1

The image exists, and this isn't a race condition: it was last modified a month ago. So I guess it comes down to the reliability of DigitalOcean Spaces. We already use retries, but the builds are still failing.

We could add more retries to the existing retry mechanism, add retries at the Buildkite level, or move those images to R2, which seems to be more stable.
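For illustration, the first option (more retries, spaced further apart) could look roughly like the sketch below. This is not the code in `GDriveImagesBaker.tsx`: `retryWithBackoff`, `fetchImageWithRetry`, and the attempt count and delay values are hypothetical, and the sketch assumes a global `fetch` (Node 18+).

```typescript
// Hypothetical helper: names and defaults are illustrative, not from owid-grapher.
interface RetryOptions {
    maxAttempts: number // total tries, including the first one
    baseDelayMs: number // delay before the second try; doubles after each failure
}

const sleep = (ms: number): Promise<void> =>
    new Promise((resolve) => setTimeout(resolve, ms))

// Runs `operation` until it resolves or `maxAttempts` is reached,
// then rethrows the last error.
async function retryWithBackoff<T>(
    operation: (attempt: number) => Promise<T>,
    { maxAttempts, baseDelayMs }: RetryOptions
): Promise<T> {
    let lastError: unknown
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        try {
            return await operation(attempt)
        } catch (error) {
            lastError = error
            if (attempt < maxAttempts) {
                // Wait baseDelayMs, then 2x, 4x, ... between attempts.
                await sleep(baseDelayMs * 2 ** (attempt - 1))
            }
        }
    }
    throw lastError
}

// Assumption: a non-2xx response (such as the sporadic 404) is treated as retryable.
async function fetchImageWithRetry(url: string): Promise<ArrayBuffer> {
    return retryWithBackoff(
        async () => {
            const response = await fetch(url)
            if (!response.ok) {
                throw new Error(
                    `Fetching image failed: ${response.status} ${response.statusText} ${url}`
                )
            }
            return response.arrayBuffer()
        },
        { maxAttempts: 5, baseDelayMs: 500 }
    )
}
```

Spacing the attempts out matters here because a transient CDN fault may last longer than a few immediate retries. The trade-off is that a genuinely missing image now takes several seconds to fail.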

danyx23 commented 10 months ago

@Marigold would the next step on this be to try and move images to R2 instead of having them in DO spaces?

danyx23 commented 10 months ago

See for example this failing build on staging: https://buildkite.com/our-world-in-data/grapher-automated-staging-environment/builds/2743#018c39a2-5e92-47dd-bf56-c4b0fb152144

larsyencken commented 9 months ago

Had another failure here, and Saloni was surprised that the build took much longer than usual because a retry was needed: https://buildkite.com/our-world-in-data/owid-deploy-content-master/builds/1650#018c683b-2b4b-439f-b4fd-c79dc2aecf39

larsyencken commented 9 months ago

I put a DigitalOcean uptime check on a single image to see if it catches the failures: https://cloud.digitalocean.com/monitors/uptime/checks/e0e8895d-0fc0-40e7-86be-a3171a70c1e7
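A similar check can be approximated locally by polling one image URL and recording the status codes. The sketch below is only an illustration, not how the DigitalOcean uptime check works: `probeUrl`, `failureRate`, and `headStatus` are hypothetical names, it assumes a global `fetch` (Node 18+), and a `HEAD` request may not behave exactly like the `GET` the baker issues.

```typescript
interface ProbeResult {
    timestamp: string
    status: number // HTTP status, or 0 if the request itself failed
}

type StatusFetcher = (url: string) => Promise<number>

// Default fetcher: a HEAD request, so the image body is not downloaded.
const headStatus: StatusFetcher = async (url) => {
    try {
        const response = await fetch(url, { method: "HEAD" })
        return response.status
    } catch {
        return 0
    }
}

// Requests `url` `count` times, waiting `intervalMs` between requests.
async function probeUrl(
    url: string,
    count: number,
    intervalMs: number,
    fetchStatus: StatusFetcher = headStatus
): Promise<ProbeResult[]> {
    const results: ProbeResult[] = []
    for (let i = 0; i < count; i++) {
        const status = await fetchStatus(url)
        results.push({ timestamp: new Date().toISOString(), status })
        if (i < count - 1) {
            await new Promise<void>((resolve) => setTimeout(resolve, intervalMs))
        }
    }
    return results
}

// Fraction of probes that did not return 200.
function failureRate(results: ProbeResult[]): number {
    if (results.length === 0) return 0
    const failures = results.filter((r) => r.status !== 200).length
    return failures / results.length
}
```

Passing the fetcher in as a parameter keeps the polling logic testable without network access. At roughly one failure per day, the probe would need to run frequently and for a long time to catch anything.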

Marigold commented 8 months ago

This has been happening more often lately. I'm assigning myself and will try to look into it if I have spare time this cycle.

Marigold commented 6 months ago

We fixed this with a proper retry mechanism.