lmullen / cchc

America's Public Bible for Computing Cultural Heritage in the Cloud
Creative Commons Zero v1.0 Universal

Retry on API errors #1

Closed lmullen closed 3 years ago

lmullen commented 3 years ago

API errors are not uncommon, but they are hard to reproduce. HTTP 500 Internal Server Error and HTTP 503 Service Unavailable seem to be the most common.

The crawler should gracefully retry those requests when it can, rather than logging and skipping them. That's especially important when crawling paginated results: if one page fails, the crawler currently won't try the subsequent pages, so a big collection might be lost entirely.
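A minimal sketch of what retrying could look like, written in Python with only the standard library. It is an illustration, not the crawler's actual code: `fetch_with_retry`, `crawl_pages`, `RetriesExhausted`, the backoff schedule, and the choice to treat only 500 and 503 as retryable are all assumptions. The `fetch` argument stands in for whatever function performs the HTTP request and returns a `(status, body)` pair.

```python
import time
from typing import Callable, Dict, Iterable, List, Tuple

# Statuses treated as transient. Assumption: only the two named in this issue.
RETRYABLE_STATUSES = {500, 503}

Response = Tuple[int, bytes]


class RetriesExhausted(Exception):
    """Raised when every attempt returned a retryable status."""


def fetch_with_retry(
    fetch: Callable[[str], Response],
    url: str,
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call fetch(url), backing off exponentially after transient errors."""
    status = None
    for attempt in range(1, max_attempts + 1):
        status, body = fetch(url)
        if status not in RETRYABLE_STATUSES:
            return status, body
        if attempt < max_attempts:
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    raise RetriesExhausted(
        f"{url}: still failing after {max_attempts} attempts (last status {status})"
    )


def crawl_pages(
    fetch: Callable[[str], Response],
    page_urls: Iterable[str],
    **retry_kwargs,
) -> Tuple[Dict[str, Response], List[str]]:
    """Fetch every page; a page that keeps failing is recorded, not fatal."""
    results: Dict[str, Response] = {}
    failed: List[str] = []
    for url in page_urls:
        try:
            results[url] = fetch_with_retry(fetch, url, **retry_kwargs)
        except RetriesExhausted:
            failed.append(url)  # remember it for later, keep crawling
    return results, failed
```

The point of `crawl_pages` is that one bad page no longer stops the loop: the failed URLs are returned separately so they can be retried in a later run while the rest of the collection is still fetched.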

There should probably also be better logging of those errors, so I can understand why they are happening.
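As a rough sketch of what more useful error logging might capture, here is a small Python helper built on the standard `logging` module. The function name `log_api_error`, the logger name `crawler.api`, and the set of fields (status, attempt number, elapsed time, URL, and the start of the response body) are assumptions chosen for illustration, not something the crawler is known to record.

```python
import logging

# Hypothetical logger name, used only for this example.
logger = logging.getLogger("crawler.api")


def log_api_error(
    url: str,
    status: int,
    attempt: int,
    elapsed_s: float,
    body: bytes,
    log: logging.Logger = logger,
) -> None:
    """Record enough context about a failed request to diagnose it later."""
    # Keep only the start of the body so one huge error page can't flood the log.
    snippet = body[:200].decode("utf-8", errors="replace")
    log.warning(
        "API error: status=%d attempt=%d elapsed=%.2fs url=%s body=%r",
        status, attempt, elapsed_s, url, snippet,
    )
```

Logging the attempt number and elapsed time alongside the status would make it easier to tell whether failures cluster on particular URLs, follow slow responses, or clear up after a retry.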