API errors are not uncommon, but they are hard to reproduce. HTTP 500 Internal Server Error and HTTP 503 Service Unavailable seem to be the most common.
The crawler should gracefully retry those requests when it can, rather than just logging and skipping them. That's especially important because, when crawling pages, a single failed page currently stops the crawler from trying any subsequent pages — so a large collection could be lost entirely.
There should probably be better logging of those errors too (status code, URL, attempt number) so I can understand why they are happening.
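A minimal sketch of what the retry behavior could look like. This assumes a pluggable `fetch` callable and a hypothetical `HTTPError` wrapper (the real crawler's HTTP client and error types may differ); retryable statuses get exponential backoff, each failure is logged with status and attempt number, and non-retryable or exhausted errors are re-raised so the caller can decide to skip just that page and continue with the rest.

```python
import logging
import time

log = logging.getLogger("crawler")

# Statuses worth retrying — an assumption based on the errors seen (500/503).
RETRYABLE_STATUSES = {500, 503}


class HTTPError(Exception):
    """Hypothetical stand-in for the crawler's HTTP error type."""

    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status


def fetch_with_retry(fetch, url, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying transient HTTP errors with exponential backoff.

    Logs every failed attempt (status, URL, attempt count) and re-raises once
    retries are exhausted or the error is not retryable.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except HTTPError as exc:
            if exc.status not in RETRYABLE_STATUSES or attempt == max_attempts:
                log.error("giving up on %s: HTTP %s after %d attempt(s)",
                          url, exc.status, attempt)
                raise
            delay = base_delay * 2 ** (attempt - 1)  # 1s, 2s, 4s, ...
            log.warning("HTTP %s for %s (attempt %d/%d), retrying in %.0fs",
                        exc.status, url, attempt, max_attempts, delay)
            sleep(delay)
```

The page loop would then wrap each page in a try/except so one exhausted failure is logged and skipped instead of aborting the whole collection.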