open-contracting / kingfisher-collect

Downloads OCDS data and stores it on disk
https://kingfisher-collect.readthedocs.io
BSD 3-Clause "New" or "Revised" License

uruguay_releases: Too many 500 errors #1048

Closed: jpmckinney closed this issue 8 months ago

jpmckinney commented 9 months ago

88k errors in the most recent collection: https://open-contracting-partnership.sentry.io/issues/2823958328/?project=6059006&query=is%3Aunresolved&referrer=issue-stream&stream_index=0

The spider already sets download_delay = 1, and the project-wide setting is CONCURRENT_REQUESTS_PER_DOMAIN = 2.

I'm not sure how to fix this.

A more involved option is to add a new feature that checks the rate of 500 errors and cancels the crawl if the rate is too high. This feature should also send a new type of message to Kingfisher Process to cancel processing.
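A minimal sketch of the rate check, in Python like Kingfisher Collect itself. The class name, window size, threshold and minimum sample size are all hypothetical. Wiring it into Scrapy (for example, recording statuses from a downloader middleware's `process_response` and then calling `crawler.engine.close_spider(spider, reason)`) and sending the cancellation message to Kingfisher Process are not shown.

```python
from collections import deque


class ServerErrorRateMonitor:
    """Hypothetical helper: tracks the share of HTTP 500 responses among the
    last `window` responses and says when a crawl should be cancelled."""

    def __init__(self, window=1000, threshold=0.5, min_responses=100):
        self.threshold = threshold            # cancel when the 500 rate exceeds this
        self.min_responses = min_responses    # don't decide on too small a sample
        self._recent = deque(maxlen=window)   # True for a 500, False otherwise
        self._errors = 0                      # number of True entries in _recent

    def record(self, status):
        """Record one response status code."""
        if len(self._recent) == self._recent.maxlen:
            # append() will drop the oldest entry, so remove it from the count first.
            self._errors -= self._recent[0]
        is_error = status == 500
        self._recent.append(is_error)
        self._errors += is_error

    @property
    def rate(self):
        """Share of 500 responses in the current window (0.0 if empty)."""
        return self._errors / len(self._recent) if self._recent else 0.0

    def should_cancel(self):
        return len(self._recent) >= self.min_responses and self.rate > self.threshold


# Usage with a tiny window: 4 of the last 5 responses were 500s.
monitor = ServerErrorRateMonitor(window=10, threshold=0.5, min_responses=5)
for status in [200, 500, 500, 500, 500]:
    monitor.record(status)
print(monitor.rate, monitor.should_cancel())  # 0.8 True
```

A sliding window (rather than a running total) means a burst of 500s early in a long crawl doesn't cancel it forever after, and a crawl that starts failing late is still caught.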

yolile commented 9 months ago

The 500 responses contain a java.lang.NullPointerException, so something does seem to be broken on their side (starting this year?), e.g. https://comprasestatales.gub.uy/ocds/release/ajuste_adjudicacion-10050

@fppenna could you report this to them/Mariana Lopez? Thanks!

fppenna commented 9 months ago

Done. The partner says they've fixed it, and it seems to work now, at least for the example YL shared. Could you please check whether the error persists?

jpmckinney commented 9 months ago

To check, you can schedule a crawl on the data support server.

Regarding my other idea:

A more involved option is to add a new feature that checks the rate of 500 errors and cancels the crawl if the rate is too high. This feature should also send a new type of message to Kingfisher Process to cancel processing.

I think #531 is the way to go, along with https://github.com/open-contracting/data-registry/issues/29

yolile commented 8 months ago

Hmm, the Sentry URL no longer points to the Uruguay issue, but I started two crawls, one with from_date=2024-01 and another with from_date=2021-03 until_date=2021-03, and there are no errors so far. I think we can close this issue in favor of the related issues @jpmckinney mentioned.
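For reference, crawls like these can be scheduled through Scrapyd's schedule.json API, where any parameter other than `project` and `spider` is passed to the spider as an argument. A minimal sketch that only builds the request; the host, port and project name below are hypothetical, and the data support server may also require authentication.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical values: replace with the real Scrapyd host and project name.
SCRAPYD_URL = "http://localhost:6800"
PROJECT = "kingfisher"


def build_schedule_request(spider, **spider_args):
    """Build (but do not send) a POST to Scrapyd's schedule.json endpoint.

    Extra keyword arguments become spider arguments, e.g. from_date="2024-01".
    """
    data = urlencode({"project": PROJECT, "spider": spider, **spider_args}).encode()
    return Request(f"{SCRAPYD_URL}/schedule.json", data=data, method="POST")


request = build_schedule_request("uruguay_releases", from_date="2021-03", until_date="2021-03")
print(request.full_url)
print(request.data.decode())
# To actually schedule it: urllib.request.urlopen(request), which should return
# JSON containing a "jobid".
```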

jpmckinney commented 8 months ago

For future reference, you can read the crawl logs for Uruguay at /home/collect/scrapyd/logs to see the 500 errors.
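A rough way to total the 500s per crawl from those logs, assuming each log ends with Scrapy's stats dump (logged at INFO level when `STATS_DUMP` is enabled, which is the default) and that Scrapyd stores logs as `project/spider/job.log` under that directory. The function names are hypothetical. Because Scrapy retries 500s by default, the count includes responses to requests that were later retried.

```python
import re
from pathlib import Path

# At the end of a crawl, Scrapy logs "Dumping Scrapy stats:" followed by a dict
# containing lines like:  'downloader/response_status_count/500': 12,
STATUS_500 = re.compile(r"'downloader/response_status_count/500': (\d+)")


def count_500s(log_text):
    """Return the number of HTTP 500 responses in a Scrapy stats dump, or 0 if absent."""
    match = STATUS_500.search(log_text)
    return int(match.group(1)) if match else 0


def scan_logs(log_dir):
    """Map each *.log file under log_dir (searched recursively) to its 500 count."""
    return {
        str(path): count_500s(path.read_text(errors="replace"))
        for path in sorted(Path(log_dir).rglob("*.log"))
    }


if __name__ == "__main__":
    # Print only the crawls that received at least one 500.
    for path, count in scan_logs("/home/collect/scrapyd/logs").items():
        if count:
            print(f"{count:>8}  {path}")
```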