konklone closed this issue 10 years ago
Do you mind if I take the hhs scraper off the safe list in the meantime?
Actually, it's fine. I just commented out the scraper in safe.yml on the server.
Thanks again to @spulec for his great work!
@LindsayYoung I was wrong about them causing the scraper to stop -- I thought this because the output was at the bottom of my scraper logs, but that was because they're using print() and not logging. I switched a bunch of other print() calls in https://github.com/unitedstates/inspectors-general/commit/782a799867c23158f7343f064ab263da3114cb26 but left the 404 one, because it actually is convenient to see all the 404s at the bottom of the logs.
But these do not cause the scraper to hang, or to email the admin, so I think they are still safe for safe.yml.
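For anyone following along, here is a minimal sketch of the print() versus logging difference. It is not the project's actual code: `make_logger` and `report_missing` are hypothetical names, and the format string is an assumption. One likely reason print() output lands at the bottom of a log file is that stdout is block-buffered when redirected, while a logging stream handler flushes after every record, so its lines stay in order with the rest of the log.

```python
import io
import logging


def make_logger(stream):
    """Build a logger that writes to `stream` (hypothetical helper)."""
    logger = logging.getLogger("scraper_sketch")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()          # avoid duplicate handlers on repeat calls
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.propagate = False         # keep output out of the root logger
    return logger


def report_missing(logger, url):
    """Record a 404 as a warning instead of calling print()."""
    logger.warning("404 fetching %s", url)


if __name__ == "__main__":
    buf = io.StringIO()
    log = make_logger(buf)
    report_missing(log, "http://example.com/missing.pdf")
    print(buf.getvalue(), end="")
```

Keeping the 404 message on print() is still a reasonable choice if seeing them grouped at the end of the log is what you want.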
This morning, I ran the new scrapers locally and hhs.py errored out for me. I can give it another look.
Thanks again!
Ah, hhs is crashing because of a 404 during the scraping process (on one of the landing pages, not while downloading a report), so that's a context-specific error that it should choke on. Worth commenting it out of safe.yml, but not an issue with handling 404s gracefully. (So, closing this specific issue.)
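To make the distinction concrete, here is a rough sketch of the two cases. It is an illustration only, not how the repository is structured: `NotFound`, `scrape`, `fetch_landing`, and `fetch_report` are all hypothetical names. A 404 on a landing page propagates and stops the scraper, while a 404 on an individual report is logged and skipped.

```python
import logging


class NotFound(Exception):
    """Hypothetical error a fetcher raises when a URL returns 404."""


def scrape(landing_url, fetch_landing, fetch_report, logger=None):
    """Return (saved, missing) report URLs for one landing page.

    fetch_landing(url) -> list of report URLs (assumed interface)
    fetch_report(url)  -> report contents     (assumed interface)
    """
    logger = logger or logging.getLogger(__name__)

    # A missing landing page means the scraper's assumptions are wrong,
    # so NotFound is deliberately NOT caught here.
    report_urls = fetch_landing(landing_url)

    saved, missing = [], []
    for url in report_urls:
        try:
            fetch_report(url)
        except NotFound:
            # A single missing report is tolerated: note it and move on.
            logger.warning("404 downloading report %s", url)
            missing.append(url)
        else:
            saved.append(url)
    return saved, missing
```

With this shape, a scraper that only hits missing reports stays safe to run unattended, and one that hits a missing landing page fails loudly.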
Apparently they cause the scraper to just stop?
A note to re-run the HHS scraper for its archive after this is fixed, too. (And to contact HHS OIG about the 404.)