unitedstates / inspectors-general

Collecting reports from Inspectors General across the US federal government.
https://sunlightfoundation.com/blog/2014/11/07/opengov-voices-opening-up-government-reports-through-teamwork-and-open-data/
Creative Commons Zero v1.0 Universal

Add HEAD requests against report urls for --dry_run #125

Closed spulec closed 10 years ago

spulec commented 10 years ago

This is what I've been using locally for the past few scrapers.

I don't think check_report_url is a particularly great name.

It's also possible that we should allow other status codes (301, 302, etc). Maybe use res.ok instead?

Closes #108

konklone commented 10 years ago

I think res.ok is probably a synonym for 2XX. So maybe use if res.status_code >= 200 and res.status_code < 400, to catch all non-error states?

This is great. Feel free to merge after making those changes and testing the scraper in at least one condition where the url field is missing.
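
For illustration, the range check suggested above might look roughly like the sketch below. It is not the code from this pull request: the `report` dict shape, the decision to skip reports with no url field, the `ValueError`, and the `head` parameter (a hypothetical injection point so the check can run without network access) are all assumptions.

```python
def is_non_error_status(status_code):
    """Return True for 2XX and 3XX responses, the range proposed in the thread."""
    return 200 <= status_code < 400


def check_report_url(report, head=None):
    """Send a HEAD request to report["url"] and raise on an error status.

    Assumptions for this sketch: `report` is a dict, and a report without
    a url field is skipped rather than treated as a failure.
    """
    url = report.get("url")
    if not url:
        # Missing url field: nothing to check.
        return None
    if head is None:
        # Imported lazily so a stubbed `head` needs no third-party dependency.
        import requests
        head = requests.head
    res = head(url)
    if not is_non_error_status(res.status_code):
        raise ValueError("Bad status %s for %s" % (res.status_code, url))
    return res.status_code
```

Passing a stub for `head` makes it easy to exercise the missing-url case and an error status without hitting a real server.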

konklone commented 10 years ago

(I'm going to be busy merging your IG scrapers for a bit.)

spulec commented 10 years ago

It appears that res.ok basically means res.status_code not in range(400, 600), i.e. anything that is not a 4XX or 5XX error. See here.

I think this means we are good to switch to it.
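
A quick way to see why the switch is safe is to compare the two rules directly. The sketch below uses two hypothetical helper functions rather than the requests library itself: `ok_like` mirrors the res.ok behaviour described above (only 4XX and 5XX count as errors), and `in_2xx_3xx` is the explicit range check proposed earlier in the thread.

```python
def in_2xx_3xx(status_code):
    """The explicit check proposed earlier: 200 <= code < 400."""
    return 200 <= status_code < 400


def ok_like(status_code):
    """Stand-in for res.ok: True unless the code is a 4XX or 5XX error."""
    return not (400 <= status_code < 600)


# Collect every standard status code on which the two rules disagree.
disagreements = [
    code for code in range(100, 600)
    if in_2xx_3xx(code) != ok_like(code)
]
```

Under these assumptions the two rules disagree only on 1XX informational codes, so for redirects (301, 302) and error responses they give the same answer.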

konklone commented 10 years ago

:+1: