unitedstates / inspectors-general

Collecting reports from Inspectors General across the US federal government.
https://sunlightfoundation.com/blog/2014/11/07/opengov-voices-opening-up-government-reports-through-teamwork-and-open-data/
Creative Commons Zero v1.0 Universal

Scrapers now support --archive, to go back to their earliest year #164

konklone closed 10 years ago

konklone commented 10 years ago

inspector.year_range now takes options and an archive year, so any scraper that passes a year to that function will support an --archive flag.

I've updated each scraper to replace its top-of-file comment with an actual variable declaration (e.g. archive = 2006), and each call to inspector.year_range to pass this variable in.
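A minimal sketch of the idea, not the repository's actual code: it assumes `options` is a plain dict, that scrapers default to the current year when no flag is given, and it adds a hypothetical `this_year` parameter purely so the example is reproducible.

```python
import datetime


def year_range(options, archive, this_year=None):
    """Return the list of years a scraper should cover.

    `archive` is the earliest year the scraper can reach. The
    `this_year` argument is a hypothetical addition for this sketch.
    """
    if this_year is None:
        this_year = datetime.date.today().year

    # With --archive, go all the way back to the scraper's earliest year.
    if options.get("archive"):
        return list(range(archive, this_year + 1))

    # Assumed default for this sketch: only the current year.
    return [this_year]


# Module-level declaration, as described for each scraper.
archive = 2006

print(year_range({"archive": True}, archive, this_year=2014))
# [2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014]
```

Because the earliest year is an ordinary variable rather than a comment, the same value can be passed to every `year_range` call in the scraper.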

The --archive flag will work in either form of running a scraper:

./igs --archive
./inspectors/usps.py --archive

It's now easy to re-run the scrapers over their entire archive and keep up with improvements to scraping and metadata that affect the entire set.
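For illustration only, here is one way a command-line `--archive` flag could be turned into an options dict. The `parse_options` helper is hypothetical and uses the standard library's `argparse`; the project's real option handling may work differently.

```python
import argparse


def parse_options(argv):
    """Turn command-line arguments into a plain options dict (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Run an IG scraper.")
    # store_true makes the flag a boolean: present means True, absent means False.
    parser.add_argument(
        "--archive",
        action="store_true",
        help="scrape back to the scraper's earliest year",
    )
    return vars(parser.parse_args(argv))


print(parse_options(["--archive"]))  # {'archive': True}
print(parse_options([]))             # {'archive': False}
```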

konklone commented 10 years ago

I'm running this once on my production server before merging, and will merge if there are no errors along the way.

audiodude commented 10 years ago

It seems like either the parameter, archive, should be renamed to something self-explanatory (like oldest_report_year) or the comments saying "# Oldest report year" should be retained. As it is, archive = 2002 at the top of a scraper doesn't have much inherent meaning. To me, archive is a verb and sounds, at best, like a boolean parameter controlling whether results should be archived.

konklone commented 10 years ago

oldest would probably be the better term for the variable; I just wanted to keep an obvious link between --archive and the variable. I don't feel strongly enough about it to go and change them all again myself, though (it was pretty tedious).

konklone commented 10 years ago

I've fixed several errors I saw pop up during the re-archive, and after fixing them I think this is good for merging.

audiodude commented 10 years ago

Great work! :+1: