unitedstates / inspectors-general

Collecting reports from Inspectors General across the US federal government.
https://sunlightfoundation.com/blog/2014/11/07/opengov-voices-opening-up-government-reports-through-teamwork-and-open-data/
Creative Commons Zero v1.0 Universal

Adding GAO reports (not GAO IG) #302

Closed lukerosiak closed 6 years ago

lukerosiak commented 7 years ago

Much belatedly, here is a scraper for GAO reports and restricted reports, per #269. It gathers 52,000 reports (90 GB) dating back to 1940, though the default start year for archiving is set to 1970 here.

divergentdave commented 7 years ago

Amazing, thank you! I'll take a look at this in a few days.

konklone commented 7 years ago

Hell yeah! I'll let @divergentdave review and merge, but this is super solid work, thank you.

divergentdave commented 6 years ago

I pushed some miscellaneous changes throughout for error handling, style, etc. I ran it on one year and things look good; I'm going to run it over the full archive next.

Note to self: need to add this to safe.yml

divergentdave commented 6 years ago

Okay, this looks good. I saw a light dusting of 404 errors and duplicate report IDs, mostly in older years. I'm going to merge this, do an archive run on the production server, index everything, and then add it to safe.yml.

divergentdave commented 6 years ago

The scraping is done, the reports are ingested, and I've added it to safe.yml going forward. All done here, thanks again @lukerosiak!

lukerosiak commented 6 years ago

Awesome, thank you!