Closed: lukerosiak closed this 6 years ago
Amazing, thank you! I'll take a look at this in a few days.
Hell yeah! I'll let @divergentdave review and merge, but this is super solid work, thank you.
I pushed some miscellaneous changes throughout for error handling, style, etc. I ran it on one year and things look good; I'm going to run it over the full archive next.
Note to self: need to add this to safe.yml
Okay, this looks good. I saw a light dusting of 404 errors and duplicate report IDs, mostly in older years. I'm going to merge this, do an archive run on the production server, index everything, and then add it to safe.yml.
The scraping is done, the reports are ingested, and I've added it to safe.yml going forward. All done here, thanks again @lukerosiak!
Awesome, thank you!
Much belatedly, here is a scraper for GAO reports and restricted reports per #269. It gathers 52,000 reports--90GB--dating back to 1940, though the default year for archiving is set to 1970 here.