lukerosiak closed this issue 8 years ago
I have created a scraper for the Office of Special Counsel, an important entity that investigates whistleblower retaliation and other prohibited practices across all federal agencies. The scraper parses 1,137 reports going back to 2009.
This is outstanding. Thank you so much for this contribution! Yes, we will definitely accept this.
Next, if you want it, I plan to contribute a module that will be the most effective way of solving your "incorporating manually FOIA'd reports" problem. It will be a scraper for GovernmentAttic.org, which has over 2,000 FOIA'd reports, mostly IG reports, and is regularly updated with new ones.
I definitely welcome that contribution too, though it will be a bit more complicated. In small part because it's an unofficial source, but in large part because the quality of the documents I've seen there tends to be really poor and will need a lot of OCRing. But it's also a huge trove of super relevant documents (including the names of a ton of unreleased IG reports), so it's definitely worth including here if you're going to write it.
@lukerosiak Mind tweaking your PR to address the build breakage? We use pyflakes, and it's flagged the `os` and `urljoin` imports as unused:
https://travis-ci.org/unitedstates/inspectors-general/builds/106132793
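For context, the failing check comes down to imports that are declared but never referenced anywhere else in the module. A rough sketch of that kind of detection, using only the standard library's `ast` module rather than pyflakes itself, and with an invented module source just for illustration:

```python
import ast

# Invented example source: 'os' and 'urljoin' are imported but never used,
# which is the pattern pyflakes flags in the PR.
source = """
import os
from urllib.parse import urljoin
import requests

def run():
    return requests
"""

tree = ast.parse(source)

# Collect every name bound by an import statement.
imported = set()
for node in ast.walk(tree):
    if isinstance(node, (ast.Import, ast.ImportFrom)):
        imported.update(alias.asname or alias.name for alias in node.names)

# Collect every bare name referenced in the module body.
used = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}

# Imported-but-unreferenced names are the ones to delete.
unused = sorted(imported - used)
print(unused)  # → ['os', 'urljoin']
```

This is only an approximation of pyflakes' analysis (it ignores attribute access, `__all__`, and re-exports), but the fix on the PR is the same either way: delete the two unused import lines.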
Of course--done. Thank you for building such an important resource. I will open an issue about GovernmentAttic.