unitedstates / inspectors-general

Collecting reports from Inspectors General across the US federal government.
https://sunlightfoundation.com/blog/2014/11/07/opengov-voices-opening-up-government-reports-through-teamwork-and-open-data/
Creative Commons Zero v1.0 Universal

DOJ IG redesigned its site so the scraper is broken #22

Closed LindsayYoung closed 10 years ago

LindsayYoung commented 10 years ago

The good news is that the site is now much more uniform and easier to scrape. The bad news is that the current scraper doesn't work on it.

I would like to fix this. We will see when I have time.

konklone commented 10 years ago

In the meantime, I've removed doj from the safe list in safe.yml, in https://github.com/unitedstates/inspectors-general/commit/1966bfde4feaa6cfdd3b3eb633ba62dd61c55cbb. So you can do a git pull on the production server if you want to stop the warning emails from rolling in. The production server is configured to run ./igs --safe, which skips any scraper the project has deemed unstable or incomplete.
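
For anyone unfamiliar with the safe list, here is a rough sketch of the idea, not the project's actual code. It assumes the safe list boils down to a flat list of scraper names; the function name select_scrapers and the example scraper names other than doj are made up for illustration.

```python
def select_scrapers(available, safe_list, safe_only=False):
    """Return the scraper names to run, preserving their original order.

    Hypothetical helper: with safe_only=True, only scrapers named in
    safe_list are kept, which mimics what a --safe style flag would do.
    """
    if not safe_only:
        return list(available)
    allowed = set(safe_list)
    return [name for name in available if name in allowed]


# Illustrative data only. "doj" is left off the safe list, as in the commit above;
# "usps" and "epa" are placeholder names.
available = ["doj", "usps", "epa"]
safe_list = ["usps", "epa"]

to_run = select_scrapers(available, safe_list, safe_only=True)
print(to_run)
```

Under these assumptions, a safe run skips doj while a normal run still includes it, so the broken scraper stays available for local debugging without triggering production warnings.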

If you don't have time to get to this, let me know, and I will be happy to take a stab at it. The parser is pretty intricate (in totally reasonable proportion to how awful the DOJ IG's HTML is/was) so I suspect I'd be slower at it, but it's a very broad, high-impact IG and I'd happily fix it to get it back up and going.