Closed LindsayYoung closed 10 years ago
In the meantime, I've removed doj from the safe-list in safe.yml in https://github.com/unitedstates/inspectors-general/commit/1966bfde4feaa6cfdd3b3eb633ba62dd61c55cbb. So, you can do a git pull on the production server if you want to stop the warning emails rolling in. The production server is configured to run ./igs --safe, which prevents the production server from running scrapers the project has deemed unstable or incomplete.
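For anyone unfamiliar with the idea, here is a rough sketch of what safe-list filtering amounts to. This is not the project's actual code: the function name select_scrapers, the scraper names other than doj, and the assumption that the safe-list is a flat list of names are all hypothetical, and the real format of safe.yml isn't shown in this thread.

```python
# Hypothetical sketch of safe-list filtering; not the project's implementation.

def select_scrapers(available, safe_list, safe_only):
    """Return the scraper names to run, preserving the order of `available`.

    When `safe_only` is true, only names that also appear in `safe_list`
    are kept; otherwise every available scraper is returned.
    """
    if not safe_only:
        return list(available)
    allowed = set(safe_list)
    return [name for name in available if name in allowed]


# Placeholder names for illustration; only "doj" comes from this thread.
available = ["doj", "agency_a", "agency_b"]
safe_list = ["agency_a", "agency_b"]  # "doj" has been taken off the safe-list

# With safe mode on, "doj" is skipped; with it off, everything runs.
print(select_scrapers(available, safe_list, safe_only=True))
print(select_scrapers(available, safe_list, safe_only=False))
```

The point is just that taking a name off the list is enough to keep a broken scraper from running (and from generating warning emails) without touching the scraper itself.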
If you don't have time to get to this, let me know, and I will be happy to take a stab at it. The parser is pretty intricate (in totally reasonable proportion to how awful the DOJ IG's HTML is/was), so I suspect I'd be slower at it, but it's a very broad, high-impact IG and I'd happily fix it to get it back up and running.
The good news is that the site is much more uniform and easy to scrape. The bad news is that the current scraper doesn't work on it.
I would like to fix this. We will see when I have time.