unitedstates / inspectors-general

Collecting reports from Inspectors General across the US federal government.
https://sunlightfoundation.com/blog/2014/11/07/opengov-voices-opening-up-government-reports-through-teamwork-and-open-data/
Creative Commons Zero v1.0 Universal

Unique report IDs - Part 2 #176

divergentdave closed this 9 years ago

divergentdave commented 9 years ago

This branch is a WIP for correcting cases where the same report_id is used for two different reports that fall in the same year, and so can't be caught by the QA scripts. The first two commits add validation; everything else is tweaks to scrapers. I've taken some of the easy ones already, but there is still plenty to do.
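For readers following along, here is a rough sketch of the kind of check described above: flag any report_id that is claimed by more than one distinct report within the same year. This is not the validation code from the branch itself. The dict keys `year`, `report_id`, and `url`, and the function names `find_duplicate_report_ids` and `warn_on_duplicates`, are assumptions made for illustration.

```python
import logging
from collections import defaultdict

logger = logging.getLogger(__name__)


def find_duplicate_report_ids(reports):
    """Map (year, report_id) to the distinct URLs that share that key.

    Assumes each report is a dict with hypothetical "year", "report_id",
    and "url" keys. Only keys claimed by two or more different URLs are
    returned; seeing the same URL twice is not counted as a collision.
    """
    urls_by_key = defaultdict(list)
    for report in reports:
        key = (report["year"], report["report_id"])
        if report["url"] not in urls_by_key[key]:
            urls_by_key[key].append(report["url"])
    return {key: urls for key, urls in urls_by_key.items() if len(urls) > 1}


def warn_on_duplicates(reports):
    """Log one warning per colliding (year, report_id) and return the collisions."""
    duplicates = find_duplicate_report_ids(reports)
    for (year, report_id), urls in sorted(duplicates.items()):
        logger.warning(
            "report_id %r is used for %d different reports in %d: %s",
            report_id, len(urls), year, ", ".join(urls),
        )
    return duplicates
```

Keying on the (year, report_id) pair rather than on report_id alone matches the case described here, where two reports collide within a single year. A scraper run could call `warn_on_duplicates` on everything it collected and keep going, which fits the warn-rather-than-fail behavior discussed later in the thread.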

Changes to scrapers tend to be

Checklist of scrapers

Here's a copy of the list of duplicates I'm working from: https://gist.github.com/divergentdave/d520271903ebf8f02776

konklone commented 9 years ago

This is fantastic, @divergentdave. Ping the thread whenever you think it's merge-ready.

konklone commented 9 years ago

Is this stuff worth merging in as is? I'll take all the fixes so far!

divergentdave commented 9 years ago

The two caveats I have right now are that this will start spewing warnings for the six remaining scrapers, and I want to go back and add some comments in the "remarks for IG webmaster" section. Otherwise, it should be ready.

konklone commented 9 years ago

I'm going to accept the warnings, in the interest of having more working unique report IDs. I'll leave the branch in place rather than deleting it, so you can file a new pull request from the same branch if you continue your work here.