unitedstates / inspectors-general

Collecting reports from Inspectors General across the US federal government.
https://sunlightfoundation.com/blog/2014/11/07/opengov-voices-opening-up-government-reports-through-teamwork-and-open-data/
Creative Commons Zero v1.0 Universal

Unique report IDs - Part 2 #176

Closed divergentdave closed 10 years ago

divergentdave commented 10 years ago

This branch is a WIP for correcting cases where the same report_id is used for two different reports that fall in the same year, and thus can't be caught by the QA scripts. The first two commits add validation; everything else is tweaks to scrapers. I've taken care of some of the easy ones already, but there is still plenty to do.
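
To illustrate the kind of check described above, here is a minimal sketch and not the validation code from this branch. It assumes each report is a dict with hypothetical `inspector`, `report_id`, and `url` keys, and treats two entries as different reports when they share an ID but point at different URLs.

```python
from collections import defaultdict


def find_duplicate_report_ids(reports):
    """Return {(inspector, report_id): sorted URLs} for IDs shared by
    more than one distinct report.

    Assumption: each report is a dict with "inspector", "report_id",
    and "url" keys. These names are illustrative only.
    """
    urls_by_id = defaultdict(set)
    for report in reports:
        key = (report["inspector"], report["report_id"])
        urls_by_id[key].add(report["url"])

    # An ID is a problem only if it maps to more than one distinct URL;
    # seeing the same report twice (same URL) is not flagged.
    return {
        key: sorted(urls)
        for key, urls in urls_by_id.items()
        if len(urls) > 1
    }


if __name__ == "__main__":
    # Made-up sample data, for demonstration only.
    sample = [
        {"inspector": "example", "report_id": "A-1", "url": "https://example.gov/a.pdf"},
        {"inspector": "example", "report_id": "A-1", "url": "https://example.gov/b.pdf"},
        {"inspector": "example", "report_id": "A-2", "url": "https://example.gov/c.pdf"},
    ]
    for (inspector, report_id), urls in find_duplicate_report_ids(sample).items():
        print(f"WARNING: {inspector} reuses report_id {report_id} for: {', '.join(urls)}")
```

Grouping by inspector as well as ID is a design choice in this sketch: it avoids flagging two different offices that happen to use the same numbering scheme.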

Changes to scrapers tend to be

Checklist of scrapers

Here's a copy of the list of duplicates I'm working from: https://gist.github.com/divergentdave/d520271903ebf8f02776

konklone commented 10 years ago

This is fantastic, @divergentdave. Ping the thread whenever you think it's merge-ready.

konklone commented 10 years ago

Is this stuff worth merging in as is? I'll take all the fixes so far!

divergentdave commented 10 years ago

The two caveats I have right now are that this will start spewing warnings for the six remaining scrapers, and I want to go back and add some comments in the "remarks for IG webmaster" section. Otherwise, it should be ready.

konklone commented 10 years ago

I'm going to accept the warnings, in the interest of more working unique report IDs. I'll leave the branch and won't delete it, so you can refile a new pull request from the same branch if you continue your work here.