unitedstates / inspectors-general

Collecting reports from Inspectors General across the US federal government.
https://sunlightfoundation.com/blog/2014/11/07/opengov-voices-opening-up-government-reports-through-teamwork-and-open-data/
Creative Commons Zero v1.0 Universal

Fix non-unique report IDs #160

konklone closed this issue 9 years ago

konklone commented 9 years ago

@divergentdave analyzed where we're failing to produce unique (within an OIG) report IDs in https://github.com/unitedstates/inspectors-general/pull/151. I'm closing that PR and opening this issue to organize the action items.

konklone commented 9 years ago

@divergentdave, I just did a full re-archive, from scratch, on my production server. (I moved the existing data/ dir elsewhere, and had it fill a new one.)

At the end, the duplicate ID script showed a bunch of hhs errors, as I know you anticipated, since that's the one left unaddressed.

There were also some non-hhs errors from arc, dod, and usps:

[arc] Duplicate report_id: report14-21-sc-17044 has been used twice this session
[arc] Duplicate report_id: report14-18-administrativereview has been used twice this session
[dod] Duplicate report_id: 99-182 has been used twice this session
[dod] Duplicate report_id: 98-193 has been used twice this session
[dod] Duplicate report_id: 93-159 has been used twice this session
[dod] Duplicate report_id: 92-098 has been used twice this session
[usps] Duplicate report_id: da-ar-12-003 has been used twice this session
[usps] Duplicate report_id: dr-ma-12-003 has been used twice this session
[usps] Duplicate report_id: nl-ar-12-011 has been used twice this session
[usps] Duplicate report_id: nl-ar-12-010 has been used twice this session
[usps] Duplicate report_id: no-ar-12-010 has been used twice this session
[usps] Duplicate report_id: da-ar-12-002 has been used twice this session
[usps] Duplicate report_id: ff-ma-11-016 has been used twice this session
[usps] Duplicate report_id: nl-ar-10-010 has been used twice this session
[usps] Duplicate report_id: dr-ar-10-006 has been used twice this session
[usps] Duplicate report_id: no-ar-09-007 has been used twice this session
[usps] Duplicate report_id: political_campaign_mailings has been used twice this session
[usps] Duplicate report_id: da-ar-08-007 has been used twice this session
[usps] Duplicate report_id: dr-ma-07-005 has been used twice this session
[usps] Duplicate report_id: dr-ar-06-008 has been used twice this session
[usps] Duplicate report_id: nl-ar-06-006 has been used twice this session
[usps] Duplicate report_id: sa-ar-06-001 has been used twice this session
[usps] Duplicate report_id: ms-ma-06-001 has been used twice this session
[usps] Duplicate report_id: hm-ot-05-001_0 has been used twice this session
[usps] Duplicate report_id: nl-ar-05-006 has been used twice this session
[usps] Duplicate report_id: no-ar-04-003 has been used twice this session
[usps] Duplicate report_id: oe-ar-03-003 has been used twice this session
[usps] Duplicate report_id: ac-ar-03-003 has been used twice this session
[usps] Duplicate report_id: mk-ar-01-001 has been used twice this session
[usps] Duplicate report_id: lb-ar-00-003_0 has been used twice this session
[usps] Duplicate report_id: ft-ar-00-001 has been used twice this session
[usps] Duplicate report_id: lm-ma-99-010 has been used twice this session
[usps] Duplicate report_id: ov-ar-99-001 has been used twice this session
[usps] Duplicate report_id: ca-ma-98-003 has been used twice this session
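
For readers unfamiliar with the check behind these messages, here is a minimal sketch of per-inspector duplicate detection. It is not the project's actual duplicate ID script: the function name `find_duplicates` and the `(inspector, report_id)` input shape are assumptions made for illustration, and only the message format is copied from the log above.

```python
from collections import defaultdict


def find_duplicates(reports):
    """Return one warning per repeated report_id within the same inspector.

    `reports` is an iterable of (inspector, report_id) pairs, a hypothetical
    input shape chosen for this sketch. IDs only need to be unique within
    one inspector, so the same ID under two inspectors is not flagged.
    """
    seen = defaultdict(set)  # inspector slug -> report IDs seen this session
    messages = []
    for inspector, report_id in reports:
        if report_id in seen[inspector]:
            # Message format mirrors the log lines quoted in this thread.
            messages.append(
                f"[{inspector}] Duplicate report_id: {report_id} "
                "has been used twice this session"
            )
        else:
            seen[inspector].add(report_id)
    return messages


if __name__ == "__main__":
    sample = [
        ("dod", "99-182"),
        ("usps", "da-ar-12-003"),
        ("dod", "99-182"),    # repeat within dod: flagged
        ("usps", "99-182"),   # same ID under a different inspector: not flagged
    ]
    for line in find_duplicates(sample):
        print(line)
```

Keying the seen set by inspector matches the "unique within an OIG" requirement stated at the top of this issue. Note that this sketch would emit the same "twice" message again on a third occurrence of an ID.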

konklone commented 9 years ago

I think this has been taken care of more than enough by @divergentdave, and future issues can be handled piece by piece.