unitedstates / inspectors-general

Collecting reports from Inspectors General across the US federal government.
https://sunlightfoundation.com/blog/2014/11/07/opengov-voices-opening-up-government-reports-through-teamwork-and-open-data/
Creative Commons Zero v1.0 Universal

Update state for new website design #180

Closed spulec closed 9 years ago

spulec commented 9 years ago

See #177

The report ids are the same as before. It appears that some reports have been added and some old ones have been removed. Based on some old data I have, the old State scraper would get ~500 reports, while the new one gets over 1,000. The oldest report in the old system was from 1994, while the oldest in the new system is from 2004. If someone with a more complete dataset could run the new scraper and do a more thorough analysis, that would be great.
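For anyone running that analysis, something like the following would give the headline numbers (report count and oldest year) for each scraper's output. This is only a sketch: it assumes each report is a dict with a `published_on` ISO date string, and `summarize` is a hypothetical helper, not part of the scrapers.

```python
from datetime import date


def summarize(reports):
    """Return (count, oldest_year) for an iterable of report dicts.

    Assumption: each dict has a "published_on" value like "2004-06-30".
    """
    years = [date.fromisoformat(r["published_on"]).year for r in reports]
    oldest = min(years) if years else None
    return len(years), oldest


# Illustrative records only, not real scraper output.
sample = [
    {"report_id": "a", "published_on": "1994-03-01"},
    {"report_id": "b", "published_on": "2010-01-05"},
]
count, oldest_year = summarize(sample)
```

Running it once on the old cache and once on the new output would make the two sets of numbers directly comparable.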

plantfansam commented 9 years ago

This is running for me! :+1:

@spulec -- I don't have the entire corpus of scraped reports for State, unfortunately. @konklone, any chance you have them on oversight.io? The thorough solution is probably to download and analyze the corpus from the Internet Archive, which is now possible! I'm glad to do that, but it's a big (34 GB!) download!

spulec commented 9 years ago

@konklone already sent me just the State archives the other day. I'm doing an analysis and writing an email to State OIG to notify them of the couple of reports that are now missing. I'm hoping to wrap it up in the next couple of days. Feel free to send me an email if you want the State archives.

plantfansam commented 9 years ago

Oh, awesome! If there's anything I can do to help, let me know.

spulec commented 9 years ago

Most of the missing ones were Congressional Testimony that I found available at http://oig.state.gov/testimony-news. I have updated the script to pull those too.

There were three additional reports missing:

1.) 228989 ("Inspection of the Office of Cuba Broadcasting"): this report seems to have been moved to report id 228991

2.) 162347 ("Audit of Department of State Controls Over Bureau of Diplomatic Security Domestic Firearms and Optics (AUD/SI-11-25)"): this report is missing

3.) 211870 ("Audit of Department of State Compliance With Physical/Procedural Security Standards at Selected High Threat Level Posts (AUD-SI-13-32)"): this report is missing

I have sent an email to State OIG about the two missing reports.
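For anyone repeating the comparison, the core of it is a set difference over report ids. A minimal sketch, assuming the ids from each run have already been collected into plain collections of strings (`diff_report_ids` is a hypothetical helper, not something in the repo):

```python
def diff_report_ids(old_ids, new_ids):
    """Return (missing, added): ids only in the old run, ids only in the new run."""
    old_set, new_set = set(old_ids), set(new_ids)
    missing = sorted(old_set - new_set)
    added = sorted(new_set - old_set)
    return missing, added


# Illustration using only the ids mentioned above.
old_run = ["228989", "162347", "211870"]
new_run = ["228991"]
missing, added = diff_report_ids(old_run, new_run)
```

A renumbered report like 228989 shows up as one missing id and one added id, so the two lists still need a manual pass by title to tell a renumbering from a real removal.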

spulec commented 9 years ago

State OIG responded that they are still in the migration process but the reports should be available within the next couple of days. I will keep an eye on it.

konklone commented 9 years ago

Thanks for doing the legwork, @spulec.

konklone commented 9 years ago

I'd like to merge this so that we can get the new reports from the site, but still keep an issue open for dealing with the missing old reports. I have my cache of State reports from the old scraper, and I'll back them up in S3, so we can always use them for analysis/restoration later.

spulec commented 9 years ago

I followed up with State OIG to see if they have a timeframe for the rest of the migration.