unitedstates / inspectors-general

Collecting reports from Inspectors General across the US federal government.
https://sunlightfoundation.com/blog/2014/11/07/opengov-voices-opening-up-government-reports-through-teamwork-and-open-data/
Creative Commons Zero v1.0 Universal

Adding Office of Special Counsel #236

Closed lukerosiak closed 8 years ago

lukerosiak commented 8 years ago

The Office of Special Counsel investigates whistleblower retaliation and prohibited personnel practices. Like GAO, its investigative reports are interesting and may focus on any federal agency. I am working on a pull request to incorporate OSC reports.

Here is the list of this year's reports: https://osc.gov/Pages/PublicFiles-FY2015.aspx

1) Are you interested in adding these, and is inspectors-general the appropriate place for it? I believe GAO reports are grouped together with the IGs in Scout as "investigative reports," but I notice that the scraper for GAO reports (not GAO's internal IG) is not housed here.

2) If so, because GAO isn't here, I haven't seen an example of the best way to turn agency-name into agency (slug). Is the goal to map agency name to the slug used in inspectors-general for that agency's IG? Is there a dictionary anywhere of name->slug I should use? OSC uses inconsistent naming formats, sometimes includes sub-agencies (occasionally using only the sub-agency name), and once in a while names two departments. I suppose I can map existing reports by hand, but I may need to do some fuzzy string matching (maybe jellyfish) to make a best guess for future reports. (I'm assuming agency slug is important, so this is worth it as opposed to just passing the 'agency name' OSC uses verbatim and leaving the slug blank or something.)

3) OSC reports are a little unusual in that there are separate PDFs for a letter to the president, analysis, agency comments and whistleblower comments. It looks like there's no way to pass multiple PDFs for one case-file, so the way to do this is to have each component be its own record, but I thought I would mention this just in case I'm wrong about that.

Let me know and I will proceed with finishing my scraper.

konklone commented 8 years ago

> 1) Are you interested in adding these, and is inspectors-general the appropriate place for it? I believe GAO reports are grouped together with the IGs in Scout as "investigative reports," but I notice that the scraper for GAO reports (not GAO's internal IG) is not housed here.

Yes, definitely. I'd like to get the GAO report scraper Sunlight uses ported to Python and in here too. While GAO and OSC are slightly out of place for the focus of the project, it's only slightly, and it's much easier to integrate them here rather than create an identical infrastructure.

> 2) If so, because GAO isn't here, I haven't seen an example of the best way to turn agency-name into agency (slug). Is the goal to map agency name to the slug used in inspectors-general for that agency's IG?

We just picked slugs that seemed to make sense; we weren't pulling from any official mapping. So osc would work fine.

> 3) OSC reports are a little unusual in that there are separate PDFs for a letter to the president, analysis, agency comments and whistleblower comments. It looks like there's no way to pass multiple PDFs for one case-file, so the way to do this is to have each component be its own record, but I thought I would mention this just in case I'm wrong about that.

This comes up for some IG reports as well. Either way is fine: a single record with extra metadata that links to the related PDFs, or multiple records, one for each PDF. But as far as I'm aware, only the latter (the way you suggested) will get the full text of the report search-indexed in the projects that currently use this dataset.

> Let me know and I will proceed with finishing my scraper.

Please do! I just ask that it use a format similar to the others, use the same utils helper methods, etc., so that it integrates cleanly and consuming software can just git pull and start ingesting OSC reports right away.

In any case, this is really awesome and thank you for being up for contributing!
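
For anyone following along, the "one record per PDF" option from question 3 could be sketched roughly like this. This is only an illustration, not the project's actual format: the function name, the component labels, the `case_id` field, the subject-agency slug and the example.gov URLs are all made up here, and the field names other than 'inspector' and 'agency' are assumptions.

```python
from datetime import date

# Hypothetical labels for the separate PDFs OSC publishes for one case file.
COMPONENTS = ("letter", "analysis", "agency-comments", "whistleblower-comments")


def records_for_case(case_id, title, published_on, agency, pdf_urls):
    """Build one record per available PDF so each document is its own entry.

    pdf_urls maps a component label to its PDF URL; missing components are skipped.
    """
    records = []
    for component in COMPONENTS:
        url = pdf_urls.get(component)
        if url is None:
            continue  # assume not every case has every component
        records.append({
            "inspector": "osc",
            "agency": agency,  # slug of the agency the report is about
            "report_id": f"{case_id}-{component}",  # unique per PDF
            "title": f"{title} ({component})",
            "url": url,
            "published_on": published_on.isoformat(),
            "case_id": case_id,  # assumed extra field so consumers can regroup
        })
    return records


# Usage with placeholder values (not a real OSC case).
example = records_for_case(
    "DI-00-0000",
    "Example case",
    date(2015, 1, 15),
    "veterans-affairs",  # hypothetical slug
    {
        "letter": "https://example.gov/letter.pdf",
        "analysis": "https://example.gov/analysis.pdf",
    },
)
for record in example:
    print(record["report_id"], record["url"])
```

Keeping a shared `case_id` on each record is one way to let consumers stitch the components back together while still getting each PDF indexed separately.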

lukerosiak commented 8 years ago

Cool. My question about the slug is about the agency that is the subject of the report, since OSC reports could cover any agency. (I.e., in the required fields, there is an 'inspector' slug and an 'agency' slug.) The inspector's slug, naturally, will always be osc.
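
A rough sketch of the name->slug idea for the subject agency, in case it helps the discussion. It tries an exact lookup in a hand-built table first and falls back to a fuzzy best guess. Note the assumptions: it uses `difflib` from the standard library rather than jellyfish, and the table entries, the slugs, the function names and the 0.8 cutoff are hypothetical placeholders, not values taken from the project.

```python
import difflib

# Hypothetical hand-built table: normalized agency name -> slug.
# Real slugs would be whatever the project already uses for each agency.
KNOWN_AGENCIES = {
    "department of veterans affairs": "veterans-affairs",
    "department of defense": "defense",
    "department of homeland security": "homeland-security",
}


def normalize(name):
    """Lowercase and collapse whitespace so trivial variations compare equal."""
    return " ".join(name.lower().split())


def guess_agency_slug(raw_name, cutoff=0.8):
    """Return a slug for raw_name, or None if nothing is close enough."""
    key = normalize(raw_name)
    if key in KNOWN_AGENCIES:
        return KNOWN_AGENCIES[key]  # exact match against the hand-built table
    # Fuzzy fallback: best candidate whose similarity ratio is >= cutoff.
    matches = difflib.get_close_matches(key, list(KNOWN_AGENCIES), n=1, cutoff=cutoff)
    return KNOWN_AGENCIES[matches[0]] if matches else None


print(guess_agency_slug("Department of  Veterans Affairs"))
print(guess_agency_slug("Dept. of Veterans Affairs"))
print(guess_agency_slug("Some Unlisted Bureau"))
```

Returning None for weak matches means a scraper could leave unknown names for manual review instead of silently assigning the wrong agency.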