unitedstates / inspectors-general

Collecting reports from Inspectors General across the US federal government.
https://sunlightfoundation.com/blog/2014/11/07/opengov-voices-opening-up-government-reports-through-teamwork-and-open-data/
Creative Commons Zero v1.0 Universal

Final scrapers: get 'em while you can #140

Closed konklone closed 10 years ago

konklone commented 10 years ago

To anyone watching the project who still wants to scrape an IG: you have a "@spulec minute" to do them, as there are only 8 left. (A checked box means done.)

There will still be plenty of work to do after these are done: we'll want to ensure data quality, and collect some more detailed metrics for each IG. As oversight.io progresses, it'll start shining more light on what we actually have here.

But that will be the end of easy, discrete volunteer scraping opportunities! So ring in on this thread post-haste if you want to do one of the 8 above.

parkr commented 10 years ago

Many IGs have multiple different types of reports. Should they be read in indiscriminately or filtered based on certain criteria?

konklone commented 10 years ago

@parkr We started keeping a type field early on, but have mostly let that go to seed, because it's so chaotic.

We want to get their work product (not necessarily press releases). That includes audits, semiannual reports, peer reviews, congressional testimony, other/special reports, and anything else that looks relevant (though you might ask first before spending work on a category not listed there).

parkr commented 10 years ago

Ok :smile:

I'm working on CPB right now. What should the report ID be? No GUID-looking thing over there.

parkr commented 10 years ago

Corporation for Public Broadcasting (CPB): https://github.com/unitedstates/inspectors-general/pull/144

spulec commented 10 years ago

Starting on FCA

parkr commented 10 years ago

Starting on House of Representatives

konklone commented 10 years ago

I'm claiming the OIG for the Denali Commission:

The Denali Commission is an independent federal agency with its office in Anchorage, Alaska. Congress created it in 1998 through the Denali Commission Act (P.L. 105-277, 42 U.S.C. § 3121). The agency serves as a national “experimental field station” that explores different possibilities for providing basic facilities in remote Alaskan settlements (clinics, powerhouses, fuel tanks, central places to wash clothes and take a shower).

...

And the bathroom continues to be a bucket for many residents of “bush” Alaska (outhouses on the tundra often aren’t feasible). From a broader international perspective, the public health conditions of the developing third world are still a reality up here. Denali serves places where the electricity is sometimes, the water is undrinkable, the fuel tanks leak, the food rots, the garbage sits, the teeth fall out, a shower is a treat, and people get diseases that we assumed were history.

From an informatics standpoint, the Denali OIG's output is bleak. All the reports consist entirely of non-OCRed images -- in fact, all of their URLs have the word /Image/ in them. Not a single report has a specific date available for scraping, and the PDF metadata indicates that each report was produced at the dawn of computer time.

[image: dawn]

I'm pretty sure the IG office is just one guy -- Mike Marsh -- as the inspections are written in first person.

The IG's performance reports speak of a shrinking budget and a staff that doesn't know what's going to happen to it. The reports include interstitial headers like:

[image: bargain]

In its most recent semiannual report to Congress, the IG continues its quest to convince Congress to allow the Denali Commission to die.

[image: case-against]

Mike's crusade is a strange and lonely one.

[image: mike-marsh]

I'm proud to tackle scraping the reports of the OIG for the Denali Commission, and to honor the tireless work of civil servant Mike Marsh.

spulec commented 10 years ago

I'm going to start on the Smithsonian Institution.

spulec commented 10 years ago

I'm going to start on CFTC.

parkr commented 10 years ago

I'm starting CNCS now.

konklone commented 10 years ago

Aaaaaand we're done! Phase 1 complete.