Closed by konklone 10 years ago
Many IGs publish several different types of reports. Should they all be read in indiscriminately, or filtered based on certain criteria?
@parkr We started keeping a `type` field early on, but have mostly let that go to seed, because it's so chaotic.
We want to get their work product (not necessarily press releases). That includes audits, semiannual reports, peer reviews, congressional testimony, other/special reports, and anything else that looks relevant (though you might ask first before spending work on a category not listed there).
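For anyone who wants a starting point, here is a minimal sketch of sorting a report into one of those categories by its title. The function names `classify_report` and `is_work_product`, the keyword table, and the type labels are all assumptions made up for illustration, not the project's actual code:

```python
# Hypothetical keyword table. The categories follow the list above;
# the keywords and label names are guesses, not project conventions.
TYPE_KEYWORDS = [
    ("press", ["press release"]),
    ("semiannual_report", ["semiannual", "semi-annual"]),
    ("peer_review", ["peer review"]),
    ("testimony", ["testimony"]),
    ("audit", ["audit"]),
]


def classify_report(title):
    """Return a coarse report type for a title, defaulting to 'other'."""
    lowered = title.lower()
    # Earlier entries win, so "Peer Review of Audit Operations" is a peer review.
    for report_type, keywords in TYPE_KEYWORDS:
        if any(keyword in lowered for keyword in keywords):
            return report_type
    return "other"


def is_work_product(title):
    """Keep everything except press releases."""
    return classify_report(title) != "press"
```

Anything that matches no keyword falls through to `other`, which fits the "anything else that looks relevant" bucket without forcing a guess.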
Ok :smile:
I'm working on CPB right now. What should the report ID be? No GUID-looking thing over there.
Corporation for Public Broadcasting (CPB): https://github.com/unitedstates/inspectors-general/pull/144
Starting on FCA
Starting on House of Representatives
I'm claiming the OIG for the Denali Commission:
The Denali Commission is an independent federal agency with its office in Anchorage, Alaska. Congress created it in 1998 through the Denali Commission Act (P.L. 105-277, 42 U.S.C. § 3121). The agency serves as a national “experimental field station” that explores different possibilities for providing basic facilities in remote Alaskan settlements (clinics, powerhouses, fuel tanks, central places to wash clothes and take a shower).
...
And the bathroom continues to be a bucket for many residents of “bush” Alaska (outhouses on the tundra often aren’t feasible). From a broader international perspective, the public health conditions of the developing third world are still a reality up here. Denali serves places where the electricity is sometimes, the water is undrinkable, the fuel tanks leak, the food rots, the garbage sits, the teeth fall out, a shower is a treat, and people get diseases that we assumed were history.
From an informatics standpoint, the Denali OIG's output is bleak. All the reports are full non-OCRed images -- in fact, all of their URLs have the word `/Image/` in them. Not a single report has a specific date available for scraping, and the PDF metadata indicates that the report was produced at the dawn of computer time.
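A rough sketch of what that means for a scraper: collect the links whose URL contains `/Image/`, and record up front that each one has no usable date and will need OCR. The helper name `find_image_reports` and the `published_on` and `needs_ocr` fields are hypothetical, not the project's schema:

```python
from html.parser import HTMLParser


class ImageLinkCollector(HTMLParser):
    """Collect href values of <a> tags whose URL contains '/Image/'."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and "/Image/" in href:
            self.links.append(href)


def find_image_reports(html):
    """Return one record per image-only report link found in the page."""
    collector = ImageLinkCollector()
    collector.feed(html)
    return [
        # No date is scrapeable, so leave it empty rather than guess.
        {"url": url, "published_on": None, "needs_ocr": True}
        for url in collector.links
    ]
```

Leaving `published_on` empty instead of trusting the PDF metadata keeps a bogus date from slipping into the data.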
I'm pretty sure the IG office is just one guy -- Mike Marsh -- as the inspections are written in first person.
The IG's performance reports speak of a shrinking budget and a staff that doesn't know what's going to happen to it. The reports include interstitial headers like:
In its most recent semiannual report to Congress, the IG continues its quest to convince Congress to allow the Denali Commission to die.
Mike's crusade is a strange and lonely one.
I'm proud to tackle scraping the reports of the OIG for the Denali Commission, and to honor the tireless work of civil servant Mike Marsh.
I'm going to start on the Smithsonian Institution.
I'm going to start on CFTC.
I'm starting CNCS now.
Aaaaaand we're done! Phase 1 complete.
To anyone watching the project who still wants to scrape an IG: you have a "@spulec minute" to do them, as there are only 8 left. (A checked box means done.)
There will still be plenty of work to do after these are done: we'll want to ensure data quality, and collect some more detailed metrics for each IG. As oversight.io progresses, it'll start shining more light on what we actually have here.
But that will be the end of easy, discrete volunteer scraping opportunities! So ring in on this thread post-haste if you want to do one of the 8 above.