unitedstates / inspectors-general

Collecting reports from Inspectors General across the US federal government.
https://sunlightfoundation.com/blog/2014/11/07/opengov-voices-opening-up-government-reports-through-teamwork-and-open-data/
Creative Commons Zero v1.0 Universal

DOD redacted logic too trusting #205

Closed: spulec closed this issue 9 years ago

spulec commented 9 years ago

Some DOD reports that are marked as redacted actually have links to the reports.

See DODIG-2014-123 on http://www.dodig.mil/pubs/index.cfm?fy=2014. I believe these are reports that were originally redacted, but later released.

konklone commented 9 years ago

Does the scraper skip over these entirely? I can't see the logic that would, though it's a tricky scraper.

I suspect this report was originally unreleased and that, when it was later released, the link and the word "(Redacted)" were added to the listing.
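To make that distinction concrete, here is a rough sketch of treating "has a redaction marker" and "has no link" as separate facts about a listing. This is not the actual logic in dod.py; the function name, the field names, and the example URL are all made up for illustration.

```python
from typing import Optional

# The marker text is taken from the listing described above.
REDACTED_MARKER = "(Redacted)"


def classify_listing(title: str, report_url: Optional[str]) -> dict:
    """Hypothetical helper: classify one row of a report listing.

    A report counts as unreleased only when there is no link to fetch.
    The "(Redacted)" marker is recorded separately and does not, by
    itself, mean the report is unavailable.
    """
    has_link = bool(report_url and report_url.strip())
    return {
        "redacted": REDACTED_MARKER.lower() in title.lower(),
        "unreleased": not has_link,
    }


# A redacted report that has since been posted: flagged as redacted,
# but still treated as released because a link exists.
print(classify_listing("Example Audit (Redacted)", "http://example.com/report.pdf"))

# A listing with a title but no link: treated as unreleased.
print(classify_listing("Example Audit", None))
```

Keeping the two flags apart means a listing that gains a link later changes only its "unreleased" value.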

If that's the case, and the report's title/ID was released in 2014 but its redacted text was released in 2015, we would not catch it in the usual course of nightly runs, since the nightly scrape only looks at the latest year. Maybe that's an assumption worth revisiting.

Running ./inspectors/dod.py --year=2014 picks up this report and doesn't mark it as unreleased, so I think this is instead an issue of whether we should regularly fetch information from farther back in time than we currently do. I'm opening a new issue for that.
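As a rough sketch of what looking farther back could mean, a nightly job might compute a small window of years and run the scraper once per year. The helper names and the lookback parameter below are hypothetical and not part of the project; only the --year flag comes from the command above.

```python
from datetime import date
from typing import List


def years_to_scrape(today: date, lookback: int = 1) -> List[int]:
    """Hypothetical helper: the current year plus `lookback` earlier years, newest first."""
    if lookback < 0:
        raise ValueError("lookback must be non-negative")
    return [today.year - offset for offset in range(lookback + 1)]


def scrape_commands(today: date, lookback: int = 1) -> List[str]:
    """Build one command line per year, using the --year flag shown above."""
    return [
        "./inspectors/dod.py --year={}".format(year)
        for year in years_to_scrape(today, lookback)
    ]


# With a one-year lookback, a run in early 2015 would also revisit 2014,
# which is where DODIG-2014-123 is listed.
for command in scrape_commands(date(2015, 3, 1), lookback=1):
    print(command)
```

A larger lookback would catch late releases from older years at the cost of more requests per run, so the window size is a trade-off rather than a fixed answer.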

cc @LindsayYoung

spulec commented 9 years ago

Oops, you are correct. I just saw that it wasn't on oversight.io and forgot that we don't regularly scrape historical years.