unitedstates / inspectors-general

Collecting reports from Inspectors General across the US federal government.
https://sunlightfoundation.com/blog/2014/11/07/opengov-voices-opening-up-government-reports-through-teamwork-and-open-data/
Creative Commons Zero v1.0 Universal

Audit for non-OCRed reports #142

Closed konklone closed 10 years ago

konklone commented 10 years ago

Every once in a while, I've seen a fully non-OCRed report, but it's more common to see reports where some pages are not OCRed (even when most are).

NEA's OIG has at least some fully non-OCRed reports.

This PBGC peer review, identified by @spulec, is an example of the partially OCRed case. The first page is absolutely crucial to understanding the harshly critical context of the report:

[image: wow]

But this cover sheet for the report is not OCRed.

I am not sure if there is any effective automated methodology to detect this sort of thing. But it should at least be noted, maybe on the wiki, so we can add tesseract to our workflow and know when to turn it on.
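One rough heuristic that might catch some of these: extract the text page by page and flag any page that comes back nearly empty. The sketch below assumes poppler's `pdftotext` is installed and relies on its behavior of ending each page with a form feed character. The function names and the `min_chars` threshold are hypothetical, not part of this project's scrapers. It would miss pages whose text layer is present but garbage, and it would flag genuinely blank pages, so it is a starting point for an audit rather than a reliable detector.

```python
import subprocess


def split_pages(raw_text):
    """Split pdftotext-style output into pages (a form feed ends each page)."""
    pages = raw_text.split("\f")
    # The last page is also terminated by a form feed, leaving an empty tail.
    if pages and pages[-1] == "":
        pages.pop()
    return pages


def pages_missing_text(raw_text, min_chars=20):
    """Return 1-based numbers of pages with fewer than min_chars visible characters."""
    flagged = []
    for number, page in enumerate(split_pages(raw_text), start=1):
        visible = "".join(page.split())  # drop all whitespace before counting
        if len(visible) < min_chars:
            flagged.append(number)
    return flagged


def audit_pdf(path, min_chars=20):
    """Run pdftotext on a PDF (assumed to be on PATH) and flag near-empty pages."""
    result = subprocess.run(
        ["pdftotext", path, "-"],  # "-" sends the extracted text to stdout
        capture_output=True,
        text=True,
        encoding="utf-8",
        check=True,
    )
    return pages_missing_text(result.stdout, min_chars)
```

For a report like the PBGC one, `audit_pdf("report.pdf")` would be expected to include page 1 in its result if the cover sheet has no text layer. Flagged reports could then be listed on the wiki as candidates for tesseract.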

konklone commented 10 years ago

Closing in favor of https://github.com/unitedstates/inspectors-general/wiki/Need-OCRing