unitedstates / inspectors-general

Collecting reports from Inspectors General across the US federal government.
https://sunlightfoundation.com/blog/2014/11/07/opengov-voices-opening-up-government-reports-through-teamwork-and-open-data/
Creative Commons Zero v1.0 Universal

Add National Endowment for the Arts #130

Closed: spulec closed 10 years ago

spulec commented 10 years ago

Adds the OIG for the National Endowment for the Arts.

konklone commented 10 years ago

One thing this IG does that I haven't seen before is publish FISMA reports on the agency's cybersecurity. Their language on the page implies that all the IGs do this, but I hadn't noticed it yet. These would be nice to get.

This agency also publishes their peer reviews, which would also be nice to get.

konklone commented 10 years ago

Merge-ready. Thanks for doing this one, @spulec!

konklone commented 10 years ago

Ah, this one crashed during the full archive:

[report][2013-11-01][MTG-Disposition-Closeout]
## Downloading: http://arts.gov/sites/default/files/MTG-Disposition-Closeout.pdf
##  to: data/nea/2013/MTG-Disposition-Closeout/report.pdf
GET - http://arts.gov/sites/default/files/MTG-Disposition-Closeout.pdf
Resetting dropped connection: arts.gov
"GET /sites/default/files/MTG-Disposition-Closeout.pdf HTTP/1.1" 200 2265515
    report: nea/2013/MTG-Disposition-Closeout/report.pdf
Traceback (most recent call last):
  File "/home/unitedstates/inspectors-general/inspectors/utils/utils.py", line 24, in run
    run_method(cli_options)
  File "./inspectors/nea.py", line 34, in run
    inspector.save_report(report)
  File "/home/unitedstates/inspectors-general/inspectors/utils/inspector.py", line 46, in save_report
    metadata = extract_metadata(report)
  File "/home/unitedstates/inspectors-general/inspectors/utils/inspector.py", line 163, in extract_metadata
    metadata = utils.metadata_from_pdf(report_path)
  File "/home/unitedstates/inspectors-general/inspectors/utils/utils.py", line 200, in metadata_from_pdf
    output = output.decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf0 in position 93: invalid continuation byte

divergentdave commented 10 years ago

Here's what pdfinfo tells me:

Creator:        Xerox FreeFlow
Producer:       Xerox FreeFlow Scan and Print
CreationDate:   ≡Åù►êÅ¥►Hjì►≡Åù►êÅ¥►êÅ¥►≡Åù►êÅ¥►≡Åù►Çφ├►≡Åù►êÅ¥►≡Åù►êÅ¥►¿Ω∙♦╣▲àδ
Q╕ê@
ModDate:        01/24/14 15:03:08
Tagged:         no
Pages:          25
Encrypted:      no
Page size:      612 x 792 pts (letter)
File size:      2265515 bytes
Optimized:      yes
PDF version:    1.4

For comparison, Firefox's pdf.js reader says "Creation Date: Invalid Date, Invalid Date," and Adobe Reader doesn't show a creation date.

I recommend we change output.decode('utf-8') to output.decode('utf-8', errors='replace'). I just did the same over on #141.
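
To illustrate the suggested change, here is a minimal sketch of what errors='replace' does. The helper name decode_tool_output and the sample bytes are made up for this example and are not taken from the repository. Only the bytes.decode call mirrors the proposed fix.

```python
def decode_tool_output(raw: bytes) -> str:
    """Decode command output as UTF-8, swapping undecodable bytes for U+FFFD.

    Hypothetical helper for illustration; not the project's actual code.
    """
    return raw.decode("utf-8", errors="replace")


# Invented sample: a lone 0xf0 lead byte followed by ASCII is invalid UTF-8,
# similar in spirit to the corrupt CreationDate field above.
sample = b"CreationDate:   \xf0Hj\nPages:          25\n"

try:
    sample.decode("utf-8")  # strict decoding is the default and raises here
except UnicodeDecodeError as exc:
    print("strict decode failed:", exc.reason)

# The lenient decode keeps the readable fields and marks the bad byte.
print(decode_tool_output(sample))
```

The trade-off is that the corrupt field comes through with replacement characters instead of stopping the run, which seems acceptable here since the creation date is unusable either way.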

konklone commented 10 years ago

@divergentdave nailed it, applied your fix in https://github.com/unitedstates/inspectors-general/commit/d1c26f56a7432e68c65f6ae30668fdbbd5ccd662

spulec commented 10 years ago

Peer reviews and FISMA added with d1c452b5f234ad745527981c6f675b0c036fe854

Based on my brief reading of FISMA, it seems that a lot of IGs may just include this in their semiannual reports.