spulec closed this 10 years ago.
One thing this IG does that I haven't seen before is publish FISMA reports on the agency's cybersecurity. The wording on their page implies that all IGs do this, but I hadn't noticed it until now. These would be nice to get.
This agency also publishes its peer reviews, which would also be nice to get.
Merge-ready. Thanks for doing this one, @spulec!
Ah, this one crashed during the full archive run:
[report][2013-11-01][MTG-Disposition-Closeout]
## Downloading: http://arts.gov/sites/default/files/MTG-Disposition-Closeout.pdf
## to: data/nea/2013/MTG-Disposition-Closeout/report.pdf
GET - http://arts.gov/sites/default/files/MTG-Disposition-Closeout.pdf
Resetting dropped connection: arts.gov
"GET /sites/default/files/MTG-Disposition-Closeout.pdf HTTP/1.1" 200 2265515
report: nea/2013/MTG-Disposition-Closeout/report.pdf
Traceback (most recent call last):
File "/home/unitedstates/inspectors-general/inspectors/utils/utils.py", line 24, in run
run_method(cli_options)
File "./inspectors/nea.py", line 34, in run
inspector.save_report(report)
File "/home/unitedstates/inspectors-general/inspectors/utils/inspector.py", line 46, in save_report
metadata = extract_metadata(report)
File "/home/unitedstates/inspectors-general/inspectors/utils/inspector.py", line 163, in extract_metadata
metadata = utils.metadata_from_pdf(report_path)
File "/home/unitedstates/inspectors-general/inspectors/utils/utils.py", line 200, in metadata_from_pdf
output = output.decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf0 in position 93: invalid continuation byte
Here's what pdfinfo tells me:
Creator: Xerox FreeFlow
Producer: Xerox FreeFlow Scan and Print
CreationDate: ≡Åù►êÅ¥►Hjì►≡Åù►êÅ¥►êÅ¥►≡Åù►êÅ¥►≡Åù►Çφ├►≡Åù►êÅ¥►≡Åù►êÅ¥►¿Ω∙♦╣▲àδ
Q╕ê@
ModDate: 01/24/14 15:03:08
Tagged: no
Pages: 25
Encrypted: no
Page size: 612 x 792 pts (letter)
File size: 2265515 bytes
Optimized: yes
PDF version: 1.4
For comparison, Firefox's pdf.js reader shows "Creation Date: Invalid Date, Invalid Date", and Adobe Reader doesn't show a creation date at all.
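For context on the traceback: 0xF0 is the lead byte of a four-byte UTF-8 sequence, and after that particular lead byte the next byte must fall between 0x90 and 0xBF, so a strict decode fails as soon as it sees anything else. A small reproduction, assuming (not confirmed in this thread) that the garbled CreationDate above was printed through a code page 437 console, in which case `≡Åù►` corresponds to the bytes F0 8F 97 10. `first_bad_byte` is a hypothetical helper for illustration, not part of the scraper:

```python
def first_bad_byte(raw: bytes):
    """Hypothetical helper: return (offset, reason) for the first byte that
    strict UTF-8 decoding rejects, or None if the bytes decode cleanly."""
    try:
        raw.decode("utf-8")  # strict mode, same as output.decode('utf-8')
    except UnicodeDecodeError as err:
        return err.start, err.reason
    return None


# Assumed raw bytes: "≡Åù►" read back through code page 437 is F0 8F 97 10.
sample = b"CreationDate: \xf0\x8f\x97\x10"
print(first_bad_byte(sample))  # -> (14, 'invalid continuation byte')
print(first_bad_byte(b"ModDate: 01/24/14 15:03:08"))  # -> None
```

The offset is relative to the bytes passed in; in the real run it was 93, presumably because it counts from the start of pdfinfo's entire stdout, Creator and Producer lines included.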
I recommend we change `output.decode('utf-8')` to `output.decode('utf-8', errors='replace')`; I just did the same over on #141.
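A minimal sketch of that change in context, assuming a hypothetical `parse_pdfinfo` helper that receives pdfinfo's raw stdout as bytes and returns its `Key: value` lines as a dict (the project's actual `metadata_from_pdf` may be organized differently, and the sample bytes below are made up to mimic pdfinfo's layout). With `errors='replace'`, invalid UTF-8 is replaced with U+FFFD instead of raising, so the corrupt CreationDate no longer aborts the report while fields like ModDate still parse normally:

```python
def parse_pdfinfo(raw: bytes) -> dict:
    """Hypothetical helper: turn raw pdfinfo stdout into {field: value}."""
    # errors='replace' substitutes U+FFFD for invalid UTF-8 rather than
    # raising UnicodeDecodeError, so one bad field can't crash the scrape.
    text = raw.decode("utf-8", errors="replace")
    info = {}
    for line in text.splitlines():
        # partition splits on the first colon only, so "15:03:08" stays intact
        key, sep, value = line.partition(":")
        if sep:  # lines without any colon (e.g. a wrapped garbage value) are skipped
            info[key.strip()] = value.strip()
    return info


# Made-up sample imitating the output above, with assumed raw bytes F0 8F 97 10.
sample = (
    b"Creator:        Xerox FreeFlow\n"
    b"CreationDate:   \xf0\x8f\x97\x10Hj\n"
    b"ModDate:        01/24/14 15:03:08\n"
    b"Pages:          25\n"
)
info = parse_pdfinfo(sample)
print(info["ModDate"])  # -> 01/24/14 15:03:08
```

The trade-off is that a replaced date is silently unusable, so any date parsing downstream should still expect to fail on values containing U+FFFD.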
@divergentdave nailed it. I applied your fix in https://github.com/unitedstates/inspectors-general/commit/d1c26f56a7432e68c65f6ae30668fdbbd5ccd662
Peer reviews and FISMA reports added in d1c452b5f234ad745527981c6f675b0c036fe854
Based on my brief reading of FISMA, it seems that many IGs may simply include this reporting in their semiannual reports.
Adds the OIG for the National Endowment for the Arts.