unitedstates / inspectors-general

Collecting reports from Inspectors General across the US federal government.
https://sunlightfoundation.com/blog/2014/11/07/opengov-voices-opening-up-government-reports-through-teamwork-and-open-data/
Creative Commons Zero v1.0 Universal

Check for PDF "attachments" with pdftk #2

Open konklone opened 10 years ago

konklone commented 10 years ago

Even if the USPS or DHS IG reports don't have any attachments, at least set up a process that emails the admin whenever it detects one.
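
A minimal sketch of the detection step, assuming `pdftk` is installed and on the `PATH`. It relies on pdftk's `unpack_files` operation, which writes any attached files into an output directory. The function names here are hypothetical and not part of the scraper, and the notification step is left out.

```python
import os
import subprocess
import tempfile


def pdftk_unpack_command(pdf_path, out_dir):
    """Build the pdftk command that extracts a PDF's attachments into out_dir."""
    return ["pdftk", pdf_path, "unpack_files", "output", out_dir]


def list_attachments(pdf_path):
    """Return the names of files attached to pdf_path (hypothetical helper).

    Unpacks into a temporary directory and lists whatever pdftk wrote there.
    Raises CalledProcessError if pdftk exits with an error.
    """
    with tempfile.TemporaryDirectory() as out_dir:
        subprocess.run(
            pdftk_unpack_command(pdf_path, out_dir),
            check=True,
            stdout=subprocess.DEVNULL,
        )
        return sorted(os.listdir(out_dir))
```

A scraper could call `list_attachments()` on each downloaded report and notify the admin whenever the returned list is non-empty.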

divergentdave commented 10 years ago

Rough procedure:

This seems kinda ugly, but there aren't many programs out there that deal with PDF attachments.

divergentdave commented 10 years ago

I wrote this up as an out-of-band script in my fork (https://github.com/divergentdave/inspectors-general/blob/scripts/find_pdf_attachments.py). @konklone, could you run this against your full archive when you get a chance?

divergentdave commented 10 years ago

I ran this on my local collection (archive of just the current year) and found a few PDFs with Press Quality.joboptions or Standard.joboptions attached, which are configuration files for/from Acrobat Distiller. Several reports from the State Department OIG include an automatically-generated accessibility report (*.accreport.html) from Adobe Acrobat. Nothing too interesting so far.
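
Since those files are just configuration or auto-generated output, a later pass could skip them. Here is a rough sketch of such a filter. The function names are hypothetical, and matching on file name suffixes is an assumption based on the names reported above, not necessarily how the script in my fork does it.

```python
# Suffixes of attachments that turned out to be uninteresting:
# Acrobat Distiller settings and Acrobat accessibility reports.
BORING_SUFFIXES = (".joboptions", ".accreport.html")


def is_boring_attachment(name):
    """Return True if the attachment name matches a known uninteresting suffix."""
    return name.lower().endswith(BORING_SUFFIXES)


def interesting_attachments(names):
    """Keep only the attachment names that are worth a closer look."""
    return [name for name in names if not is_boring_attachment(name)]
```

For example, `interesting_attachments(["Standard.joboptions", "Final Report 100711.docx"])` keeps only the `.docx` file.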

divergentdave commented 7 years ago

I reran this with more PDFs on my local machine and got the following interesting results (excluding qpdf failures, joboptions files, and accessibility report files). There are several important-looking file names, but they're scarce enough that throwing the attached files into unitedstates/reports might be the best move.

data/treasury/2006/aprsep06/report.pdf has the following attachments: Front Cover and HIB - Final (10_30).pdf
data/treasury/2015/OIG-16-014/report.pdf has the following attachments: Re FW CPFS - Balance Sheet Presentation of Net Position.pdf, FW Proposed JV to DOI SCNP.pdf, RE Reclassification of Financial Statements for FY 2013.pdf, Re GFRS - Note 04A Direct Loans Receivable.pdf
data/pbgc/2006/2007-1-FA-0024-1/report.pdf has the following attachments: PBGC Response to OIG CG Financial Statement Audit Reports as revised1_.doc
data/osc/2016/FY2016-16-29%20DI-14-5128-16-29-DI-14-5218%20Agency%20Report/report.pdf has the following attachments: IMG_2190[1].JPG, IMG_2191[1].JPG, IMG_2189[1].JPG, IMG_0780.JPG, IMG_2193[1].JPG, IMG_0781.JPG, IMG_2192[1].JPG
data/osc/2012/FY2012-12-12d%20DI-11-2238%20and%20DI-11-2709%20-%20Supplemental%20Report/report.pdf has the following attachments: MS Comments in Blue.doc
data/peacecorps/2006/PC_South_Africa_Final_Evaluation-Report-IG-0702EA/report.pdf has the following attachments: Attachment Q.pdf, Attachment V - Early Funding Request 09-2005.pdf, Attachment F - GTOT RSA 06 Trainer.pdf, Attachment Z - 3 Walk-Around Personal Identification.pdf, Attachment X - 1 Courtship or Harrassment.pdf, Attachment W - IG Response Gene Peuse.pdf, Attachment O - D567D.pdf, Attachment S - vehfleetplan.pdf, Attachment T.pdf, Attachment BB - 5A Summary of Duties for APCD.pdf, Attachment P - vehiclemaintrecord.pdf, Attachment G - Housing_SS checklist RSA 07-6-2006.pdf, Attachment B - South Africa 170-06.pdf, Attachment K.pdf, Attachment L - MOU with FNB 08-2006.pdf, Attachment H - PCV Site Placement and Housing Checklist.pdf, Attachment CC - APCD Programming PA form 2003-2004.pdf, Attachment E - Weekly Self Assessments SA15.pdf, Attachment J - March 27 06 Minutes.pdf, Attachment D - EDUCATION - COMPETENCIES DRAFT.pdf, Attachment AA - 4 Are Rites Out of Step.pdf, Attachment C - South Africa 145-06.pdf, Attachment U - PCSA ARV-AB changes of COP Nov 2005.pdf, Attachment R - vehstareport.pdf, Attachment Y - 2 Accident.pdf, Attachment I - VAC Meeting Agenda 03-2006.pdf, Attachment A - TECHNICAL TRAINING PROGRAMME SA 15.pdf, Attachment N.pdf, Attachment M.pdf
data/dod/2011/SPO-2011-010/report.pdf has the following attachments: Final Report 100711.docx
data/smithsonian/2015/A-14-06/report.pdf has the following attachments: Transmittal Memo.pdf, KPMG Smithsonan A-133.pdf, DCAA Smithsonian A-133.pdf
data/nasa/2010/OMEGA-Report/report.pdf has the following attachments: OMEGA report FINAL Sept 20-v1.docx
data/nasa/2008/IG-09-006/report.pdf has the following attachments: Report_of_Independent_Auditors.pdf, Compliance_Report.pdf, Internal_Control_Report.pdf
data/nasa/2008/IG-09-006/report.decrypted.pdf has the following attachments: Report_of_Independent_Auditors.pdf, Compliance_Report.pdf, Internal_Control_Report.pdf
data/dot/2010/29584/report.pdf has the following attachments: quitecommands.xml
data/dot/2007/30096/report.pdf has the following attachments: Two Official in Bridge Division of NYDOT Charged in Bribery Scheme .doc
data/dot/2007/29992/report.pdf has the following attachments: New National Bridge Inspection Memo.doc
data/dot/2012/29073/report.pdf has the following attachments: FHWAARRA_FinalReport_4-5-12_CLee_MHchanges.docx

konklone commented 7 years ago

Agreed on all counts!

divergentdave commented 7 years ago

This will also be needed for "PDF portfolios" such as this one: https://www.si.edu/Content/OIG/Audits/2015/A-14-06.pdf

According to https://blogs.adobe.com/pdfdevjunkie/2008/09/how_do_you_deal_with_large_pdf.html, "To maintain backward compatibility, a PDF Portfolio is basically a PDF with a bunch of attachments and some extra stuff in the catalog object."

Edit: https://oversight.garden/reports?query=%22PDF+portfolio%22