openoakland / disclosure-alert

Prototype of email alerts whenever campaign finance disclosures come in
https://alert.opendisclosure.io

Investigate pulling Lobbyist Quarterly Report PDF data #31

Open tdooner opened 4 years ago

tdooner commented 4 years ago

The Oakland PEC is moving its Lobbyist Disclosure process to a digital one. In Netfile, filings made through the new process look like this:

https://netfile.com/Connect2/api/public/image/189594727

That document is a PDF with fillable form fields containing the values the filer entered. The values can be extracted with the pdftk command like this:

» wget -O189594727.pdf https://netfile.com/Connect2/api/public/image/189594727
» pdftk 189594727.pdf dump_data_fields | grep FieldValue
FieldValue: Reynaldo A. Fuentes
FieldValue: The Partnership for Working Families
FieldValue: 1305 Franklin St Suite 501
FieldValue: Oakland, CA 94612
FieldValue: (510) 925-4013
FieldValue: rey@forworkingfamilies.org
FieldValue: East Bay Alliance for a Sustainable Economy
FieldValue: Emergency Paid Sick Leave
FieldValue: East Bay Alliance for a Sustainable Economy
FieldValue: Paid Sick Leave Enforcement
FieldValue: East Bay Alliance for a Sustainable Economy
FieldValue: Department of Workplace and Enforcement Standards
FieldValue: East Bay Alliance for a Sustainable Economy
FieldValue: Department of Workplace and Enforcement Standards
FieldValue: April 26, 2020
FieldValue: Choice2
FieldValue: Choice3
FieldValue: Support
FieldValueDefault:
FieldValue: Support
FieldValueDefault:
FieldValue: Policy Development
FieldValueDefault:
FieldValue: Informational Briefing
FieldValueDefault:
FieldValue:
FieldValueDefault:
FieldValue:
FieldValueDefault:

Is there an easy way to get this data out of the PDF with Ruby? Maybe a gem that wraps pdftk?
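
One option that avoids choosing a gem up front is to shell out to pdftk from Ruby and parse the dump_data_fields output directly. The following is a minimal sketch, not a tested integration: the helper names parse_pdftk_fields and dump_pdf_fields are hypothetical, and it assumes pdftk is on the PATH and prints records separated by "---" lines with one "Key: Value" pair per line. Values that span multiple lines are not handled.

```ruby
require 'open3'

# Parse the text printed by `pdftk <file> dump_data_fields` into an array of
# hashes, one hash per form field. Assumes records are separated by lines
# starting with "---" and that each remaining line is "Key: Value".
def parse_pdftk_fields(dump)
  fields = []
  current = nil

  dump.each_line do |line|
    line = line.chomp

    # A separator line starts a new field record.
    if line.start_with?('---')
      current = {}
      fields << current
      next
    end

    key, value = line.split(':', 2)
    next if key.nil? || value.nil? # skip lines that are not "Key: Value"

    # Tolerate output that does not begin with a separator.
    current ||= (fields << {}).last

    if key == 'FieldStateOption'
      # A field can list several allowed states, so collect them all.
      (current[key] ||= []) << value.strip
    else
      current[key] = value.strip
    end
  end

  fields.reject(&:empty?)
end

# Run pdftk on a local PDF and return the parsed fields.
# Hypothetical helper: requires the pdftk binary to be installed.
def dump_pdf_fields(path)
  output, status = Open3.capture2('pdftk', path, 'dump_data_fields')
  raise "pdftk failed for #{path}" unless status.success?

  parse_pdftk_fields(output)
end
```

With the PDF downloaded as above, dump_pdf_fields('189594727.pdf') would return one hash per field. Keeping FieldName next to FieldValue (instead of grepping for FieldValue alone) should make it possible to tell which value belongs to which form field, which matters here because several values repeat.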