Open waldoj opened 10 years ago
This works fine: pdftohtml -f 1 -stdout bill.pdf | recode html..utf8
I set up external_legislation.sh within /utilities/ to iterate through PDFs in a given directory and return a list of those that contain the text PREPARED.
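A rough sketch of what a batch script like this could look like is below. It is not the actual external_legislation.sh, whose contents are not shown in this thread; the function names pdf_to_text and list_external are made up for illustration, and the conversion step simply reuses the pdftohtml and recode command quoted above.

```shell
#!/bin/sh
# Hypothetical sketch, not the real external_legislation.sh.

# Convert a PDF to UTF-8 text on stdout, using the command from this thread.
pdf_to_text() {
    pdftohtml -f 1 -stdout "$1" | recode html..utf8
}

# Print the path of every PDF in directory $1 whose converted text
# contains the word PREPARED.
list_external() {
    for pdf in "$1"/*.pdf; do
        [ -e "$pdf" ] || continue   # the glob matched nothing
        if pdf_to_text "$pdf" 2>/dev/null | grep -q 'PREPARED'; then
            printf '%s\n' "$pdf"
        fi
    done
}
```

Calling list_external /path/to/pdfs would then print one matching path per line. Matching on PREPARED alone avoids having to deal with the spacing between the words.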
Got it. Works great. So far this is just a batch utility, and it works fine. Now to turn this into a little service.
The PDF of legislation that was written by a third party is flagged as such. For instance, this bill contains this text at the top:
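The process is easy:
1. Install poppler-utils, which provides pdftohtml (sudo apt-get install poppler-utils).
2. Convert the PDF to HTML and recode the output as UTF-8 (recode html..utf8).
3. Search the result for LEGISLATION NOT PREPARED BY DLS (keeping in mind that those spaces are no-break spaces; that is, use regex's \s, not a literal space).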
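Whether \s matches a no-break space depends on the regex engine, so one way to sidestep the question is to turn no-break spaces into ordinary spaces before searching. The sketch below assumes the input has already been converted to UTF-8 (for example by recode html..utf8); the function name has_dls_marker is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads converted UTF-8 text on stdin and exits 0
# if the marker is present, non-zero otherwise.
has_dls_marker() {
    nbsp=$(printf '\302\240')   # UTF-8 bytes of U+00A0 NO-BREAK SPACE
    # Work byte-wise (LC_ALL=C) so sed replaces the two-byte sequence
    # with a plain space, then match with one or more spaces between words.
    LC_ALL=C sed "s/$nbsp/ /g" |
        grep -Eq 'LEGISLATION +NOT +PREPARED +BY +DLS'
}
```

It could be combined with the conversion command from this thread, for instance: pdftohtml -f 1 -stdout bill.pdf | recode html..utf8 | has_dls_marker && echo "not prepared by DLS".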