Open skorasaurus opened 7 years ago
1] downloading the PDFs 2] parsing them to the raw text (https://github.com/opencleveland/drocer/blob/master/records.py ) in proper directories.
setting up an AWS account (or something else) to run this every so often.
Yes. I need to (more fully) document and present the PDFBox extensions that are now doing a better job of producing raw text in corrected order.
1] downloading the PDFs 2] parsing them to the raw text (https://github.com/opencleveland/drocer/blob/master/records.py ) in proper directories.
setting up an AWS account (or something else) to run this every so often.