ward-wise / data-analysis

Data analysis on Chicago infrastructure and infrastructure spending
MIT License

Scrape data from Chicago Aldermanic menu program PDFs #2

Closed: smacmullan closed this 6 months ago

smacmullan commented 1 year ago

Source: the "Previous Aldermanic Menu Program Books by Year" section of the CIP Archive.

Yes, someone has already asked the city if this data is available in a CSV file. It is not.

Write Python functions to convert the text in the PDFs to CSV. You can do OCR with pytesseract, but you might be able to get the text directly out of the PDF file using PyMuPDF or some other library. The PDFs for different years have different formats, so expect to need a separate parser per format.
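A minimal sketch of the direct-extraction route, to show the shape such functions might take. It assumes the PDF has an embedded text layer (otherwise OCR is still needed). The line layout it parses is hypothetical: a `Ward N` header line followed by item lines that end in a dollar amount. The real books will differ by year, so the regexes and the names `extract_lines`, `parse_menu_lines`, and `rows_to_csv` are illustrative only.

```python
import csv
import io
import re

# Hypothetical layout: a "Ward N" header, then item lines ending in "$amount".
# Real menu books vary by year; adjust these patterns per format.
WARD_RE = re.compile(r"^\s*Ward\s+(\d+)\s*$", re.IGNORECASE)
ITEM_RE = re.compile(r"^(?P<desc>.+?)\s+\$(?P<amount>[\d,]+(?:\.\d{2})?)\s*$")


def extract_lines(pdf_path):
    """Return the text lines of a PDF via PyMuPDF (requires a text layer)."""
    import fitz  # PyMuPDF; imported lazily so the parser works without it

    lines = []
    with fitz.open(pdf_path) as doc:
        for page in doc:
            lines.extend(page.get_text().splitlines())
    return lines


def parse_menu_lines(lines):
    """Turn raw text lines into row dicts; lines matching neither pattern are skipped."""
    rows = []
    ward = None
    for line in lines:
        ward_match = WARD_RE.match(line)
        if ward_match:
            ward = int(ward_match.group(1))
            continue
        item_match = ITEM_RE.match(line)
        if item_match and ward is not None:
            rows.append({
                "ward": ward,
                "description": item_match.group("desc").strip(),
                "amount": float(item_match.group("amount").replace(",", "")),
            })
    return rows


def rows_to_csv(rows):
    """Serialize parsed rows to a CSV string with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["ward", "description", "amount"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Usage would be something like `rows_to_csv(parse_menu_lines(extract_lines("menu_book.pdf")))`, where the file name is a placeholder. Keeping extraction separate from parsing means a pytesseract-based extractor could feed the same parser for scanned years.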

smacmullan commented 6 months ago

Closing this. The scraped data is available here: https://github.com/JohnCRuf/alderman_machine/tree/master/tasks/data_geocode_menu/output