pgh-public-meetings / city-scrapers-pitt

Pittsburgh City Scrapers: sourcing public meetings in Pittsburgh
https://pgh-public-meetings.github.io/events/
MIT License

Spider: Pittsburgh Zoning Board of Adjustment #8

Open pjsier opened 5 years ago

pjsier commented 5 years ago

Spider Name:

pitt_zoning

Website:

http://pittsburghpa.gov/dcp/zba-schedule

Scraping notes:

Meeting info is all in PDF

danwarren commented 5 years ago

I'll take a crack at this one today. The PDFs look like they can be parsed easily with pdfminer.six.

maxachis commented 3 years ago

I'll need some clarification on how we want to go about this one, @ben-nathanson @bonfirefan.

Do we want to scrape every individual meeting? With what looks like at least 10 meetings per day, that could clog up City Scrapers quite a bit. Or do we want to list only the first one, at 9:00 AM?

ben-nathanson commented 3 years ago

It's good that you're thinking about scale. In this example we will want to scrape every individual meeting.

I would also say that we're building on top of existing tools that were designed to handle significant workloads, so it should be a long time before we encounter issues around scaling.

maxachis commented 3 years ago

I think my concern is more that it might take up a lot of space on the calendar itself. Are we fine with having 10+ similar meetings listed on the same day in the calendar?

maxachis commented 3 years ago

@cheog was looking into this with certain Python modules, and I'm looking into it with different ones; in my case, PDFMiner.
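
For reference, here is a minimal sketch of the pdfminer.six approach discussed in this thread, emitting one entry per individual hearing. It assumes the text extracted from a schedule PDF has one hearing per line, starting with a time such as "9:00 AM". That line format is a guess and the real ZBA PDFs may be laid out differently. The names `parse_hearings`, `extract_pdf_text`, and `TIME_RE` are hypothetical and not part of city-scrapers.

```python
import re
from datetime import date, datetime

# Assumed line format: "<h:mm> <AM|PM> <description>", e.g. "9:00 AM Case 12".
# This is a guess at the PDF text layout, not a confirmed format.
TIME_RE = re.compile(r"^\s*(\d{1,2}:\d{2})\s*([AaPp])\.?[Mm]\.?\s+(.+?)\s*$")


def extract_pdf_text(path):
    """Return all text in the PDF at `path` (requires pdfminer.six)."""
    # Imported lazily so the parser below can be used without pdfminer.six.
    from pdfminer.high_level import extract_text

    return extract_text(path)


def parse_hearings(text, hearing_date):
    """Return one dict per line that starts with a time, skipping other lines."""
    hearings = []
    for line in text.splitlines():
        match = TIME_RE.match(line)
        if not match:
            continue  # headers, blank lines, anything without a leading time
        clock, meridiem, description = match.groups()
        start_time = datetime.strptime(
            f"{clock} {meridiem.upper()}M", "%I:%M %p"
        ).time()
        hearings.append(
            {
                "start": datetime.combine(hearing_date, start_time),
                "description": description,
            }
        )
    return hearings
```

Something like `parse_hearings(extract_pdf_text("schedule.pdf"), date(2020, 1, 9))` would then yield one item per hearing on that date, where `schedule.pdf` is a placeholder path. The regex will almost certainly need adjusting once checked against the actual PDFs.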