pjsier closed this issue 3 years ago
Reference the old spider here
I'll take this issue. I have a simple proof-of-concept script written that just gives us each event's title and time. The original spider appears to point to an older version of the site. I should have a PR ready for this by the end of May.
@ben-nathanson, can you collab with @guesschess on this?
Sure! We've gone over some existing code, discussed some of the technical quirks of navigating the port authority site, and will meet up as needed to work on this.
@guesschess is making good progress on this. We have a working scraper script; the next steps are isort/flake8 formatting and writing test methods.
Please note that the website has changed to: https://www.portauthority.org/inside-Port-Authority/Port-Authority-Board/Board-Meeting-Information/
This change combined the events and documents into one nice comprehensive page
@ben-nathanson What do we do about the end time? We don't know how long it will take.
If we don't have any specific information about the end time, and I think that is the case here, I would add three hours to the start time
(Edit): See below. Thanks Bonnie!
@ben-nathanson Thanks!
@guesschess you can leave the end time as None if there is no end time. Glad you all are making progress!
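For anyone picking this up later, the end-time guidance above could be sketched roughly like this. It is only an illustration, not the spider's actual code: `MeetingTimes`, `parse_times`, the date format, and the sample date are all assumptions made up for the example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class MeetingTimes:
    """Hypothetical container for illustration; not the project's real item class."""

    start: datetime
    end: Optional[datetime] = None  # stays None when the site lists no end time


def parse_times(start_text: str, end_text: Optional[str] = None) -> MeetingTimes:
    """Parse the start time and, only if one is published, the end time."""
    fmt = "%B %d, %Y %I:%M %p"  # assumed format, e.g. "May 24, 2019 9:30 AM"
    start = datetime.strptime(start_text, fmt)
    # Do not guess a duration: leave the end time as None when it is unknown.
    end = datetime.strptime(end_text, fmt) if end_text else None
    return MeetingTimes(start=start, end=end)


# Made-up sample input with no end time listed.
meeting = parse_times("May 24, 2019 9:30 AM")
```

Leaving `end` as None keeps the record honest about what the site actually publishes, rather than adding an assumed three hours to the start time.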
I will have a look at this one.
I think I'm finished. I have created a pull request 🚍🚍🚍🚍🚍🚍🚍🚍
Haven't heard from Eva on this since June. We may want to turn this over to someone else--possibly using Eva's existing code.
Sounds good to me.
Spider Name: alle_port_authority
Website: https://www.portauthority.org/inside-Port-Authority/Port-Authority-Board/Board-Meeting-Information/
Scraping Notes: This is a rewrite of the alle_port_authority scraper now that the site has been updated.