openstates / issues

Having trouble? Looking to contribute? Issues live here!

PA: new site broke scraper #1256

Closed jessemortenson closed 1 week ago

jessemortenson commented 1 week ago

Looks like the new PA site keeps changing: at least a couple of things in the scraper have already broken.

Please use this branch to work on a fix: https://github.com/openstates/openstates-scrapers/tree/pa-fix-more-new-site-bugs

Bills not scraping at all

Currently bills are not scraping at all, and the log is full of this warning:

21:00:00 WARNING openstates: Skipping HB 106 https://www.palegis.us/legislation/bills/2023/hb106, No title found
21:00:00 INFO scrapelib: GET - 'https://www.palegis.us/legislation/bills/2023/hb107'

Looks like something changed in the markup. I took a quick look at the scraper source and the fix is not immediately apparent to me. The title string is found with a semi-complicated xpath in the first place, so it would probably be good to add a comment describing it as well! Wow, the source code on the new PA site sucks.
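For whoever picks this up, here is a rough sketch of one way to make the title lookup less brittle: try a short list of commented xpaths in order and only skip the bill when none of them match. This is not the scraper's actual code. The xpaths, the `bill-title` class name, and the helper names `find_title` and `scrape_title` are all hypothetical, and it uses the standard library's `xml.etree.ElementTree` on well-formed markup so it runs on its own (the real page would need an HTML parser).

```python
import logging
import xml.etree.ElementTree as ET

logger = logging.getLogger("openstates")

# Hypothetical candidate locations for the bill title, tried in order.
# These paths are illustrative and do NOT reflect the real palegis.us markup.
TITLE_XPATHS = [
    ".//div[@class='bill-title']",  # assumed: a dedicated title container
    ".//h1",                        # assumed: fall back to the page heading
]


def find_title(page):
    """Return the first non-empty title text found, or None."""
    for xpath in TITLE_XPATHS:
        node = page.find(xpath)
        if node is None:
            continue
        # itertext() gathers text from nested tags; split/join collapses whitespace.
        text = " ".join("".join(node.itertext()).split())
        if text:
            return text
    return None


def scrape_title(bill_id, url, markup):
    """Parse well-formed markup and return the title, warning if none is found."""
    page = ET.fromstring(markup)
    title = find_title(page)
    if title is None:
        logger.warning("Skipping %s %s, No title found", bill_id, url)
    return title
```

Keeping each xpath on its own line with a comment means the next markup change only needs a new entry in the list rather than another round of deciphering one long expression.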

Committee Vote URLs

Scraping committee votes broke. In the main branch I temporarily disabled it to get the scraper working again. I have a "real" fix in a branch https://github.com/openstates/openstates-scrapers/tree/pa-fix-more-new-site-bugs that I think will work, but the bill scraping issue above has to be fixed before it can be tested.

braykuka commented 1 week ago

@jessemortenson I will work on it first.

braykuka commented 1 week ago

@jessemortenson Please review it.