sfbrigade / data-covid19-sfbayarea

Manual and automated processes of sourcing data for the stop-covid19-sfbayarea project
MIT License

HOTFIX: Some Contra Costa headings aren't headings #200

Closed Mr0grog closed 3 years ago

Mr0grog commented 3 years ago

The Contra Costa news scraper recently stopped picking up many news stories because some of the month headings on the news page are no longer actual heading elements (each is now a <p> with several nested <span> elements inside it). The new code still relies on at least some of the month headings being properly marked up as headings, but no longer needs all of them to be. Instead of looking for the list of news items that follows each month heading, it finds the parent elements of the headings and collects all list items inside them.
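To illustrate the approach described above, here is a minimal sketch using only the Python standard library. It is not the scraper's actual code: `find_news_items`, `TreeBuilder`, `Node`, and the sample markup are all hypothetical names and data made up for this example, and the real scraper may well use a different HTML library. The idea is simply to treat the parent of any real heading as a container and gather every `<li>` under it, so a month "heading" that is really a `<p>` does not hide the list that follows it.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class Node:
    """Tiny element node; children are Node instances or text strings."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a very small element tree (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag, if there is one.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        self.current.children.append(data)


def iter_nodes(node):
    """Yield all descendant elements in document order."""
    for child in node.children:
        if isinstance(child, Node):
            yield child
            yield from iter_nodes(child)


def node_text(node):
    """Concatenate all text inside an element."""
    return "".join(
        child if isinstance(child, str) else node_text(child)
        for child in node.children
    )


def find_news_items(html):
    """Collect the text of every <li> under the parent of any real heading."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()

    # Parents of genuine headings act as containers for the news lists.
    containers = []
    for node in iter_nodes(builder.root):
        if node.tag in HEADING_TAGS and not any(c is node.parent for c in containers):
            containers.append(node.parent)

    items, seen = [], set()
    for container in containers:
        for node in iter_nodes(container):
            if node.tag == "li" and id(node) not in seen:
                seen.add(id(node))
                items.append(" ".join(node_text(node).split()))
    return items


# Invented sample: "June 2020" is a <p> with nested <span>s, not a heading.
SAMPLE = """
<div class="news">
  <h2>July 2020</h2>
  <ul><li>Story A</li><li>Story B</li></ul>
  <p><span>June</span> <span>2020</span></p>
  <ul><li>Story C</li></ul>
</div>
"""

print(find_news_items(SAMPLE))  # ['Story A', 'Story B', 'Story C']
```

Note that this still finds nothing if a container has no real headings at all, which matches the caveat above that some headings must remain properly marked up.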

Mr0grog commented 3 years ago

At the time of writing, you can see the issue in https://github.com/sfbrigade/stop-covid19-sfbayarea/pull/1116 (it may have changed by the time someone reads this, though).