The Contra Costa news scraper recently stopped picking up many news stories because some of the month headings on the news page are no longer actual heading elements (each is now a `<p>` with several nested `<span>` elements inside it). The new code still relies on some of the months being properly marked up as headings, but not all of them. Instead of looking for the list of news items after each month heading, we now find the parent elements of the headings and look for all list items inside them.
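To illustrate the approach, here is a minimal sketch in Python. It is not the scraper's actual code: the sample markup, the `find_news_items` name, and the use of `xml.etree.ElementTree` are all assumptions made for illustration, and the sample is deliberately well-formed so the standard library can parse it (a real page would need a proper HTML parser).

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: one month is a real heading, the other is a <p>
# wrapping nested <span> elements, as described above.
SAMPLE = """
<div id="news">
  <h3>March 2021</h3>
  <ul><li>Story A</li><li>Story B</li></ul>
  <p><span><span>February 2021</span></span></p>
  <ul><li>Story C</li></ul>
</div>
"""

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


def find_news_items(markup):
    """Return the text of every <li> found inside the parents of real headings."""
    root = ET.fromstring(markup)

    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}

    # Collect each distinct parent element of a properly marked-up heading.
    containers = []
    for element in root.iter():
        if element.tag in HEADING_TAGS:
            parent = parent_of.get(element)
            if parent is not None and all(parent is not c for c in containers):
                containers.append(parent)

    # Gather every list item under those parents, skipping repeats that
    # would appear if one container were nested inside another.
    seen = set()
    items = []
    for container in containers:
        for li in container.iter("li"):
            if id(li) not in seen:
                seen.add(id(li))
                items.append("".join(li.itertext()).strip())
    return items


if __name__ == "__main__":
    print(find_news_items(SAMPLE))
```

In this sketch, "Story C" is still found even though its month is not a heading, because it shares a parent with the one month that is. The trade-off is that at least one real heading must remain in each container for its items to be picked up.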