sfbrigade / data-covid19-sfbayarea

Manual and automated processes of sourcing data for the stop-covid19-sfbayarea project
MIT License

HOTFIX: Handle messy links in Alameda news titles #141

Closed Mr0grog closed 4 years ago

Mr0grog commented 4 years ago

The Alameda news page now has links that take up only part of the title. The parser sometimes identifies these as links to language-specific versions of the news item, which causes it to stop parsing the title and start looking for language links. However, these links aren't language links, and we need to continue parsing the title as we move through them.

For example, the Alameda news page currently has this entry:

[Screenshot, 2020-10-09: the Alameda news entry in question]

…which was causing us to output the title:

Guidance on How to Celebrate Halloween and D

…instead of:

Guidance on How to Celebrate Halloween and Día de los Muertos Safely

This PR fixes the issue by being much more specific about what text we accept as a language link. It may break if a language name is ever misspelled on the page, but it seems to work well enough.
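For illustration only, here is a minimal sketch of the idea, not the actual code in this PR. The names `LANGUAGE_NAMES`, `is_language_link`, and `split_title`, the list of languages, and the `(text, is_link)` segment format are all assumptions made up for this example. A link counts as a language link only when its text is exactly a known language name (ignoring case and surrounding punctuation); any other link is treated as part of the title.

```python
import re

# Hypothetical list of language names; the real scraper's list may differ.
LANGUAGE_NAMES = (
    "English", "Spanish", "Español", "Chinese", "Vietnamese", "Tagalog",
)

# Match only when the whole link text is a known language name,
# optionally wrapped in whitespace or punctuation such as "(Spanish)".
LANGUAGE_LINK_PATTERN = re.compile(
    r"^\W*(?:" + "|".join(re.escape(name) for name in LANGUAGE_NAMES) + r")\W*$",
    re.IGNORECASE,
)


def is_language_link(link_text):
    """Return True if the link text looks like a language name and nothing else."""
    return bool(LANGUAGE_LINK_PATTERN.match(link_text))


def split_title(segments):
    """Split (text, is_link) segments into a title and a list of language links.

    Links that are not language links are kept as part of the title, so a
    link covering only part of the title no longer truncates it.
    """
    title_parts = []
    languages = []
    for text, is_link in segments:
        if is_link and is_language_link(text):
            languages.append(text.strip())
        elif not languages:
            # Still inside the title: keep plain text and non-language links.
            title_parts.append(text)
    return "".join(title_parts).strip(), languages


# Made-up segments imitating a title that is only partially linked.
example_segments = [
    ("Guidance on How to Celebrate Halloween and D", False),
    ("ía de los Muertos Safely", True),
    (" ", False),
    ("Spanish", True),
]
title, languages = split_title(example_segments)
print(title)
print(languages)
```

The tradeoff is the one noted above: an exact match against known names will not recognize a misspelled language name as a language link.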

Fixes #140.

Mr0grog commented 4 years ago

Merging without review since this is a hotfix.