sfbrigade / data-covid19-sfbayarea

Manual and automated processes of sourcing data for the stop-covid19-sfbayarea project
MIT License

HOTFIX: SF news no longer has markup for a summary #183

Closed by Mr0grog 3 years ago

Mr0grog commented 3 years ago

The SF news page markup no longer wraps the summary/abstract text in its own element, so the scraper was failing when it tried to find that element, with errors like:

San Francisco county failed: 'NoneType' object has no attribute 'get_text'
Traceback (most recent call last):
 File "/home/runner/work/stop-covid19-sfbayarea/stop-covid19-sfbayarea/scraper/scraper_news.py", line 82, in main
   run_county_news(county, from_, format, output)
 File "/home/runner/work/stop-covid19-sfbayarea/stop-covid19-sfbayarea/scraper/scraper_news.py", line 39, in run_county_news
   feed = news.scrapers[county].get_news(from_date=from_)
 File "/home/runner/work/stop-covid19-sfbayarea/stop-covid19-sfbayarea/scraper/covid19_sfbayarea/news/base.py", line 80, in get_news
   return instance.scrape()
 File "/home/runner/work/stop-covid19-sfbayarea/stop-covid19-sfbayarea/scraper/covid19_sfbayarea/news/base.py", line 58, in scrape
   news = self.parse_page(html, self.URL)
 File "/home/runner/work/stop-covid19-sfbayarea/stop-covid19-sfbayarea/scraper/covid19_sfbayarea/news/san_francisco.py", line 43, in parse_page
   return [self.parse_news_item(article, base_url)
 File "/home/runner/work/stop-covid19-sfbayarea/stop-covid19-sfbayarea/scraper/covid19_sfbayarea/news/san_francisco.py", line 43, in <listcomp>
   return [self.parse_news_item(article, base_url)
 File "/home/runner/work/stop-covid19-sfbayarea/stop-covid19-sfbayarea/scraper/covid19_sfbayarea/news/san_francisco.py", line 61, in parse_news_item
   summary = normalize_whitespace(item.find(class_='__abstract').get_text())
AttributeError: 'NoneType' object has no attribute 'get_text'

We now just use all the text in the news item besides the heading and the date as the summary.
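For illustration, here is a rough sketch of that idea, not the actual change in this PR. It uses only the standard library's `html.parser` instead of the BeautifulSoup-style calls seen in the traceback, and it assumes the heading is an `<h1>`-`<h6>` element and the date is a `<time>` element, which the real SF markup may not match. `SummaryExtractor`, `extract_summary`, and `EXAMPLE_ITEM` are hypothetical names, and `normalize_whitespace` here is a stand-in for the project's helper of the same name.

```python
import re
from html.parser import HTMLParser


def normalize_whitespace(text: str) -> str:
    """Collapse whitespace runs to single spaces (stand-in for the project's helper)."""
    return re.sub(r"\s+", " ", text).strip()


class SummaryExtractor(HTMLParser):
    """Collect all text in a news item except text inside heading/date elements."""

    # Assumption: headings are <h1>-<h6> and the date is a <time> element.
    SKIPPED_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6", "time"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a heading or date element.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def summary(self) -> str:
        return normalize_whitespace(" ".join(self._chunks))


def extract_summary(item_html: str) -> str:
    """Return the item's text minus heading and date; never raises on a missing wrapper."""
    parser = SummaryExtractor()
    parser.feed(item_html)
    parser.close()
    return parser.summary()


# Hypothetical markup: the summary text has no wrapping element of its own.
EXAMPLE_ITEM = """
<article>
  <h2><a href="/news/example">Example headline</a></h2>
  <time datetime="2021-01-01">January 1, 2021</time>
  The city announced <b>new</b> guidance.
</article>
"""

print(extract_summary(EXAMPLE_ITEM))
```

Because nothing here looks up a specific summary element, there is no `None` result to call `.get_text()` on. The item just yields an empty string if it has no text beyond the heading and date.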

Mr0grog commented 3 years ago

Gonna go ahead and merge this since it’s a hotfix.