The SF news page markup no longer wraps the summary/abstract text in its own element, so the scraper was failing when it tried to find that element, with errors like:
San Francisco county failed: 'NoneType' object has no attribute 'get_text'
Traceback (most recent call last):
  File "/home/runner/work/stop-covid19-sfbayarea/stop-covid19-sfbayarea/scraper/scraper_news.py", line 82, in main
    run_county_news(county, from_, format, output)
  File "/home/runner/work/stop-covid19-sfbayarea/stop-covid19-sfbayarea/scraper/scraper_news.py", line 39, in run_county_news
    feed = news.scrapers[county].get_news(from_date=from_)
  File "/home/runner/work/stop-covid19-sfbayarea/stop-covid19-sfbayarea/scraper/covid19_sfbayarea/news/base.py", line 80, in get_news
    return instance.scrape()
  File "/home/runner/work/stop-covid19-sfbayarea/stop-covid19-sfbayarea/scraper/covid19_sfbayarea/news/base.py", line 58, in scrape
    news = self.parse_page(html, self.URL)
  File "/home/runner/work/stop-covid19-sfbayarea/stop-covid19-sfbayarea/scraper/covid19_sfbayarea/news/san_francisco.py", line 43, in parse_page
    return [self.parse_news_item(article, base_url)
  File "/home/runner/work/stop-covid19-sfbayarea/stop-covid19-sfbayarea/scraper/covid19_sfbayarea/news/san_francisco.py", line 43, in <listcomp>
    return [self.parse_news_item(article, base_url)
  File "/home/runner/work/stop-covid19-sfbayarea/stop-covid19-sfbayarea/scraper/covid19_sfbayarea/news/san_francisco.py", line 61, in parse_news_item
    summary = normalize_whitespace(item.find(class_='__abstract').get_text())
AttributeError: 'NoneType' object has no attribute 'get_text'
We now just use all the text in the news item besides the heading and the date.
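
As a rough illustration of that approach (not the actual change in `san_francisco.py`), the sketch below copies a news item, drops the heading and date elements, and keeps whatever text remains. The function name `summarize_item`, the tag names used to spot the heading and date (`h1` to `h4`, `time`), the sample markup, and this stand-in version of `normalize_whitespace` are all assumptions made for the example; the real page and helper may differ.

```python
import copy
import re

from bs4 import BeautifulSoup


def normalize_whitespace(text):
    """Stand-in helper (assumed behavior): collapse whitespace runs to one space."""
    return re.sub(r'\s+', ' ', text).strip()


def summarize_item(item):
    """Return the item's text with heading and date elements removed."""
    # Copy the element so the caller's parse tree is left untouched.
    clone = copy.copy(item)
    # Hypothetical tag names: the real heading/date markup may differ.
    for element in clone.find_all(['h1', 'h2', 'h3', 'h4', 'time']):
        element.decompose()
    return normalize_whitespace(clone.get_text(' '))


# Made-up markup: the summary text is a bare text node with no wrapper element.
html = """
<article>
  <h3><a href="/news/example">Example headline</a></h3>
  <time datetime="2020-08-01">August 1, 2020</time>
  Summary text that is no longer wrapped in its own element.
</article>
"""
item = BeautifulSoup(html, 'html.parser').find('article')
summary = summarize_item(item)
print(summary)
```

Because this never looks up a dedicated summary element, there is no `find()` result that can be `None`, so the `AttributeError` above cannot recur on that line.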