The Sonoma County data scraper returns an incorrect `update_time` value. For example, running it now returns the timestamp `2021-03-19T20:39:28+00:00`, which is clearly wrong, since the scraped output includes much more recent data.

It looks like we get the time by checking the dashboard page’s `<meta>` elements: https://github.com/sfbrigade/data-covid19-sfbayarea/blob/40a9779552d21803faef5a31be2ad129d9a13c9f/covid19_sfbayarea/data/sonoma.py#L69-L79

If I recall correctly, it used to be pretty apparent that the page was manually updated with data. What probably happened is that they replaced the manually entered data tables with something that dynamically populates that portion of the page, so the page text in the CMS hasn’t been updated since March even while the data itself keeps getting refreshed.

We need to determine whether something else on the page gives us an accurate update timestamp. If not, we should just use the current time for `update_time`, like we do for other scrapers that can’t obtain an accurate time from the page they’re scraping.
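
A rough sketch of the fallback, in case it helps the discussion. The helper name `choose_update_time` and its parameters are hypothetical and not part of the existing scraper. The staleness check (treating a page timestamp that is older than the newest scraped data point as untrustworthy) is an assumption about how we might detect the problem, not something the current code does.

```python
from datetime import date, datetime, timezone
from typing import Optional


def choose_update_time(
    page_time: Optional[datetime],
    newest_data_date: Optional[date],
    now: Optional[datetime] = None,
) -> datetime:
    """Pick a value for update_time (hypothetical helper, not in the repo).

    Returns the timestamp found on the page unless it is missing or older
    than the newest data point we scraped; in those cases it falls back to
    the current UTC time.
    """
    # Allow the caller to inject "now" so the function is easy to test.
    current = now or datetime.now(timezone.utc)

    # No timestamp on the page at all: use the current time.
    if page_time is None:
        return current

    # Assumption: a page timestamp that predates the newest scraped data
    # cannot be the real update time, so it is treated as stale.
    if newest_data_date is not None and page_time.date() < newest_data_date:
        return current

    return page_time
```

With the timestamp from this issue and a made-up newest data date, the call would look like this:

```python
stale = datetime.fromisoformat("2021-03-19T20:39:28+00:00")
update_time = choose_update_time(stale, newest_data_date=date(2021, 7, 30))
print(update_time.isoformat())  # current UTC time, since the page time is stale
```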