sfbrigade / data-covid19-sfbayarea

Manual and automated processes of sourcing data for the stop-covid19-sfbayarea project
MIT License

Sonoma County `update_time` is Incorrect #205

Closed Mr0grog closed 3 years ago

Mr0grog commented 3 years ago

The Sonoma County data scraper returns an incorrect `update_time` value. For example, running it now returns the timestamp `2021-03-19T20:39:28+00:00`, which is clearly incorrect because the scraped data includes entries that are much more recent than that.

It looks like we get the time by checking the dashboard page’s `<meta>` elements: https://github.com/sfbrigade/data-covid19-sfbayarea/blob/40a9779552d21803faef5a31be2ad129d9a13c9f/covid19_sfbayarea/data/sonoma.py#L69-L79
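For anyone not familiar with that approach, here is a rough sketch of reading a timestamp out of a page's `<meta>` elements. It is not the code from `sonoma.py`: the `article:modified_time` property, the `MetaTimeParser` class, and the `meta_update_time` function are all assumptions made up for illustration, and it uses only the standard library.

```python
from datetime import datetime
from html.parser import HTMLParser
from typing import Optional


class MetaTimeParser(HTMLParser):
    """Collects the content of the first <meta> tag with a given property."""

    def __init__(self, property_name: str) -> None:
        super().__init__()
        self.property_name = property_name
        self.value: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Only look at <meta> tags, and stop once a match has been found.
        if tag != 'meta' or self.value is not None:
            return
        attributes = dict(attrs)
        if attributes.get('property') == self.property_name:
            self.value = attributes.get('content')


def meta_update_time(
    html: str,
    # Assumption: the real page may use a different property name.
    property_name: str = 'article:modified_time',
) -> Optional[datetime]:
    """Return the timestamp from the matching <meta> tag, or None."""
    parser = MetaTimeParser(property_name)
    parser.feed(html)
    if not parser.value:
        return None
    # Expects an ISO 8601 value with a numeric offset, e.g. "+00:00".
    return datetime.fromisoformat(parser.value)
```

The important caveat is that a value like this reflects when the page was last edited in the CMS, which is not necessarily when the data shown on it last changed.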

If I recall correctly, it used to be pretty apparent that the page was manually updated with data. What probably happened is that they replaced the manually entered data tables with something that dynamically populates that portion of the page, so the page text in the CMS hasn’t been updated since March even while the data is getting dynamically updated.

We need to determine whether something else on the page gives us an accurate update timestamp. If not, we should just use the current time for `update_time`, like we do for other scrapers that can’t obtain an accurate time from the page they’re scraping.
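A minimal sketch of that fallback, assuming we also know the date of the newest data point we scraped. The `resolve_update_time` function and its parameters are hypothetical names, not existing code in this repo, and all datetimes are assumed to be timezone-aware.

```python
from datetime import datetime, timezone
from typing import Optional


def resolve_update_time(
    page_time: Optional[datetime],
    newest_data_time: Optional[datetime] = None,
    now: Optional[datetime] = None,
) -> str:
    """Pick an update_time, falling back to the current time.

    All datetimes must be timezone-aware so they can be compared.
    """
    current = now or datetime.now(timezone.utc)
    # No timestamp on the page at all: use the current time.
    if page_time is None:
        return current.isoformat()
    # The page claims to be older than data it contains, so it is stale.
    if newest_data_time is not None and page_time < newest_data_time:
        return current.isoformat()
    return page_time.isoformat()
```

With the timestamp from this issue and any data point after March 19, this would return the current time instead of the stale value. The `now` parameter is only there to make the function easy to test.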