Open 1ec5 opened 4 years ago
The application cites Corona Data Scraper, which fetches figures from this spreadsheet by The Mercury News, which cites the Santa Clara County Public Health Department as its source.
As far as I can tell, Corona Data Scraper is simply fetching the latest day’s total case count and adding that as an entry in the database. That is an important statistic, but a time series based on it would be influenced by delays in testing.
In mid-April, the county changed its methodology to report historical case counts. Every day, they retroactively update as many as 40 past days to reflect how many tests were taken on a given day that later came back positive. This way the curve more accurately depicts the rate of (confirmed) infection over time. On the other hand, it can be tedious to keep track of so many numbers, and a couple dozen cases are undated at any given time and can’t be represented in the time series at all.
Which methodology is more appropriate for this application? I suppose consistency with other Bay Area counties is important for this website. It’s also worth noting that the county only provides historical data on case counts and not deaths, so it can be misleading to combine the two time series in a single chart. On the other hand, there’s a lot of value in seeing an accurate curve. (Santa Clara County’s curve is flatter than this application indicates.)
Over on Wikimedia Commons, I’ve been tracking Santa Clara County’s outbreak using the county’s preferred methodology, updating this table to power this chart on Wikipedia and possibly elsewhere. This script automatically generates an updated table to copy-paste into Commons. Hopefully this script will be useful to the project. (Apologies in advance for the obtuse jq usage.)
Same thing for San Francisco: covidatlas/coronadatascraper#1011.
The Santa Clara County bar chart under the Stats tab displays daily and total figures that differ from the official Santa Clara County Public Health Department dashboard.
Steps to reproduce
Expected behavior
I’d expect the two sources to match, assuming the Santa Clara County Public Health Department and CalREDIE are the ultimate source of this data. Otherwise, if the data is coming from another source, it would be great if that source were easier to identify.
Screenshots
stop-covid19-sfbayarea:
Santa Clara County Public Health Department: