nytimes / covid-19-data

A repository of data on coronavirus cases and deaths in the U.S.
https://www.nytimes.com/interactive/2020/us/coronavirus-us-cases.html

Incorrect new daily cases in San Bernardino county, California, USA #662

Closed. Saurabh0027 closed this issue 2 years ago

Saurabh0027 commented 2 years ago

Incorrect number of daily new cases on 6th March 2022: 0 cases

Incorrect number of daily new cases on 7th March 2022: 718 cases

According to the official website of San Bernardino County, there were 0 daily new cases on 6th March and 450 on 7th March. Source:

https://covid19-sbcph.hub.arcgis.com/pages/cases

According to the official California state government website, the daily new cases on 6th March and 7th March were 6 and 0, respectively. Source: https://covid19.ca.gov/state-dashboard/#location-san_bernardino

https://data.chhs.ca.gov/dataset/covid-19-time-series-metrics-by-county-and-state/resource/046cdd2b-31e5-4d34-9ed3-b48cdbc4be7a (search > San Bernardino > page number 77)

However, NYT reports 0 for 6th March and 718 for 7th March.

tiffehr commented 2 years ago

Our methodology is to include cases on the date they are added to the location's total cumulative figures, with the new cases each day calculated as the day-to-day difference in the cumulative tally. We did not see a new announcement from San Bernardino's ArcGIS/ESRI dashboard between 3/4 (567,082 cases) and 3/7 (567,532 cases), a difference of +450, so the value for 3/5 and 3/6 would be 0 (no change). 3/8 had fresh data, but it was exactly +1 new death and no new cases. We don't know why they reported a new death and no new cases.
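As a rough illustration of the day-to-day difference calculation described above, here is a minimal Python sketch. It is not the NYT pipeline: the function name `daily_new_cases` and the rule of carrying the last known total forward on days without an announcement are assumptions made for this example. Only the two cumulative figures come from the comment.

```python
from datetime import date, timedelta


def daily_new_cases(cumulative):
    """Derive daily new cases from cumulative totals.

    `cumulative` maps a date to the cumulative total announced that day.
    Days with no announcement are assumed to carry the previous total
    forward, so they show 0 new cases (an assumption for this sketch).
    """
    days = sorted(cumulative)
    previous_total = cumulative[days[0]]
    result = {}
    current = days[0] + timedelta(days=1)
    while current <= days[-1]:
        # No announcement on this day: reuse the last known total.
        total = cumulative.get(current, previous_total)
        result[current] = total - previous_total
        previous_total = total
        current += timedelta(days=1)
    return result


# Cumulative totals quoted in the comment for San Bernardino.
reported = {
    date(2022, 3, 4): 567_082,
    date(2022, 3, 7): 567_532,
}

new_cases = daily_new_cases(reported)
for day, count in new_cases.items():
    print(day.isoformat(), count)
```

With these inputs, the whole increase lands on 3/7 and the two days in between show no change, which matches the explanation above.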

San Bernardino's staff may have backdated cases for those days in their timeseries or histogram displays for ~3/5-3/7, after the fact. However, our collection method (and its ongoing complexity) can't track country-wide backdating activity except for extreme jumps, which we mark as an anomaly in our moving average. That's partly why we emphasize the rolling average figures rather than a day-to-day histogram -- it smooths out gaps in when a given health department reported increases.
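To show why a rolling average is less sensitive to reporting gaps, here is a small Python sketch. The function `rolling_average`, the 3-day window, and the evenly spread series are all hypothetical choices for illustration. The comment does not state the window length used, and this sketch does not model the anomaly handling it mentions.

```python
def rolling_average(daily, window=7):
    """Trailing moving average over `daily` counts.

    For the first few days, where fewer than `window` values exist,
    the average is taken over the values available so far.
    """
    averages = []
    for i in range(len(daily)):
        start = max(0, i - window + 1)
        chunk = daily[start:i + 1]
        averages.append(sum(chunk) / len(chunk))
    return averages


# Same 450 cases over three days, reported two different ways.
lumpy = [0, 0, 450]        # nothing announced, then one catch-up report
spread = [150, 150, 150]   # hypothetical: the same cases dated evenly

print(rolling_average(lumpy, window=3))
print(rolling_average(spread, window=3))
```

Once the window covers the whole gap, both series give the same average, even though their day-to-day values differ sharply.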

The state of California may be following the same method we are, but we don't know for sure. They also get their reports from the county with some lag. So my educated guess is that they aren't showing San Bernardino's backdating either; perhaps they record the increase on the day they receive the data rather than on the date San Bernardino assigns to it.

Reconciling these kinds of geographic reporting differences vs. sums has been a challenging part of this project. I hope this explanation helps.