nytimes / covid-19-data

A repository of data on coronavirus cases and deaths in the U.S.
https://www.nytimes.com/interactive/2020/us/coronavirus-us-cases.html

Data Issue: San Francisco erratic data trends, negative numbers #606

Closed durrettj closed 3 years ago

durrettj commented 3 years ago

Describe the issue:


Cumulative case counts for San Francisco dropped by more than 300 between 6/29 and 6/30.

Date        Cumulative cases
2021-06-29  37375
2021-06-30  37044
2021-07-01  37068
2021-07-02  37110
2021-07-03  37110
2021-07-04  37110
2021-07-05  37110
2021-07-06  37240
2021-07-07  37254
2021-07-08  37287
2021-07-09  37341
2021-07-10  37341
2021-07-11  37341
2021-07-12  37599
2021-07-13  37659

This causes a negative daily change and erratic rolling averages. The day-over-day changes are:

Date        Daily change
2021-06-30  -331
2021-07-01  24
2021-07-02  42
2021-07-03  0
2021-07-04  0
2021-07-05  0
2021-07-06  130
2021-07-07  14
2021-07-08  33
2021-07-09  54
2021-07-10  0
2021-07-11  0
2021-07-12  258
2021-07-13  60
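
For anyone who wants to reproduce the figures, here is a minimal Python sketch that derives the day-over-day changes from the cumulative counts quoted in this report and flags any negative day. The names `cumulative`, `daily_changes`, and `negative_days` are illustrative only and are not part of the repository.

```python
from datetime import date

# Cumulative San Francisco case counts, copied from the report above.
cumulative = {
    date(2021, 6, 29): 37375,
    date(2021, 6, 30): 37044,
    date(2021, 7, 1): 37068,
    date(2021, 7, 2): 37110,
    date(2021, 7, 3): 37110,
    date(2021, 7, 4): 37110,
    date(2021, 7, 5): 37110,
    date(2021, 7, 6): 37240,
    date(2021, 7, 7): 37254,
    date(2021, 7, 8): 37287,
    date(2021, 7, 9): 37341,
    date(2021, 7, 10): 37341,
    date(2021, 7, 11): 37341,
    date(2021, 7, 12): 37599,
    date(2021, 7, 13): 37659,
}


def daily_changes(series):
    """Map each date to its change from the previous reported date."""
    days = sorted(series)
    return {cur: series[cur] - series[prev] for prev, cur in zip(days, days[1:])}


changes = daily_changes(cumulative)

# A cumulative series should never decrease, so a negative change marks a revision.
negative_days = [d for d, delta in changes.items() if delta < 0]

for day, delta in changes.items():
    marker = "  <-- negative" if delta < 0 else ""
    print(f"{day.isoformat()}  {delta}{marker}")
```

With the data above, the only flagged date is 2021-06-30.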

lwaananenjones commented 3 years ago

Thank you for asking about this. Our data for San Francisco County comes from the state health department. (We also monitor the county dashboard, but it has been consistently lower recently.) On June 30, California removed many cases and had this note:

On June 30, there were 2013 newly reported confirmed cases of COVID-19 based on June 29 data. June 30's report includes 6372 fewer overall confirmed COVID-19 cases due to a routine audit of data that resulted in the removal of duplicate and reclassified cases.

We have anomalies like this one omitted from our rolling averages and listed in the anomalies file that accompanies our relatively new rolling averages dataset.
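
As a rough illustration of what omitting an anomalous day from a rolling average can look like, the sketch below computes a trailing 7-day average over the daily changes from this issue, with and without the June 30 revision. This is an assumption-laden example, not the repository's actual method: the function `rolling_average`, the hard-coded `anomalies` set, and the choice to drop excluded days from the window (rather than impute them) are all hypothetical, and the real anomalies file format is not reproduced here.

```python
from datetime import date, timedelta

# Daily changes for San Francisco, taken from the issue report.
daily = {
    date(2021, 6, 30): -331,
    date(2021, 7, 1): 24,
    date(2021, 7, 2): 42,
    date(2021, 7, 3): 0,
    date(2021, 7, 4): 0,
    date(2021, 7, 5): 0,
    date(2021, 7, 6): 130,
}

# Hypothetical stand-in for dates listed in an anomalies file.
anomalies = {date(2021, 6, 30)}


def rolling_average(daily, end, window=7, excluded=frozenset()):
    """Average the trailing `window` days ending at `end`, skipping excluded dates.

    Assumption: excluded days are dropped from the window entirely,
    so the divisor shrinks with them.
    """
    days = [end - timedelta(days=i) for i in range(window)]
    values = [daily[d] for d in days if d in daily and d not in excluded]
    return sum(values) / len(values) if values else None


end = date(2021, 7, 6)
print("including anomaly:", rolling_average(daily, end))
print("excluding anomaly:", rolling_average(daily, end, excluded=anomalies))
```

Including the June 30 revision drives the 7-day average negative; excluding it gives a positive value based on the remaining six days.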

durrettj commented 3 years ago

Thank you for the explanation and pointing out the anomalies file.

Best regards,

--

Jason Edward Durrett, GSTRT, GCUX, GCIA, GIAC Advisory Board
