nytimes / covid-19-data

A repository of data on coronavirus cases and deaths in the U.S.
https://www.nytimes.com/interactive/2020/us/coronavirus-us-cases.html

The data for Missouri on 03/08/2021 seems to have an abnormally high number of daily cases, particularly in the larger counties (over 50,000 on that date). #556

Closed SFnLS closed 2 years ago

SFnLS commented 3 years ago


lwaananenjones commented 3 years ago

Missouri began reporting probable cases based on antigen testing on 3/8, which resulted in a large one-day increase, particularly in counties that had not previously reported the probable cases they were tracking internally. There is a note about this anomaly on our Missouri (https://www.nytimes.com/interactive/2020/us/missouri-coronavirus-cases.html) and U.S. tracking pages, and we are working on a process for flagging anomalous values in our data here.
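For readers who want to spot this kind of one-day jump themselves, here is a minimal sketch of one possible approach. It is not the process NYT is working on. It assumes you already have a cumulative case series for one county; the numbers below are made up for illustration, and the function names, the 7-day window, and the 5x threshold are all hypothetical choices.

```python
from statistics import median


def daily_from_cumulative(cumulative):
    """Turn cumulative totals into day-over-day increments."""
    return [b - a for a, b in zip(cumulative, cumulative[1:])]


def flag_spikes(daily, window=7, factor=5.0):
    """Return indices where a day exceeds `factor` times the median
    of the preceding `window` days (an arbitrary, illustrative rule)."""
    flagged = []
    for i in range(window, len(daily)):
        baseline = median(daily[i - window:i])
        if baseline > 0 and daily[i] > factor * baseline:
            flagged.append(i)
    return flagged


# Made-up cumulative counts with one large backlog dump.
cumulative = [1000, 1040, 1085, 1120, 1160, 1205, 1240, 1280, 2900, 2945, 2985]
daily = daily_from_cumulative(cumulative)
spikes = flag_spikes(daily)
print(daily)   # daily increments
print(spikes)  # indices of flagged days
```

Using the median rather than the mean for the baseline keeps a single dump from inflating the baseline for the days that follow it.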

SFnLS commented 3 years ago

Thanks for the info. Leave it to my backwoods third world state to introduce a major change in reporting one year into the pandemic and dump it all onto one day, which renders things like trend analysis just about useless.

Not so coincidentally, that caused a big spike in the daily U.S. cases.

Keep up the great work. Maybe by later this year things won’t be as important to track.


PatrickDRusk commented 3 years ago

Your "backwoods third world state" provides a dataset with date-of-death statistics in it that is quite good: https://results.mo.gov/t/COVID19/views/COVID-19DataforDownload/MetricsbyDateofDeath.csv

If you are using deaths by date of report (as most news organizations do), your trend analysis will be hopelessly wrong. For instance, you would think that Virginia has had high rates of death over the last couple weeks, because they are doing an audit catching up on deaths in early January. Quite a number of states have been performing audits since the new year.
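A small sketch of why the two tallies diverge, using made-up records rather than any state's real data. Each record is a hypothetical (date of death, date of report) pair, and several older deaths are reported together on one day, as happens during an audit.

```python
from collections import Counter
from datetime import date

# Hypothetical records: (date_of_death, date_of_report).
records = [
    (date(2021, 1, 5), date(2021, 1, 6)),
    (date(2021, 1, 6), date(2021, 1, 7)),
    (date(2021, 1, 5), date(2021, 3, 1)),   # audit backlog
    (date(2021, 1, 6), date(2021, 3, 1)),   # audit backlog
    (date(2021, 1, 7), date(2021, 3, 1)),   # audit backlog
    (date(2021, 2, 28), date(2021, 3, 1)),  # ordinary next-day report
]

by_death = Counter(died for died, _ in records)
by_report = Counter(reported for _, reported in records)

# Counted by report date, March 1 looks like a spike; counted by
# date of death, the same deaths fall back into early January.
print(by_report[date(2021, 3, 1)])
print(by_death[date(2021, 3, 1)])
print(by_death[date(2021, 1, 5)])
```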

For accurate death numbers, use date-of-death datasets (I get the data daily for 25 states), and current hospitalizations data. The hospitalizations data is, by far, the best stat to use for trend analysis. Unfortunately, the COVID Tracking Project has just shut down, so we are dependent on the CDC dataset that is only updated weekly.

Some FB posts documenting my journey towards using date-of-death and hospitalization data: https://www.facebook.com/SirMungus/posts/10165020170935227 https://www.facebook.com/SirMungus/posts/10165050364645227 https://www.facebook.com/SirMungus/posts/10165141194435227

SFnLS commented 3 years ago

I've mostly been tracking cases in counties around the country and in certain states where I have business connections. Several times states have changed what they reported and when, but the Missouri case dump seems particularly notable for its volume, as 50,000 cases must have stretched back a bit. I'm very suspicious about the reason the antigen test data had previously been withheld from reporting.

Thanks again.


PatrickDRusk commented 3 years ago

I gave up on case data probably six months ago. The reporting varies so much state by state, and it is affected by so many variables that don't track with actual infections (like weekends, holidays, storms, reporting delays, criteria changes). And states vary in how they report, for instance, when one person takes both a PCR and an antigen test and both are positive. Does that count as two, or one? The CTP had various blog posts about the problems involved.
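For anyone still working with case counts, a common partial workaround for the weekend effect is a 7-day trailing average. The sketch below uses made-up daily numbers with a weekend dip, and the function name is just illustrative. It only evens out the weekly reporting rhythm and does nothing about criteria changes or backlog dumps.

```python
def trailing_average(values, window=7):
    """Average each day with the `window - 1` days before it.
    Days without a full window behind them are left as None."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            chunk = values[i + 1 - window:i + 1]
            out.append(sum(chunk) / window)
    return out


# Two made-up weeks: five weekdays followed by a two-day weekend dip.
daily = [100, 110, 105, 95, 100, 40, 30] * 2
smoothed = trailing_average(daily)
print(smoothed)
```

Because every full window here covers exactly one complete week, the smoothed values are constant even though the raw series swings sharply on weekends.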

Once the light dawns about how much better hospitalization data is for trend analysis, you'll never go back to cases or reported deaths. Personally, when I figured it out, I was both joyous and regretful, because it nullified hundreds of hours of death analysis that I had done.

LOL, I just realized I am saying this on the site where I have gotten my cases and reported deaths data for six months! I am extremely grateful to all of the folks at NYT that have put together this data and made it available.