nytimes / covid-19-data

A repository of data on coronavirus cases and deaths in the U.S.
https://www.nytimes.com/interactive/2020/us/coronavirus-us-cases.html

Enhancement: Early warning system #336

Closed: Pedro-McM closed this issue 4 years ago

Pedro-McM commented 4 years ago

I'm a big fan of the New York Times Covid-19 database, but unfortunately, the "cases" data isn't very useful. It consistently underestimates how many infections there are (this was especially true early in the epidemic, when testing wasn't readily available). Now that testing is much more readily available, it is difficult to draw useful conclusions from the number of new cases reported each day, because there is no way to determine whether an increase in cases reflects more infections or more testing. Frankly and unfortunately, this makes the "cases" data almost useless.

As a prime example, I've attached two graphs created using NYT data for California. One shows a 7-day moving average of new deaths and the other shows a 7-day moving average of new cases. We can see that new deaths peaked around the last week of April. And yet, new cases have been trending upward during the entire epidemic. In fact, since the end of April, new deaths have decreased by around 25% while new cases have increased by almost 200%!

So basically, I can't reach any worthwhile conclusion by looking at the "cases" data, except perhaps that it is likely that California is testing people much more than previously.

I know this would be difficult to do, but I think the NYT data set would be much more useful for understanding the Covid-19 epidemic in the United States if a different early warning number were included. I can think of several possibilities: total number of tests performed; percentage of beds utilized by covid-19 patients; ventilator use. Perhaps there are others.

[Attached graphs: California 7-day moving average of new cases; California 7-day moving average of new deaths]
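For readers who want to reproduce the kind of comparison described in this comment, here is a minimal sketch of the arithmetic, assuming the input is a list of cumulative totals, one per day. The function names (`daily_new`, `moving_average`, `percent_change`) and the toy numbers are hypothetical and not part of the NYT repository; this is not necessarily how the original graphs were produced.

```python
from collections import deque


def daily_new(cumulative):
    """Turn cumulative totals into daily increments.

    The first day's increment is its own total (assumes the series starts at 0).
    """
    increments = []
    previous = 0
    for total in cumulative:
        increments.append(total - previous)
        previous = total
    return increments


def moving_average(values, window=7):
    """Trailing moving average; None until a full window is available."""
    averages = []
    recent = deque(maxlen=window)
    running_sum = 0
    for value in values:
        if len(recent) == window:
            running_sum -= recent[0]  # oldest value is about to drop out
        recent.append(value)
        running_sum += value
        averages.append(running_sum / window if len(recent) == window else None)
    return averages


def percent_change(old, new):
    """Percentage change from old to new (old must be non-zero)."""
    return (new - old) / old * 100.0


if __name__ == "__main__":
    # Made-up cumulative counts for illustration only, not real data.
    toy_cumulative_cases = [10, 25, 45, 70, 100, 135, 175, 220, 270]
    new_cases = daily_new(toy_cumulative_cases)
    print(moving_average(new_cases, window=7))
```

Comparing `percent_change` on the smoothed cases series against the smoothed deaths series over the same dates gives the kind of contrast described above, though it cannot separate a rise in infections from a rise in testing.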

tiffehr commented 4 years ago

We have no plans to pull in hospitalization or testing figures. The value of our data is the case/death time series and trends, as you and many others have charted. We do not have the ability to backdate data we have not been collecting, nor do we have the staffing to try to reconstruct those histories even if they were easy to collect.

We recommend the COVID Tracking Project's data for some of the data you request. But I think it is worth considering that this kind of data is neither widely available nor methodologically clear from state to state or county to county. Other sources are collecting it, with their own large methodological concerns and caveats.

Pedro-McM commented 4 years ago

Thanks. Unfortunately, that's what I thought you would say. I understand the problem. I was just hoping!

Pedro-McM commented 4 years ago

And I'll check out that website you mentioned. It looks promising!