nytimes / covid-19-data

A repository of data on coronavirus cases and deaths in the U.S.
https://www.nytimes.com/interactive/2020/us/coronavirus-us-cases.html
Other

Enhancement: RE "The Trouble With the Case Curve During the Holidays" #643

Closed artgoldberg closed 2 years ago

artgoldberg commented 3 years ago

Describe the feature you would like to see

Dear New York Times Covid Data Team

First, thank you for the great work you've been doing for almost 2 years now. I started the pandemic by building a tiny analysis of New York City and State trends for myself (coronavirus_NY.xlsx, visualized below) which led me to decide to work from home well before the 03/17/20 shutdown in NYC. And since then I've used your compiled data and analyses to closely follow trends in NYC, the US and around the world. Thank you!

[Image: goldberg_early_covid_analysis]

I appreciate your effort to adjust the smoothed multi-day averages of Covid data that you report in "The Trouble With the Case Curve During the Holidays." However, I recommend that you consider more substantial changes. As you observe, the problem is that changes in Americans' behavior before, during, and after holidays and vacations greatly disrupt the methods used to estimate new cases and their regional and demographic distributions. In addition, as you mention, changes in the policies and practices of state and local public health organizations also disrupt these methods.

Unfortunately, the changes you're making only smooth out distortions that last a few days or less; they do not address the significant distortions that last a week or two, as the illustrations in your article show.

I would like to see a Covid-19 curve-smoothing algorithm that better fits detailed estimates of the actual caseload.

Describe alternatives you've considered

You could go further by employing one of the well-known mathematical and algorithmic approaches to smoothing out longer-duration distortions. These include:

  1. Average over a baseline that's longer than two weeks
  2. Use an exponential moving average instead of a simple moving average
  3. Use a band-pass filter
  4. Build and train a machine learning model that incorporates more detail about reported cases and the best estimates of actual cases, which would be much more work than the approaches above
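As a rough illustration of options 1 through 3, here is a minimal Python sketch on hypothetical data (a flat 1,000 cases/day with a two-week reporting dip to 600, not real NYT figures). The function names `sma`, `ema` and `band_pass` are illustrative and not part of any NYT tool; the EMA uses the common `alpha = 2 / (span + 1)` convention, and the "band-pass" here is a crude one built as the difference of a short and a long moving average.

```python
from typing import List, Optional


def sma(values: List[float], window: int) -> List[Optional[float]]:
    """Trailing simple moving average; None until `window` values are available."""
    return [
        None if i + 1 < window else sum(values[i + 1 - window : i + 1]) / window
        for i in range(len(values))
    ]


def ema(values: List[float], span: int) -> List[float]:
    """Exponential moving average seeded with the first value, alpha = 2 / (span + 1)."""
    alpha = 2 / (span + 1)
    out = [float(values[0])]
    for x in values[1:]:
        out.append(alpha * x + (1 - alpha) * out[-1])
    return out


def band_pass(values: List[float], short_window: int, long_window: int) -> List[Optional[float]]:
    """Crude band-pass: short-window SMA minus long-window SMA.

    Roughly keeps fluctuations lasting between `short_window` and `long_window`
    days, attenuating day-to-day noise and removing a constant underlying level.
    """
    fast, slow = sma(values, short_window), sma(values, long_window)
    return [None if f is None or s is None else f - s for f, s in zip(fast, slow)]


# Hypothetical series: 1,000 cases/day with a 14-day holiday reporting dip to 600.
daily = [1000] * 28 + [600] * 14 + [1000] * 28

print(min(v for v in sma(daily, 7) if v is not None))   # 600.0: the 7-day average passes the dip through
print(min(v for v in sma(daily, 28) if v is not None))  # 800.0: the 28-day average halves it
print(min(ema(daily, 14)))                               # a shallower dip that lags the recovery
```

The trade-off in this sketch is that longer windows and EMAs also delay and blunt real changes in the trend, which is the over-smoothing risk a cautious publisher has to weigh.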

Regards, Arthur

albertsun commented 3 years ago

Thanks for the suggestions @artgoldberg, we'll certainly consider them.

We try to be very conservative with the adjustments and changes we make to our data, to be sure that we are not over-smoothing or removing any real trends, and we leave it to readers and other data users to interpret deeper trends. Hopefully, with the raw data available, others can apply more advanced smoothing and modeling techniques as well.
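For readers who want to try their own smoothing on the raw data, a typical first step is converting cumulative counts into daily new cases. This is a minimal Python sketch, assuming the national file `us.csv` has `date`, `cases` and `deaths` columns holding cumulative counts, sorted by date; the function name `daily_new_cases` is hypothetical and the sample rows are made up.

```python
import csv
import io
from typing import List, Tuple


def daily_new_cases(csv_text: str) -> List[Tuple[str, int]]:
    """Turn cumulative 'cases' counts into per-day new cases.

    Assumes a header with 'date' and 'cases' columns and rows already sorted
    by date. The first row's cumulative count is treated as that day's new cases.
    """
    result = []
    previous = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        cumulative = int(row["cases"])
        # Data revisions can make this difference negative; it is kept as-is, not clipped.
        result.append((row["date"], cumulative - previous))
        previous = cumulative
    return result


# Hypothetical rows in the assumed us.csv shape (not real figures).
sample = "date,cases,deaths\n2020-03-01,100,2\n2020-03-02,130,3\n2020-03-03,175,5\n"
print(daily_new_cases(sample))
# [('2020-03-01', 100), ('2020-03-02', 30), ('2020-03-03', 45)]
```

The resulting daily series can then be passed to a moving average or any other filter; keeping negative values visible, rather than clipping them, leaves the decision about how to handle reporting corrections to the analyst.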

tiffehr commented 2 years ago

@artgoldberg Please feel free to reopen this if you have further discussion points. Closing for now, since @albertsun relayed our decision-making approach to holiday irregularities.

artgoldberg commented 2 years ago

Hi @albertsun and @tiffehr, I appreciate you thinking about my ideas and adding my suggestion to your Collections. Arthur