nytimes / covid-19-data

A repository of data on coronavirus cases and deaths in the U.S.
https://www.nytimes.com/interactive/2020/us/coronavirus-us-cases.html

Recent County-level Rolling Avg Data Not Updating #632

Closed ijconlon closed 3 years ago

ijconlon commented 3 years ago

I saw the recent change to the rolling-average data feed for counties due to size limitations, but the us-counties-recent.csv file has not been updated since September 30. That said, is the idea that this data feed should contain data for the last 30 days, on a rolling basis? If so, I wonder what will happen with the earlier days, since they likely wouldn't end up in that large original us-counties.csv file.

Could a possible workaround here be to just have a second county-level dataset with a start date of September 1, 2021? If people want to report on data earlier than that, they can append the data to the original us-counties.csv file.

Thanks for all you do here--it's really a fantastic resource!

AKValle26 commented 3 years ago

Looks like they fixed it

ijconlon commented 3 years ago

Ah, I see that the recent data feed is now updating, but take a look at the archived data set (us-counties.csv). The last date of data in that source is September 29. If the us-counties-recent.csv data source continues to update on a rolling 30-day basis, we'll eventually get to a point where a growing number of days won't appear in either data source.
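To make the coverage gap concrete, here is a minimal sketch of the arithmetic. It assumes the archive stops on September 29, 2021 (the date mentioned above) and that the recent file always holds exactly the latest 30 days. The function names and the November 15 "latest" date are hypothetical, chosen only for illustration.

```python
from datetime import date, timedelta


def recent_window_start(latest, window_days=30):
    """First date held by a rolling file that ends on `latest` (inclusive window)."""
    return latest - timedelta(days=window_days - 1)


def uncovered_dates(archive_last, recent_first):
    """Dates strictly after the archive's last day and before the recent file's first day."""
    gap = []
    day = archive_last + timedelta(days=1)
    while day < recent_first:
        gap.append(day)
        day += timedelta(days=1)
    return gap


# Archive end date taken from the comment above; the "latest" date is made up.
ARCHIVE_LAST = date(2021, 9, 29)
gap = uncovered_dates(ARCHIVE_LAST, recent_window_start(date(2021, 11, 15)))
print(len(gap), gap[0], gap[-1])
```

As long as the recent window still overlaps the archive, the list is empty. Once the window start moves past September 30, the list grows by one day per day.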

It seems like a better approach than a recent rolling dataset would be calendar-year extracts ("us-counties-.csv"), so the 2020 dataset would cover January 21, 2020 (earliest available date) to December 31, 2020, the 2021 dataset would cover all days in that year, and so forth. No dataset would get so large as to become unwieldy, and users could choose how far back they want to go. If they wanted to include a separate dataset that only includes the most recent 30 days, that could be an option as well.
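A rough sketch of what such a calendar-year split could look like is below. It assumes each row has a `date` column in ISO `YYYY-MM-DD` form. The helper name `split_by_year` and the sample rows (including the `XX-00000` identifier and the counts) are placeholders for illustration, not real data from the repository.

```python
import csv
import io
from collections import defaultdict


def split_by_year(csv_text):
    """Group CSV rows by the year prefix of their `date` column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    by_year = defaultdict(list)
    for row in reader:
        # ISO dates start with the four-digit year.
        by_year[row["date"][:4]].append(row)
    return dict(by_year)


# Placeholder rows; the values are invented for the example.
SAMPLE = (
    "date,geoid,cases\n"
    "2020-01-21,XX-00000,1\n"
    "2020-12-31,XX-00000,2\n"
    "2021-01-01,XX-00000,3\n"
)

result = split_by_year(SAMPLE)
for year in sorted(result):
    print(year, len(result[year]))
```

Each year's rows could then be written to its own file, so no single extract grows without bound.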

wmandrews commented 3 years ago

Thanks for creating this issue, and great idea @ijconlon. We have updated the county-level data in the rolling-averages directory to have year-based files available to create a full-pandemic dataset. https://github.com/nytimes/covid-19-data/tree/master/rolling-averages
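For anyone stitching the year-based files back together, the following is a minimal sketch and not an official script. It assumes every year file shares the same header row. The function name `combine_year_files`, the three-column layout, and the sample rows are hypothetical; in practice the inputs would be the opened year files from the rolling-averages directory.

```python
import csv
import io


def combine_year_files(sources, out):
    """Concatenate CSV files that share a header, writing the header only once.

    `sources` is an iterable of open text files; `out` is a writable text file.
    Returns the number of data rows written.
    """
    writer = csv.writer(out, lineterminator="\n")
    header = None
    count = 0
    for src in sources:
        reader = csv.reader(src)
        file_header = next(reader, None)
        if file_header is None:
            continue  # skip empty files
        if header is None:
            header = file_header
            writer.writerow(header)
        elif file_header != header:
            raise ValueError("column mismatch between year files")
        for row in reader:
            writer.writerow(row)
            count += 1
    return count


# In-memory stand-ins for two year files; the values are placeholders.
y2020 = io.StringIO("date,geoid,cases\n2020-12-31,XX-00000,1\n")
y2021 = io.StringIO("date,geoid,cases\n2021-01-01,XX-00000,2\n")
combined = io.StringIO()
n = combine_year_files([y2020, y2021], combined)
print(n)
print(combined.getvalue())
```

Passing the files in chronological order keeps the combined output sorted by date, provided each year file is itself sorted.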

ijconlon commented 3 years ago

Fantastic! Glad you liked the idea.

Judging from your avatar, I'm thinking you must be a fellow Tar Heel. I live down in Carrboro. Go Heels! And thanks again!