nytimes / covid-19-data

A repository of data on coronavirus cases and deaths in the U.S.
https://www.nytimes.com/interactive/2020/us/coronavirus-us-cases.html

Please add the size of the population for each county / entity listed #75

Closed (mosblitz closed this issue 4 years ago)

mosblitz commented 4 years ago

What would make this data even more useful is if you provided the size of the population per county / entity in this list. That way folks could translate the raw case numbers into population incidence percentages. Overall that is a better way for someone to get a sense of how their community is being affected by Covid-19 than the case numbers. If I live in a county of population 2,500,000 and 100 Covid-19 cases I would feel safer than living in a county of population 50,000 and 20 cases.
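The comparison in this comment can be worked through with a few lines of Python. This is only a minimal sketch: the function name `incidence_per_100k` is made up for illustration, and the figures are the hypothetical ones from the comment, not real county data.

```python
def incidence_per_100k(cases: int, population: int) -> float:
    """Return confirmed cases per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases / population * 100_000


# Hypothetical counties from the comment above.
large_county = incidence_per_100k(100, 2_500_000)
small_county = incidence_per_100k(20, 50_000)

print(f"{large_county:.1f} vs {small_county:.1f} cases per 100k")
```

The smaller county has fewer cases in absolute terms but ten times the per-capita incidence, which is the point being made.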

covid19viz commented 4 years ago

Please check this interactive visualization, which uses confirmed cases and population data for US counties: https://covid19viz.github.io

Spationaute commented 4 years ago

Nice work, @covid19viz!

mosblitz commented 4 years ago

Thanks for the link. It would be nice to present incidence data as a heat map for the USA - the red areas would have the highest incidences per 100,000 population. It would also be interesting to look at how growth rates evolve when a region passes a given incidence threshold as opposed to first case threshold.
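One possible way to compare regions from an incidence threshold rather than from the first case is to trim each series so that it starts on the first day the region crosses the threshold. The sketch below assumes cumulative daily case counts and a known population; the helper names, the threshold of 10 per 100,000, and all numbers are invented for illustration.

```python
from typing import List, Optional


def first_day_over(cases: List[int], population: int,
                   threshold_per_100k: float) -> Optional[int]:
    """Index of the first day on which incidence reaches the threshold."""
    for day, count in enumerate(cases):
        if count / population * 100_000 >= threshold_per_100k:
            return day
    return None


def aligned_series(cases: List[int], population: int,
                   threshold_per_100k: float) -> List[int]:
    """Cumulative cases starting from the day the threshold is crossed."""
    start = first_day_over(cases, population, threshold_per_100k)
    return [] if start is None else cases[start:]


# Made-up cumulative counts for two hypothetical counties.
county_a = [1, 3, 10, 30, 80]          # population 50,000
county_b = [5, 40, 200, 900, 2000]     # population 2,500,000

a = aligned_series(county_a, 50_000, 10.0)
b = aligned_series(county_b, 2_500_000, 10.0)
```

After alignment, day 0 of each series means "first day at or above 10 cases per 100,000", so growth from that point can be compared across regions of very different size.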

mosblitz commented 4 years ago

I wonder if there is some systematic way to adjust this data by a factor that guesses the number of true cases versus measured and confirmed cases. Clearly in some areas not much testing has occurred and we are flying blind. Is there some way to determine the number of tests given so far per unit of population?

covid19viz commented 4 years ago

Yes, there is (the article is in Italian, but the references it links to should be in English, and there is Google Translate anyway):

https://www.infodata.ilsole24ore.com/2020/03/29/covid-19-limiti-la-comprensione-dei-dati-giorno-comunica-la-protezione-civile/?refresh_ce=1

adrianhamins commented 4 years ago

> What would make this data even more useful is if you provided the size of the population per county / entity in this list.

You can do that yourself by merging FIPS population data with this dataset.

https://www.census.gov/data/datasets/time-series/demo/popest/2010s-counties-total.html
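A merge of this kind might look roughly like the sketch below. It assumes the case file has the columns `date,county,state,fips,cases,deaths` and that you have already reduced the census data to a two-column table of `fips` and `population`; that second layout, the function name, and every number in the samples are assumptions made for illustration. FIPS codes are kept as strings so that leading zeros survive.

```python
import csv
import io

# Tiny inline samples; all values are invented for illustration.
CASES_SAMPLE = """date,county,state,fips,cases,deaths
2020-03-29,Example County,Alabama,01001,6,0
2020-03-29,Unknown,Alabama,,3,0
"""

POP_SAMPLE = """fips,population
01001,55000
"""


def merge_population(cases_csv: str, pop_csv: str) -> list:
    """Left-join population onto case rows by FIPS code (as strings)."""
    population = {
        row["fips"]: int(row["population"])
        for row in csv.DictReader(io.StringIO(pop_csv))
    }
    merged = []
    for row in csv.DictReader(io.StringIO(cases_csv)):
        out = dict(row)
        pop = population.get(row["fips"])  # None when there is no match
        out["population"] = pop
        out["cases_per_100k"] = (
            None if pop is None else int(row["cases"]) / pop * 100_000
        )
        merged.append(out)
    return merged


merged = merge_population(CASES_SAMPLE, POP_SAMPLE)
```

Rows without a matching FIPS code (an empty `fips` field, for example) are kept with `population` set to `None` instead of being dropped, so they can be inspected separately.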

lahtnesornod commented 4 years ago

What about independent cities that are not in counties? How are they accounted for in the county-level dataset? For example, in Alabama the FIPS code 01000 seems to sweep in hundreds of towns. See this link for county-level FIPS code info: https://www.census.gov/geographies/reference-files/2017/demo/popest/2017-fips.html Do I take the state-level total, subtract the sum of the county-level totals, and assume that the difference is accounted for by independent cities?

gillesleroy commented 4 years ago

If anyone is interested, I am publishing the NYT data with county populations from the 2010 census at https://www.topappsolutions.com/top/app/covid_19_per_counties.html. Comments, ideas, and suggestions are welcome.

lahtnesornod commented 4 years ago

gillesleroy, nice job appending the population data and showing per capita incidence. I suggest putting the state-level average at the top of each table as a point of reference for the county-level results. Also, the Virginia table incorrectly includes West Virginia.

rajahornstein commented 4 years ago

Gillesleroy, useful. I spent a while trying to do the same. Would be nice to see the same setup just for the states.

smrgit commented 4 years ago

Hi all, I've been doing some similar analysis and making use of Google BigQuery and their public census datasets... it would be great (and very cheap) to include this data in Google BigQuery tables. I'd be happy to help with that. (I'm a fan of BigQuery, but not officially affiliated with Google.) https://cloud.google.com/bigquery/public-data

gillesleroy commented 4 years ago

Thanks for the prompt feedback. I fixed the Virginia issue and will add state-level stats shortly.

smrgit commented 4 years ago

btw, @gillesleroy you should be able to find 2018 county-level census data... that's what I have been using

albertsun commented 4 years ago

Yes, we'd suggest people use @adrianhamins's method and merge in their choice of county-level population data set.

@lahtnesornod, to your question about independent cities: we've reconciled data for independent cities into counties in every case we've found where the city reports cases separately from the county and the city lies wholly within the county. You should expect county counts to include the city unless you see the city broken out (for instance, Kansas City, MO). Any difference between the sum of the county counts and the state total is due not to independent cities but to cases that the state has not yet assigned to a county because it is still investigating the patient's county of residence.
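Following that explanation, the gap between a state total and the sum of its county rows can be read as cases not yet assigned to a county. A rough sketch of that subtraction is below; the row layout, the function name, and the numbers are assumptions for illustration, not real data.

```python
from collections import defaultdict

# Invented rows for one state on one date.
county_rows = [
    {"date": "2020-03-29", "state": "Alabama", "county": "A", "cases": 10},
    {"date": "2020-03-29", "state": "Alabama", "county": "B", "cases": 10},
]
state_rows = [
    {"date": "2020-03-29", "state": "Alabama", "cases": 25},
]


def unassigned_cases(state_rows: list, county_rows: list) -> dict:
    """State total minus the sum of county counts, per (date, state)."""
    county_sum = defaultdict(int)
    for row in county_rows:
        county_sum[(row["date"], row["state"])] += row["cases"]
    return {
        (row["date"], row["state"]):
            row["cases"] - county_sum[(row["date"], row["state"])]
        for row in state_rows
    }


gaps = unassigned_cases(state_rows, county_rows)
```

In this made-up example the state reports 25 cases and the counties sum to 20, so 5 cases would be treated as not yet assigned to a county.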

MaxGhenis commented 4 years ago

Specifically you'll want to use this file: https://www2.census.gov/programs-surveys/popest/datasets/2010-2019/counties/totals/co-est2019-alldata.csv

Here's a Python notebook (https://colab.research.google.com/drive/1YCZDnCrTZzxsONuIFO0xlOcUUTgM1lTK?usp=sharing) where I merge its 2019 population data into the NYT state and county Covid files. If you just want the merged files, you can check the box at the bottom of the notebook and then choose Runtime...Run all.
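For anyone who prefers not to open the notebook, the sketch below shows one way a 5-digit FIPS key could be built from the census file. It is not the notebook's code. It assumes the file has `SUMLEV`, `STATE`, `COUNTY`, and `POPESTIMATE2019` columns, which is my recollection of the layout and should be checked against the real header, and the inline sample uses invented population values.

```python
import csv
import io

# Invented sample in the assumed layout of co-est2019-alldata.csv.
CENSUS_SAMPLE = """SUMLEV,STATE,COUNTY,STNAME,CTYNAME,POPESTIMATE2019
040,01,000,Alabama,Alabama,4900000
050,01,001,Alabama,Autauga County,55000
050,01,003,Alabama,Baldwin County,220000
"""


def county_populations(census_csv: str) -> dict:
    """Map 5-digit county FIPS strings to the 2019 population estimate."""
    pops = {}
    for row in csv.DictReader(io.StringIO(census_csv)):
        # Assumption: summary level 50 marks county rows; other rows
        # (such as state totals with county code 000) are skipped.
        if int(row["SUMLEV"]) != 50:
            continue
        # Zero-pad in case a reader stripped leading zeros.
        fips = row["STATE"].zfill(2) + row["COUNTY"].zfill(3)
        pops[fips] = int(row["POPESTIMATE2019"])
    return pops


pops = county_populations(CENSUS_SAMPLE)
```

The resulting dictionary can be joined to the `fips` column of the county case file. When reading the real file you may need to pass an explicit encoding such as `latin-1`, since some county names contain non-ASCII characters.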

gillesleroy commented 4 years ago

Thanks Max.

What a surprise to get an answer 2 months after the closure of the comments thread :) It turns out I am using the same source for the population per county, i.e., the 2019 census estimates. You can check it out here: https://www.topappsolutions.com/top/app/covid.us Click the state twice to get the data per county. I am using a relational database to load and manipulate the data, but thanks for the offer to use your notebook.

I am still working on the website and refreshing the data daily. I am also now using Google Maps for data mapping.

Your help is greatly appreciated.

Cheers,

Gilles Leroy

On May 28, 2020, at 9:59 PM, Max Ghenis notifications@github.com wrote:

Specifically you'll want to use this file: https://www2.census.gov/programs-surveys/popest/datasets/2010-2019/counties/totals/co-est2019-alldata.csv Here's a notebook https://colab.research.google.com/drive/1YCZDnCrTZzxsONuIFO0xlOcUUTgM1lTK?usp=sharing where I merge its 2019 population data to the NYT state and county Covid files. If you just want the merged files, you can check the box at the bottom of the notebook and then Runtime...Run all.