Closed MirrorCola closed 2 years ago
Just don't forget those of us whose data processes don't use Excel. An additional form of listing data would be fine, but yanking the existing file and replacing it with two would break things for a lot of people. Maybe even better than adding more data formats might be offering a script (or a couple of them for different scripting environments) that can cut up the data in the main file on the user side.
Trajan, fair enough. Another thought that might solve both problems is to have an all-inclusive file and a separate 2021-only version. Would that work for you? I know it increases the work for NYT staff though.
I import the Covid-19 data into a Sqlite database.
On Sun, Feb 21, 2021, 1:18 PM MirrorCola notifications@github.com wrote:
Trajan, fair enough. Another thought that might solve both problems is to have an all-inclusive file and a separate 2021-only version. Would that work for you? I know it increases the work for NYT staff though.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/nytimes/covid-19-data/issues/544#issuecomment-782901106, or unsubscribe https://github.com/notifications/unsubscribe-auth/ALDDKG6QAJN3P4ELRV6J53DTAFE5TANCNFSM4X643W7A .
We're working on what we'll do, here. Back to you soon.
I bulk the county file into a DB and then use Excel Data options to view selected portions. I'd like to see a continue complete file.
From: Tiff Fehr notifications@github.com Sent: Sunday, February 21, 2021 2:51 PM To: nytimes/covid-19-data covid-19-data@noreply.github.com Cc: Subscribed subscribed@noreply.github.com Subject: Re: [nytimes/covid-19-data] The Excel Line Limit is 1,048,576 and today the county file exceeded that. Any progress on splitting the file. One suggestion would be a 2020 file 'archive' and a 'live' 2021 addition. (#544)
We're working on what we'll do, here. Back to you soon.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHubhttps://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnytimes%2Fcovid-19-data%2Fissues%2F544%23issuecomment-782914954&data=04%7C01%7C%7Ce2df9b74997b43daa99308d8d6a20f99%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637495338814589297%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=tF7yY0IMHOULkrA29kyRxDLLEv2LvX1Ah7ccxcNi71E%3D&reserved=0, or unsubscribehttps://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FABGFUYHOGTN775Q34PRBPOLTAFP3PANCNFSM4X643W7A&data=04%7C01%7C%7Ce2df9b74997b43daa99308d8d6a20f99%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637495338814589297%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=ftIwEXwjtl%2Fqo2%2BgxncEyiAia8j0cxjd1CJT4iOtdcI%3D&reserved=0.
I have written a python3 script that busts all the state and county rows into their own csv files. It creates a directory structure as follows:
your-working-directory/
states/
Alabama/
Alabama-state.csv
Autauga.csv
Baldwin.csv
Barbour.csv
etc...
Alaska/
Alaska-state.csv
Aleutians East Borough.csv
Aleutians West Census Area.csv
Anchorage.csv
etc...for each county in the state
etc... for each state or territory
See this repository to get the program. I just released it, so if you have questions, drop an issue on my repository. Hope it helps you all out.
One easiest solution is to go to https://github.com/nytimes/covid-19-data/blob/master/us-counties.csv >> Right click on "View raw" >> click on "Save Link as" >> In the file explorer save the file as "us-counties.csv" (Add .csv extension to the name)and it will save the csv file in your folder. You can then use any statistical program to import this csv file from your local folders. Thanks
Please be a bit careful with the concept of 2020 data as an "archive". Just within the last two weeks there have been, at least, three sets of revisions that touched 2020 data. I think it was the Alaska change that resulted in revisions all the way back to May 2020. At least two other sets of revisions recently touched November and December 2020.
One easiest solution is to go to https://github.com/nytimes/covid-19-data/blob/master/us-counties.csv >> Right click on "View raw" >> click on "Save Link as" >> In the file explorer save the file as "us-counties.csv" (Add .csv extension to the name)and it will save the csv file in your folder. You can then use any statistical program to import this csv file from your local folders. Thanks
One easiest solution is to go to https://github.com/nytimes/covid-19-data/blob/master/us-counties.csv >> Right click on "View raw" >> click on "Save Link as" >> In the file explorer save the file as "us-counties.csv" (Add .csv extension to the name)and it will save the csv file in your folder. You can then use any statistical program to import this csv file from your local folders. Thanks
This does not seem to work as it still only allows viewing a portion of the data.
To get around the 1 million rows limit in Excel just import the CSV into Excel using the Microsoft built in (and free) Power Query module: https://www.masterdataanalysis.com/ms-excel/analyzing-50-million-records-excel/
gmillikan, Thank you, I am still fumbling with the Power Query, but got it to work for my purposes. Much appreciated!
One easiest solution is to go to https://github.com/nytimes/covid-19-data/blob/master/us-counties.csv >> Right click on "View raw" >> click on "Save Link as" >> In the file explorer save the file as "us-counties.csv" (Add .csv extension to the name)and it will save the csv file in your folder. You can then use any statistical program to import this csv file from your local folders. Thanks
One easiest solution is to go to https://github.com/nytimes/covid-19-data/blob/master/us-counties.csv >> Right click on "View raw" >> click on "Save Link as" >> In the file explorer save the file as "us-counties.csv" (Add .csv extension to the name)and it will save the csv file in your folder. You can then use any statistical program to import this csv file from your local folders. Thanks
This does not seem to work as it still only allows viewing a portion of the data.
Thank you - You are right Excel cannot even open this csv file but you can use another software like SAS, R or Tableau to import this data file and filter by dates or counties or states you need.
For the folks having trouble with Excel and Numbers, do you need all the data for all the counties at the same time?
It was sure nice to be able to pivot & drill at will earlier....
I load a mysql db, to be able to pre-calc new cases/deaths & rolling averages.
I split it into two workbooks now, a US level .xls & a state-level .xls. The state-level is driven by a parameterized query, so I can change the state at will & keep all the charts in sync.
The side benefit is I'm not working with a 100mb spreadsheet anymore! https://shawnbulen.com/forum/index.php?topic=4.0
I don't know if you saw my earlier comments, but would having each county separated into its own csv file be a help or cause you more work? Because that's what my program does.
It was sure nice to be able to pivot & drill at will earlier....
I load a mysql db, to be able to pre-calc new cases/deaths & rolling averages.
I split it into two workbooks now, a US level .xls & a state-level .xls. The state-level is driven by a parameterized query, so I can change the state at will & keep all the charts in sync.
The side benefit is I'm not working with a 100mb spreadsheet anymore! https://shawnbulen.com/forum/index.php?topic=4.0
Thanks all for all your feedback and discussion on this, and for sharing with each other different ways of handling the data. We do want most people who are using Excel and Numbers to be able to keep doing what they were doing before, so we've created a new us-counties-recent.csv file https://github.com/nytimes/covid-19-data/blob/master/us-counties-recent.csv that has data for just the last 30 days. Hopefully this is enough for people who are looking at recent trends.
The existing file will continue unchanged, and both will get updated together.
Thank you, that will work great for me.
And thanks to everyone else, I learned some new stuff in the dialogue.
Describe the issue:
Fuller details
A clear and concise description of the problem, with examples if possible. If you are reporting incorrect data for a specific locality, please include a link to your source. We will compare with our own list of sources. Please check the README and recent commit messages for updates to see if your issue is addressed.