stephenturner closed this 3 years ago
Added a script in utils/ to get the US data the "second way" (reading the pre-summarized US data):
https://github.com/signaturescience/focustools/blob/951263c0b244ecd267472c397a81d0b52f112585/utils/get-us-data.R
Closing this and moving it to #3 as a comment/addition.
Originally, US-level data was compiled by summing over counties, but that code doesn't produce correct numbers: https://github.com/signaturescience/focustools/blob/f5ea1b9024c4864ee007337cefc25654d9c8469c/scratch/mars.R#L16-L44
Reading in the pre-summarized US data is much faster, and yields correct counts: https://github.com/signaturescience/focustools/blob/f5ea1b9024c4864ee007337cefc25654d9c8469c/scratch/mars.R#L46-L71
Speed is convenient, but we should still figure out how to do this correctly starting from the county data. If we later do analyses at the state and county level, we need to be sure that summing over counties gives the same counts as reading in the pre-summed data, whether the source is NYT or some other data source.
(Let's not reinvent the wheel if some other C19FH participant has already solved this, e.g. in their utils folder.)
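A rough sketch of the consistency check described above, in stdlib Python rather than the repo's R, just to pin down what "same counts" would mean. Everything here is an assumption for illustration: the column names (`date`, `cases`, `deaths`), the helper names `sum_counties_by_date` and `find_mismatches`, and the toy rows, which are made up and not real counts. It does not diagnose why the county sums in scratch/mars.R come out wrong; it only reports the dates and columns where they differ.

```python
import csv
import io
from collections import defaultdict


def sum_counties_by_date(county_rows):
    """Sum cases and deaths over all county rows for each date."""
    totals = defaultdict(lambda: {"cases": 0, "deaths": 0})
    for row in county_rows:
        for col in ("cases", "deaths"):
            value = row.get(col) or "0"  # assumption: treat blank cells as 0
            totals[row["date"]][col] += int(float(value))
    return dict(totals)


def find_mismatches(county_rows, us_rows):
    """Return (date, column, county_sum, us_value) wherever the two disagree."""
    summed = sum_counties_by_date(county_rows)
    mismatches = []
    for row in us_rows:
        date = row["date"]
        for col in ("cases", "deaths"):
            expected = int(row[col])
            got = summed.get(date, {}).get(col, 0)
            if got != expected:
                mismatches.append((date, col, got, expected))
    return mismatches


# Toy data, invented for this example (not real counts).
COUNTY_CSV = """date,county,state,fips,cases,deaths
2020-03-01,A,X,00001,3,0
2020-03-01,B,X,00002,2,1
2020-03-02,A,X,00001,5,0
2020-03-02,B,X,00002,4,1
"""

US_CSV = """date,cases,deaths
2020-03-01,5,1
2020-03-02,10,1
"""

county_rows = list(csv.DictReader(io.StringIO(COUNTY_CSV)))
us_rows = list(csv.DictReader(io.StringIO(US_CSV)))

mismatches = find_mismatches(county_rows, us_rows)
print(mismatches)
```

In the toy data the county rows for 2020-03-02 add up to 9 cases while the US row says 10, so that date and column get flagged. Run against the real files, a list like this would at least show whether the discrepancy is on every date or only some, which might narrow down the cause.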