signaturescience / focustools

Forecasting COVID-19 in the US
https://signaturescience.github.io/focustools/
GNU General Public License v3.0

summing all counties should (nearly?) equal US data #1

Closed: stephenturner closed this issue 3 years ago

stephenturner commented 3 years ago

Originally, US-level data was compiled by summing over counties, but this code does not produce the correct counts: https://github.com/signaturescience/focustools/blob/f5ea1b9024c4864ee007337cefc25654d9c8469c/scratch/mars.R#L16-L44

Reading in the pre-summarized US data is much faster, and yields correct counts: https://github.com/signaturescience/focustools/blob/f5ea1b9024c4864ee007337cefc25654d9c8469c/scratch/mars.R#L46-L71

Speed is convenient, but we should probably figure out how to do this correctly starting from the county data. If we later run analyses at the state and county level, summing over counties should give the same counts as reading in the pre-summed data, whether we use NYT or some other data source.
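A rough sketch of the kind of check this would need: sum the county rows by date and compare against the pre-summed US rows, reporting any date where the two disagree. This is Python rather than the R used in the repo, purely as an illustration. The column names (`date`, `cases`, `deaths`) are assumed to follow the NYT CSV layout, the helper names `sum_by_date` and `compare` are hypothetical, and the counts are toy values invented for the example, not real data.

```python
import csv
import io
from collections import defaultdict

# Toy data, invented for illustration only; these are not real counts.
# Column names are assumed to match the NYT county-level and US-level CSVs.
COUNTY_CSV = """date,county,state,fips,cases,deaths
2020-03-01,A,X,00001,3,0
2020-03-01,B,X,00002,2,1
2020-03-01,Unknown,Y,,1,0
2020-03-02,A,X,00001,5,0
2020-03-02,B,X,00002,4,1
2020-03-02,Unknown,Y,,2,0
"""

US_CSV = """date,cases,deaths
2020-03-01,6,1
2020-03-02,11,2
"""

FIELDS = ("cases", "deaths")


def sum_by_date(county_rows):
    """Sum county-level counts per date (hypothetical helper)."""
    totals = defaultdict(lambda: {field: 0 for field in FIELDS})
    for row in county_rows:
        for field in FIELDS:
            # Treat a blank cell as zero rather than failing on int("").
            totals[row["date"]][field] += int(row[field] or 0)
    return dict(totals)


def compare(county_totals, us_rows):
    """Return (date, field, summed, expected) for every disagreement."""
    mismatches = []
    for row in us_rows:
        summed = county_totals.get(row["date"], {field: 0 for field in FIELDS})
        for field in FIELDS:
            expected = int(row[field])
            if summed[field] != expected:
                mismatches.append((row["date"], field, summed[field], expected))
    return mismatches


county_totals = sum_by_date(csv.DictReader(io.StringIO(COUNTY_CSV)))
mismatches = compare(county_totals, csv.DictReader(io.StringIO(US_CSV)))
for date, field, summed, expected in mismatches:
    print(f"{date} {field}: county sum {summed} != US total {expected}")
```

In the toy data the second date is deliberately one death short in the county rows, so the check flags it. Running the same comparison on the real files would at least show which dates and fields diverge, which is a starting point for finding out why.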

(Let's not reinvent the wheel if some other C19FH participant has already solved this in their utils folder or elsewhere.)

stephenturner commented 3 years ago

Added a script in utils/ to get US data the "second way": https://github.com/signaturescience/focustools/blob/951263c0b244ecd267472c397a81d0b52f112585/utils/get-us-data.R

stephenturner commented 3 years ago

Closing this and moving it to #3 as a comment/addition.