Closed cwhittaker1000 closed 4 months ago
Specifically need to find US replacements for:
*Re workplaces: in the report we have written that Ferguson et al. 2005 is based on US data, but is it only the form of the distribution that comes from US data, with the parameters then fit to Thai data? Ferguson does cite back to "Axtell, R. L. Zipf Distribution of U.S. Firm Sizes. Science 293, 1818-1820 (2001)." It's unclear to me exactly what was done based on the SI.
(In some sense I have doubts about how important this is to the results of the analysis, though I can imagine that someone looking to question the results could easily say "oh but you didn't use US specific data", so it's worth preempting that.)
Schools: https://nces.ed.gov/programs/digest/d21/tables/dt21_216.40.asp
Workplaces: https://www.census.gov/data/tables/2019/econ/susb/2019-susb-annual.html (see download "U.S. & states, NAICS, detailed employment sizes (U.S., 6-digit and states, NAICS sectors)")
House/Age: I think we can use RTI SynthPop (https://www.rti.org/focus-area/rti-synthpoptm). See here for more detail, but basically they created a synthetic US pop specifically for ABM stuff: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2875687/. I've emailed them to ask for access, but I think we can access what we need here: https://fred.publichealth.pitt.edu/syn_pops. I downloaded data for all of California, picked a "people.txt" file at random, and on opening it up it had age and household ID, which I think is all we need!
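As a rough illustration of why age and household ID are enough, the sketch below derives a household size distribution and per-household age sets from a person-level table. The data frame is a toy stand-in, and the column names (`person_id`, `household_id`, `age`) are assumptions for illustration, not the actual layout of the "people.txt" files.

```r
# Toy stand-in for a synthetic-population person file.
# Column names and values are assumptions, not the real file layout.
people <- data.frame(
  person_id    = 1:8,
  household_id = c("h1", "h1", "h1", "h2", "h2", "h3", "h4", "h4"),
  age          = c(34, 36, 5, 71, 69, 28, 45, 12)
)

# Group ages by household, so each list element is one household's ages
ages_by_household <- split(people$age, people$household_id)

# Household size = number of people sharing a household ID
household_sizes <- lengths(ages_by_household)

# Empirical distribution of household sizes (proportions summing to 1)
size_distribution <- prop.table(table(household_sizes))

# Bootstrap whole households (with replacement) so that within-household
# age structure is preserved in the sampled population
set.seed(1)
sampled_households <- ages_by_household[
  sample(length(ages_by_household), size = 5, replace = TRUE)
]
```

Resampling whole households, rather than ages and sizes separately, keeps the joint structure (e.g. children living with adults), which is probably what an ABM needs.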
data-raw
data-raw/DATASET.R
R/data.R
generate_initial_schools_bootstrap to use new data
generate_initial_workplaces_bootstrap function (similar to the schools one)
generate_initial_households_bootstrap to use new data
Based on chat with @cwhittaker1000:
generate_initial_schools with arguments for type as either "empirical" or "synthetic" and country as either "UK" or "US"
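One way the proposed interface could look is sketched below. This is only an illustration of the `type` / `country` argument pattern, not the helios implementation: the function name `generate_initial_schools_sketch`, the `n_schools` argument, and the reference sizes are all hypothetical, and the numbers are made up rather than taken from any dataset.

```r
# Hypothetical sketch of the interface only; not the helios implementation.
generate_initial_schools_sketch <- function(n_schools,
                                            type = c("empirical", "synthetic"),
                                            country = c("UK", "US")) {
  # match.arg() restricts each argument to the listed choices and
  # defaults to the first one when the argument is not supplied
  type <- match.arg(type)
  country <- match.arg(country)

  if (type == "synthetic") {
    stop("Synthetic school sizes are not implemented in this sketch.")
  }

  # Made-up placeholder sizes; real code would load country-specific data
  reference_sizes <- switch(country,
    UK = c(120, 250, 400, 950),
    US = c(150, 300, 520, 1100)
  )

  # Bootstrap: resample the reference sizes with replacement
  sample(reference_sizes, size = n_schools, replace = TRUE)
}

set.seed(1)
us_school_sizes <- generate_initial_schools_sketch(10, type = "empirical", country = "US")
```

Using `match.arg()` means an unsupported country or type fails early with a clear error, rather than silently falling back to UK data.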
Household and age distributions now in via https://github.com/mrc-ide/helios/pull/97 - comments welcome!
Note that I've started the USA workplaces update. A couple of things:
I think given all this, we're fine to just use the parameters and sampling scheme from Ferguson et al. I'm going to update the structure to have "synthetic" vs "empirical" (where "synthetic" will be empty for now).
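For readers unfamiliar with Zipf-like firm size distributions, here is a minimal sketch of sampling workplace sizes from a generic truncated power law. It is not the functional form or the parameter values used by Ferguson et al.; the function name `sample_workplace_sizes` and the values of `alpha` and `max_size` are placeholders chosen purely for illustration.

```r
# Generic truncated power-law (Zipf-like) sampler for workplace sizes.
# alpha and max_size are placeholders, not parameters from Ferguson et al.
sample_workplace_sizes <- function(n, alpha = 2, max_size = 1000) {
  sizes <- seq_len(max_size)

  # Unnormalised weights: P(size = k) is proportional to k^(-alpha);
  # sample() normalises the weights internally
  weights <- sizes^(-alpha)

  sample(sizes, size = n, replace = TRUE, prob = weights)
}

set.seed(42)
workplace_sizes <- sample_workplace_sizes(1000)
```

The heavy tail means most sampled workplaces are small while a few are very large, which is the qualitative feature the Axtell citation describes for US firms.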
Partially addressed by https://github.com/mrc-ide/helios/pull/98 and https://github.com/mrc-ide/helios/pull/97
Fully closed by https://github.com/mrc-ide/helios/pull/99
Currently, our data sources are quite UK focussed. Try to source data (e.g. the distribution of school sizes) for the USA, so that all data sources used to parameterise the model (esp. wrt locations) are derived from the US.