Closed cwhittaker1000 closed 4 months ago
To try this out, run:
# Loading library
library(helios)
# Checking country is present as new argument and can be overridden
model_params <- get_parameters(overrides = list(household_distribution_country = "UK"))
model_params$household_distribution_country
model_params <- get_parameters()
model_params$household_distribution_country
# Checking create variables works with USA and UK
uk_variables <- create_variables(get_parameters(overrides = list(household_distribution_country = "UK")))
uk_household_id <- uk_variables$variables_list$household$get_categories()
uk_household_sizes <- vector()
uk_households <- 1:max(as.numeric(uk_variables$variables_list$household$get_categories()))
for(i in uk_households) {
uk_household_sizes[i] <- uk_variables$variables_list$household$get_size_of(as.character(i))
}
hist(uk_household_sizes)
usa_variables <- create_variables(get_parameters(overrides = list(household_distribution_country = "USA")))
usa_household_id <- usa_variables$variables_list$household$get_categories()
usa_household_sizes <- vector()
usa_households <- 1:max(as.numeric(usa_variables$variables_list$household$get_categories()))
for(i in usa_households) {
usa_household_sizes[i] <- usa_variables$variables_list$household$get_size_of(as.character(i))
}
hist(usa_household_sizes)
## Checking run_simulation works with it
uk_results <- run_simulation(get_parameters(overrides = list(household_distribution_country = "UK"), archetype = "sars_cov_2"))
usa_results <- run_simulation(get_parameters(overrides = list(household_distribution_country = "USA"), archetype = "sars_cov_2"))
plot(usa_results$timestep, usa_results$E_new, type = "l")
lines(uk_results$timestep, uk_results$E_new, col = "red")
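The household-size loops above can also be written as a single tabulation. The sketch below is a minimal, self-contained illustration that does not call helios: `household_of_individual` is a made-up stand-in for a vector holding one household label per individual, and whether the helios household variable exposes its values in exactly this form is an assumption.

```r
# Hypothetical stand-in data: one household label per individual.
# These values are invented for illustration and are not drawn from helios.
household_of_individual <- c("1", "1", "2", "3", "3", "3", "4")

# Household IDs are assumed to run from 1 to the largest label, as in the loops above.
household_ids <- seq_len(max(as.numeric(household_of_individual)))

# Loop-style count, mirroring the per-household lookup in the snippet above.
sizes_loop <- vapply(
  household_ids,
  function(i) sum(household_of_individual == as.character(i)),
  integer(1)
)

# Equivalent tabulation; setting the factor levels keeps households in ID order
# and gives a zero count for any ID with no members.
sizes_table <- as.vector(
  table(factor(household_of_individual, levels = as.character(household_ids)))
)
```

Both approaches give the same vector of sizes; the tabulation just avoids growing a vector inside a loop.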
Note that the SF household size distribution looks quite different from the UK one. I've checked, and the mean household size and general shape match those described here:
We might want to pick somewhere else. Let me know your thoughts.
Note - have updated so that the parameter country becomes household_distribution_country and we specify which country each of the locations' distributions are drawn from. Felt more exact and clearer that way.
Thanks @cwhittaker1000! Looks good to me. I verified that running the code works as expected.
Two things I picked up in my review:
We could do some things to standardise across the UK and the US data, such as naming the UK data with _uk and transforming the UK data in the data-raw folder. I think these are quite minor. If you agree that they're an improvement, I'm happy for them to be a new issue.
Need to call devtools::document() to render the documentation for these things below before merging:
Writing schools_england.Rd
Writing baseline_household_demographics.Rd
Writing baseline_household_demographics_usa.Rd
By the way, I'd recommend using barplot(table(uk_household_sizes)) rather than hist(uk_household_sizes) for integers. Here is UK:
And here is US:
For me it is confusing that in the UK nothing exists larger than a household of 6. The US data doesn't look too outlandish but I don't have a lot of domain expertise. The mean is pretty similar to the UK:
> mean(uk_household_sizes)
[1] 2.373606
> mean(usa_household_sizes)
[1] 2.220742
I guess we can expect more variance in epidemics with some large households. (By the way, do we have a way to track the location where people were infected? It could be interesting, especially with turning on far UVC in some locations, to see how the distribution of location of infection changes.)
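On the barplot(table(...)) suggestion above: hist() treats its input as continuous and chooses its own bin edges, which need not line up with integer household sizes, whereas tabulating first gives exactly one bar per size. A minimal sketch with simulated data follows; the sizes and probabilities are invented for illustration and are not the UK or USA distributions.

```r
set.seed(1)

# Simulated integer household sizes (illustrative probabilities only).
sizes <- sample(1:6, size = 500, replace = TRUE,
                prob = c(0.30, 0.34, 0.16, 0.13, 0.05, 0.02))

# One bar per integer size; fixing the factor levels keeps empty sizes visible.
size_counts <- table(factor(sizes, levels = 1:6))
barplot(size_counts, xlab = "Household size", ylab = "Number of households")

# hist() reproduces the same counts only when the breaks are centred on the
# integers explicitly.
hist_counts <- hist(sizes, breaks = seq(0.5, 6.5, by = 1), plot = FALSE)$counts
```

With breaks centred on the integers the two sets of counts agree, so the table-based bar plot is simply the less error-prone route for discrete sizes.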
Thanks for all of the above @athowes - have standardised the data naming and pulled all the transformation into the DATASET.R file. Have also run devtools::document() now.
Get started with this PR by running:
# Loading library
library(helios)
# Checking country is present as new argument and can be overridden
model_params <- get_parameters(overrides = list(household_distribution_country = "UK"))
model_params$household_distribution_country
model_params <- get_parameters()
model_params$household_distribution_country
# Checking create variables works with USA and UK
uk_variables <- create_variables(get_parameters(overrides = list(household_distribution_country = "UK")))
uk_household_id <- uk_variables$variables_list$household$get_categories()
uk_household_sizes <- vector()
uk_households <- 1:max(as.numeric(uk_variables$variables_list$household$get_categories()))
for(i in uk_households) {
uk_household_sizes[i] <- uk_variables$variables_list$household$get_size_of(as.character(i))
}
barplot(table(uk_household_sizes))
usa_variables <- create_variables(get_parameters(overrides = list(household_distribution_country = "USA")))
usa_household_id <- usa_variables$variables_list$household$get_categories()
usa_household_sizes <- vector()
usa_households <- 1:max(as.numeric(usa_variables$variables_list$household$get_categories()))
for(i in usa_households) {
usa_household_sizes[i] <- usa_variables$variables_list$household$get_size_of(as.character(i))
}
barplot(table(usa_household_sizes))
## Checking run_simulation works with it
uk_results <- run_simulation(get_parameters(overrides = list(household_distribution_country = "UK"), archetype = "sars_cov_2"))
usa_results <- run_simulation(get_parameters(overrides = list(household_distribution_country = "USA"), archetype = "sars_cov_2"))
plot(usa_results$timestep, usa_results$E_new, type = "l")
lines(uk_results$timestep, uk_results$E_new, col = "red")
With this working and @athowes's comments addressed, I'll merge this shortly.
Partially addresses https://github.com/mrc-ide/helios/issues/83 - using household and age data from https://fred.publichealth.pitt.edu/syn_pops (which is in turn a synthetic population including household and age data that was created by RTI).
You have to pick a specific county so I've picked San Francisco somewhat arbitrarily to begin with. File sizes are huge so it'll be tough to do anything more general than that at this stage.
@athowes would love your thoughts on the PR and the above when you get a mo :)