Closed tbreweric closed 5 months ago
Thanks going through this PR and for the comments @athowes - I'll try and get around to them on the weekend!
I tried running this and got:
> parameter_table <- readr::read_csv("data/parameter_table_example.csv")
Rows: 4 Columns: 9
── Column specification ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (1): far_uvc_workplace_coverage_type
dbl (6): human_population, number_initially_exposed, simulation_time, far_uvc_workplace_coverage, far_uvc_workplace_efficacy, far_uvc_workplace_timestep
lgl (2): render_diagnostics, far_uvc_workplace
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
> example <- helios::run_simulations_from_table(parameters_table = parameter_table)
Error in sample.int(length(x), size, replace, prob) :
invalid 'size' argument
I'm unsure that this is something which is specific to this PR. I'm using the version of the package as of this PR. Can you reproduce this?
Here is the traceback, so it looks like it's something happening within generate_initial_disease_states
for at least one of the parameters_list
entries:
Error in sample.int(length(x), size, replace, prob) :
invalid 'size' argument
6.
sample.int(length(x), size, replace, prob)
5.
sample(x = 1:length(initial_disease_states), size = parameters_list$number_initially_exposed,
replace = FALSE, prob = NULL) at variables.R#139
4.
generate_initial_disease_states(parameters_list = parameters_list) at variables.R#11
3.
create_variables(parameters_list) at model.R#11
2.
run_simulation(parameters_list = parameters_lists[[i]]) at model.R#70
1.
helios::run_simulations_from_table(parameters_table = parameter_table)
Using read.csv
rather than read_csv
I get:
> example <- helios::run_simulations_from_table(parameters_table = parameter_table)
There are less than or equal to 2 schools. Consider the population size may be too small!
Error in bind_cols(simulation_outputs[[i]], as_tibble(parameters_table[i, :
could not find function "bind_cols"
I think this is because I need to have called library(dplyr)
. When I do that it works.
> example <- helios::run_simulations_from_table(parameters_table = parameter_table)
There are less than or equal to 2 schools. Consider the population size may be too small!
[1] "1th simulation complete (25% of simulations complete)"
[1] "2th simulation complete (50% of simulations complete)"
[1] "3th simulation complete (75% of simulations complete)"
[1] "4th simulation complete (100% of simulations complete)"
So my suggestion is to (I think) add dplyr
to the Imports
in the DESCRIPTION
file. We may want to be putting dplyr::
also in the code. I don't know the best practise here exactly / have done this in the past by precident without knowing the full theory. Have a look at: https://r-pkgs.org/description.html#sec-description-imports-suggests
Also, suggest changing from 1th, 2th, etc. Can just use "Simulation 1", "Simulation 2", etc. to make it easy (not have to deal with st, nd, rd, th).
Oh also, with the error when using a tibble
I don't think we should allow that to happen. So options are:
Using
read.csv
rather thanread_csv
I get:> example <- helios::run_simulations_from_table(parameters_table = parameter_table) There are less than or equal to 2 schools. Consider the population size may be too small! Error in bind_cols(simulation_outputs[[i]], as_tibble(parameters_table[i, : could not find function "bind_cols"
I think this is because I need to have called
library(dplyr)
. When I do that it works.> example <- helios::run_simulations_from_table(parameters_table = parameter_table) There are less than or equal to 2 schools. Consider the population size may be too small! [1] "1th simulation complete (25% of simulations complete)" [1] "2th simulation complete (50% of simulations complete)" [1] "3th simulation complete (75% of simulations complete)" [1] "4th simulation complete (100% of simulations complete)"
So my suggestion is to (I think) add
dplyr
to theImports
in theDESCRIPTION
file. We may want to be puttingdplyr::
also in the code. I don't know the best practise here exactly / have done this in the past by precident without knowing the full theory. Have a look at: https://r-pkgs.org/description.html#sec-description-imports-suggestsAlso, suggest changing from 1th, 2th, etc. Can just use "Simulation 1", "Simulation 2", etc. to make it easy (not have to deal with st, nd, rd, th).
I think I would avoid adding dependencies where we can as it can create issues with updates to packages (happened to me on another project this week)? Agree that we should add package::
wherever necessary!
In response to your suggestions below - all good ones! I think the conversion to data.frame with a warning that you've had to do this might be the way to go - means we try and correct the error but also flag that if there are issues the user may want to do this themselves before feeding the parameter table to the function?
I think I would avoid adding dependencies where we can as it can create issues with updates to packages (happened to me on another project this week)? Agree that we should add package:: wherever necessary!
Agree that there would be downsides to adding to Depends
. Mainly I'd just like to avoid a situation where someone calls a function and it gives an error about not finding a certain function, and they don't know what function that package might be from. If there is a way to do this without adding to Depends
(is it Imports
or Suggests
?) then we can do that.
In response to your suggestions below - all good ones! I think the conversion to data.frame with a warning that you've had to do this might be the way to go - means we try and correct the error but also flag that if there are issues the user may want to do this themselves before feeding the parameter table to the function?
Yes I think convert to data.frame
internally. To be honest I'm confused about the difference between tibble
and data.frame
and why one would work and the other not. I think almost not worth warning about this, aside from that if you output the object (parameter_table
) that someone would expect it to be in the same format they input it. You could just code this, but I think it's a bit clunky (i.e. returning it as a tibble
if they input it as a tibble
).
I think I would avoid adding dependencies where we can as it can create issues with updates to packages (happened to me on another project this week)? Agree that we should add package:: wherever necessary!
Agree that there would be downsides to adding to
Depends
. Mainly I'd just like to avoid a situation where someone calls a function and it gives an error about not finding a certain function, and they don't know what function that package might be from. If there is a way to do this without adding toDepends
(is itImports
orSuggests
?) then we can do that.In response to your suggestions below - all good ones! I think the conversion to data.frame with a warning that you've had to do this might be the way to go - means we try and correct the error but also flag that if there are issues the user may want to do this themselves before feeding the parameter table to the function?
Yes I think convert to
data.frame
internally. To be honest I'm confused about the difference betweentibble
anddata.frame
and why one would work and the other not. I think almost not worth warning about this, aside from that if you output the object (parameter_table
) that someone would expect it to be in the same format they input it. You could just code this, but I think it's a bit clunky (i.e. returning it as atibble
if they input it as atibble
).
Have added dplyr to imports and the dplyr::
to the bind_cols()
call
In this PR I've created a function called
run_simulations_from_table()
(model.R
) which allows us to collate a series of simulations into a dataframe, where each column is a parameter we want to vary from theget_parameters()
default and each row contains the values for a single simulation, and applyrun_simulation()
to each one.The current implementation is a little crude - it creates a default
parameters_list
then manually overwrites the parameters of interest using name matching. It stores both the parameters lists and the simulation outputs for each simulation (row of the dataframe) and the user can decide what they want to output (parameters, simulations, or both). The function also appends columns to each dataframe for the parameters that have been varied.A number of improvements/additions could be made:
get_parameters()
andset_uvc()
- currently we just overwrite the entries for them after a base version has been generated.Keen for both of you @athowes and @cwhittaker1000 to have a look / play with this to see where we could improve it / where it breaks and needs fixing!