ManuelaRunge opened this issue 3 years ago
Running trace_selection.py followed by simulate_traces.py generates a `fitted_parameters_besttrace.csv` that includes the single best parameter set for the parameters that were varied in a simulation (or `fitted_parameters_ntraces27.csv` for the best x fitting combinations), as well as a `sample_parameters_besttrace.csv` (or `sample_parameters_ntraces27.csv`) that combines the sample parameters with the fitted parameters and can be used as input csv to run a new simulation via `runScenarios.py [...] --sample_csv sample_parameters_besttrace.csv`.
However, there are some issues with using this approach on a routine basis:

- The fitted sample parameters get the region suffix attached (e.g. `time_to_critical` becomes `time_to_critical_EMS-2`), which makes using the csv as input inappropriate, as the emodl file only defines `time_to_critical` and not `time_to_critical_EMS-2`.
- Combining `samples_parameter.csv` and `additional varying parameters.csv`, for which this script (sample_parameters.py) has been designed; however, it is not user-friendly or integrated into the workflow in its current state.

Re: 1&2
I want to check if I'm interpreting this correctly. From what I understand, when, say, I fit a baseline locale model of EMS_1, EMS_5 and EMS_11 for 20 samples, the program randomly draws 20 uniform random numbers based on the YAML specification for, say, `fraction_dead_change8`. In a single trajectory (`scen_num` in the code), this `fraction_dead_change8` is the same for all three regions. After running the program, we run trace_selection and simulate_traces to select the best set of parameters (corresponding to the best trajectories). However, because the trace selection process works on a per-region basis, a `scen_num` of 1 might be best for EMS_1 with a `fraction_dead_change8` of 0.15, while a `scen_num` of 8 with a `fraction_dead_change8` of 0.14 might be best for EMS_5. simulate_traces then eventually creates the "best" parameters with `fraction_dead_change8_EMS-1 = 0.15` and `fraction_dead_change8_EMS-5 = 0.14`.
As a result, we have three best `fraction_dead_change8` values after trace selection, one for each region. The original model does not allow regional variation in `fraction_dead_change8`, so the result of trace selection is incompatible with the original model.
Question: Is it a good idea to allow `fraction_dead_change8` to vary among the regions? If yes, then the original model might need to be rewritten to allow regional variations. If not, then the trace selection process might be incorrect (the likelihood should be calculated using all three regions instead of on a per-region basis).
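To make the two alternatives in the question concrete, here is a small hypothetical sketch (made-up numbers and column names, not the actual trace_selection.py output format) contrasting per-region selection with a joint likelihood summed over regions:

```python
# Hypothetical numbers and column names, only to contrast the two schemes;
# the real ranking lives in trace_selection.py's per-region ranking csvs.
import pandas as pd

nll = pd.DataFrame({
    'scen_num': [1, 1, 1, 8, 8, 8],
    'region':   ['EMS_1', 'EMS_5', 'EMS_11'] * 2,
    'neg_ll':   [2.0, 5.0, 3.0, 4.0, 1.0, 3.5],
})

# Per-region selection: each region keeps its own best scen_num,
# so regions can disagree (here EMS_1 picks 1, EMS_5 picks 8).
per_region_best = nll.loc[nll.groupby('region')['neg_ll'].idxmin()]

# Joint selection: sum the negative log-likelihood over all regions and
# keep a single scen_num, consistent with a shared fraction_dead_change8.
joint_best = nll.groupby('scen_num')['neg_ll'].sum().idxmin()
```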
Re: 3
Since this is for fitting purposes, should the `additional_varying_parameters` be sampled from a distribution? Currently the script sample_parameters.py assumes the additional column is a fixed value. Perhaps I'll work on allowing these additional column(s) to be randomly generated.
Re: 1&2
Yes, exactly, that is how it works and why it is in the end 'incompatible'. simulate_traces.py could be modified to not attach the region suffix `EMS-x` when `grp_list == 1` (L171), i.e. when running the simulation for a single region, or it could be modified to write out 11 different csv's. Maybe using a single region is a good start before scaling up to all regions in one model, since in the end, ideally, both ways should be possible.
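A minimal sketch of the suffix-dropping idea, assuming a pandas DataFrame of fitted parameters and an illustrative `strip_region_suffix` helper (not the actual simulate_traces.py code):

```python
# Sketch only: drop the region suffix from fitted parameter columns when a
# single region is simulated, so the csv matches an emodl that defines
# e.g. time_to_critical without an EMS-x suffix. Names are illustrative.
import pandas as pd

def strip_region_suffix(df, grp_list):
    # Only safe when exactly one region is present; with several regions
    # the de-suffixed columns would collide.
    if len(grp_list) == 1:
        suffix = f"_{grp_list[0]}"  # e.g. "_EMS-2"
        df = df.rename(columns={c: c[:-len(suffix)]
                                for c in df.columns if c.endswith(suffix)})
    return df

df = pd.DataFrame({'time_to_critical_EMS-2': [4.5], 'scen_num': [0]})
df = strip_region_suffix(df, ['EMS-2'])
```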
Re question: Good point. `fraction_dead_change8`, like all the other sample parameters in the yaml files, is not per se treated as a fitting parameter. When fitting the transmission rate (ki) multiplier, we fixed all sample parameters to their mean (using `--paramdistribution uniform_mean` (here)) and only included the whole range in the final simulations. In that final simulation the uncertainty ranges and medians for all regions would correspond to the same sampled parameters.
Therefore we use trace_selection.py not only to fit a specific parameter, but also for thinning the trajectories to select the best n unique parameter sets per region.
All parameter values would still be within the reasonable range pre-defined in the yaml file. It would be useful to check by how much these vary per region, and whether some of them should be excluded from trace_selection.py, which would complicate the combination of parameters to simulate.
This script (extract_sample_param.py) generates histograms from the sample_parameters.csv and could be modified to read in the filtered sample_parameters.csv.
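Checking how much the fitted sample parameters vary per region could be sketched like this (hypothetical column names and values, just to illustrate the groupby summary):

```python
# Hypothetical values, only to illustrate summarising the per-region
# spread of a fitted sample parameter after trace selection.
import pandas as pd

fitted = pd.DataFrame({
    'region': ['EMS_1', 'EMS_1', 'EMS_5', 'EMS_5'],
    'fraction_dead_change8': [0.15, 0.16, 0.14, 0.13],
})
# min/max/mean per region show how far the regions drift apart
spread = fitted.groupby('region')['fraction_dead_change8'].agg(['min', 'max', 'mean'])
```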
Re: 3
Yes, when using `--param_dic`, sample_parameters.py requires a single fixed value; this was intended for changing, say, `rollback_multiplier` from 0.5 to 0.6 when running simulations for specific intervention scenarios.
sample_parameters.py also allows the additional column to have multiple values, or even multiple columns, which is addressed via `--csv_name_combo` and `gen_combos(csv_base, csv_add)` (here).
Is that what you meant?
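For illustration, the base-csv and additional-csv combination could look like the following cross join; this is only a guess at the behaviour of `gen_combos(csv_base, csv_add)`, not its actual implementation:

```python
# Only a guess at the behaviour of gen_combos(csv_base, csv_add) in
# sample_parameters.py: every sampled row is repeated for every row of
# the additional csv (a cross join). Column names are illustrative.
import pandas as pd

def gen_combos_sketch(csv_base, csv_add):
    return csv_base.merge(csv_add, how='cross')

base = pd.DataFrame({'sample_num': [0, 1], 'time_to_critical': [4.0, 5.0]})
add = pd.DataFrame({'rollback_multiplier': [0.5, 0.6]})
combos = gen_combos_sketch(base, add)  # 2 x 2 = 4 rows
```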
I think it is going in the direction of what we intend to do in the IEMS project for the different mitigation scenarios, where currently the ki_mitigation parameter distribution needs to be generated in a separate python script to produce the csv that can then be read into sample_parameters.py.
I like the idea of allowing sample_parameters.py to automatically generate the parameter distribution. Would that require many more arguments? And would the additional parameter distribution be attached to the unique set of sample parameters or repeated for each set of sample parameters? (Both could be desired depending on the purpose of the simulation.)
Great, I had missed the `--csv_name_combo` option.
I was thinking of using YAML as input to specify the distribution of additional parameters. I think there's an existing framework to translate YAML to sample parameters, so it could be a straightforward modification.
I agree that both completely random and repeated random draws are desirable; I should be able to implement both with ease...
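A rough sketch of both modes, with an entirely hypothetical `attach_sampled_param` helper and YAML-like spec dict (not the sample_parameters.py API):

```python
# Sketch of the two attachment modes discussed above: one draw repeated
# for every sample set vs. an independent draw per row. The helper name,
# spec format and column names are hypothetical, not sample_parameters.py.
import numpy as np
import pandas as pd

def attach_sampled_param(df, name, spec, repeated, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    n = 1 if repeated else len(df)
    if spec['distribution'] == 'uniform':
        draws = rng.uniform(spec['low'], spec['high'], size=n)
    else:
        raise ValueError(f"don't know how to sample {spec['distribution']}")
    df[name] = draws[0] if repeated else draws
    return df

samples = pd.DataFrame({'sample_num': range(3)})
spec = {'distribution': 'uniform', 'low': 0.4, 'high': 0.6}
attach_sampled_param(samples, 'ki_mitigation', spec, repeated=True)
```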
Made small changes in PR #748.
As mentioned in the PR note, the current examples in the README don't seem to work. I created some simplistic examples so that the YAML ones work. I will look into the problem with the other examples a little more.
The other examples are likely outdated, since the runScenarios.py arguments have been updated, and the referred-to emodl file does not necessarily exist (we previously had hardcoded emodl files for each scenario).
`python sample_parameters.py -rl Local -r IL --model locale --experiment_config spatial_EMS_experiment.yaml --emodl_template extendedmodel_EMS.emodl -save sampled_parameters2.csv`
could be simplified to
`python sample_parameters.py --experiment_config spatial_EMS_experiment.yaml -save sampled_parameters2.csv`
However, I am getting a yamlordereddictloader package error on my side; will fix that.
As a note for example 5:
`python sample_parameters.py -e "..\experiment_configs\example\example.emodl" -load "..\example\csv_base.csv" -yaml ".\experiment_configs\example\samp_params_combos_example.yaml"`
adding the path to the arguments should not be required, as it is also not required in runScenarios.py, and it should be checked why it is required here.
Yeah, I remember having to add `import yamlordereddictloader` somewhere to make it work.
I put all the simplified emodl, csv and yaml files into an example folder (to reduce clutter in the important folders), that's why the path is needed. Depending on how you feel about it, we can remove the example-folder approach. For now the additional-parameters yaml does not have a specific folder to live in, so the path is required.
Ah, I see! Makes sense.
For that reason the `emodl` and `experiment_config/input_csv` folders are ignored, with exceptions for the main files, to allow collecting custom input files that are not required by anyone else.
The `experiment_config` folder with the yamls is not yet ignored, but I would prefer adjusting the gitignore over introducing subfolders, since e.g. yaml and emodl files are copied over or into by some other scripts.
Yes, OK, will have a look and see how to consolidate all the examples.
To follow up on this, could
`python sample_parameters.py -e "..\experiment_configs\snippets\example.emodl" -load "..\snippets\csv_base.csv" -yaml ".\experiment_configs\snippets\samp_params_combos_example.yaml"`
become
`python sample_parameters.py -e "example.emodl" -load "csv_base.csv" -yaml "samp_params_combos_example.yaml"`
by moving the files to the respective default folders (emodl, experiment_config, and input_csv)?
And introducing a new python module would require modifying the virtual environment used on quest, as well as the requirements for the local setup. Would it be sufficient to raise a simple ValueError instead of a warning here?
`warnings.warn(parameter + ': List length different from replicate_number and factorial_after is not True.')`
`warnings.warn("Parameter " + parameter + " skipped: don't know how to sample this parameter.")`
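The suggestion could look like the following sketch; the wrapper function and argument names are hypothetical, only the message mirrors the first quoted warning:

```python
# Sketch of raising instead of warning; the function around the check is
# hypothetical, the message mirrors the first quoted warnings.warn call.
def check_param_list(parameter, values, replicate_number, factorial_after):
    if len(values) != replicate_number and not factorial_after:
        raise ValueError(parameter + ': List length different from '
                         'replicate_number and factorial_after is not True.')
    return values
```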
The yamlordereddictloader module has often caused installation issues among users (see related issue #716); therefore I would suggest using try/except, or just loading the yaml without specifying a Loader. (In the future we might want to switch to a new yaml loader, but that would also require modifying the quest python environment, so no priority there.)
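A possible try/except fallback, as suggested above (sketch only; `load_config` is a hypothetical helper name):

```python
# Sketch: avoid a hard dependency on yamlordereddictloader by falling
# back to PyYAML's SafeLoader (plain dicts keep insertion order on
# Python 3.7+). load_config is a hypothetical helper name.
import yaml

try:
    import yamlordereddictloader
    LOADER = yamlordereddictloader.Loader
except ImportError:
    LOADER = yaml.SafeLoader

def load_config(path):
    with open(path) as f:
        return yaml.load(f, Loader=LOADER)
```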
And once this is working (it seems it does already, I got the sampled_parameter.csv), it would be useful to apply this to the covid chicago setup, i.e. save a `fitted_sample_parameters_region_X.csv` per region from the last fitting iteration, set up an additional yaml (or is it possible to reuse the spatial yaml?) to make modifications to the csv for running new simulations, and add batch files to facilitate automation if possible. I can also have a go at it! (i.e. add a script for extracting region-specific sample parameter csvs; then all that is left is the batch files with the appropriate file specification(?)). Does that make sense? (Not sure if we still want to fully integrate this into the current fitting-simulation workflow, but those would be the remaining steps.)
I think this would do (example for region 11):

```python
import os
import pandas as pd

for e, grp in enumerate(grp_list):
    grp_nr = grp_numbers[e]
    df_samples = pd.read_csv(os.path.join(output_path, 'sampled_parameters.csv'))
    rank_export_df = pd.read_csv(os.path.join(output_path, f'traces_ranked_region_{str(grp_nr)}.csv'))
    # keep only the samples belonging to the n best-ranked traces
    rank_export_df_sub = rank_export_df[0:n_traces_to_keep]
    df_samples = df_samples[df_samples['sample_num'].isin(rank_export_df_sub.sample_num.unique())]

    # FIXME: list of regions to drop is hardcoded; also note EMS_1 vs EMS_11
    # (substring matching would wrongly catch EMS_11 columns for EMS_1)
    cols_to_drop = []
    for ems in ['EMS_2', 'EMS_3', 'EMS_4', 'EMS_5', 'EMS_6', 'EMS_7', 'EMS_8', 'EMS_9', 'EMS_10']:
        cols_to_drop = cols_to_drop + [i for i in df_samples.columns if ems in i]
    df_samples = df_samples.drop(cols_to_drop, axis=1)

    # renumber scenarios, keeping the original numbering for reference
    df_samples['scen_num_orig'] = df_samples['scen_num']
    df_samples['scen_num'] = range(0, len(df_samples['scen_num']))
    df_samples.to_csv(os.path.join(output_path, f'sample_parameters_region_{str(grp_nr)}_{n_traces_to_keep}.csv'), index=False)
```
Adapted from functions in simulate_traces.py; however, the setup/purpose is slightly different, so I would add it via a new python file. Testing now whether it results in the same output as the 'original' one.
A follow-up sim can then run via
`python runScenarios.py -sr EMS_11 -csv sample_parameters_region_11_100.csv -n "n100bestfitsamples" --scenario bvariant_vaccine`
Adding a comment: when running from an input csv (that is already fitted), the trace_selection step needs to be removed from the postprocessing.sh files.
Incorporated your suggestions and merged my PR. Still have to deal with the yamlordereddictloader issues and the integration into the covid chicago setup; more tinkering is on the way! 😃
Needed for running from timestep, and to reduce the sample size.
Also assess the consistency of the fitted sample parameters per region over time.