steipatr opened this issue 3 years ago
I like these various ideas. A few quick reactions:

- See `utilities.py`, lines 231-232, for further details. Part of the problem here is that it is not easy to extract a lot of this metadata from the results returned by `perform_experiments`. E.g., the sampler that is used, or the number of experiments, are not explicitly logged within either the experiments or outcomes dict.
- See also `group_results` in `plotting_util.py`.
- Something like

  ```python
  uncertainties = parameters_from_json(filename)
  parameters_to_json(uncertainties, filename)
  ```

  and equivalent functions for other parameter file formats we might want to support.
OK, I will take a stab at 1 and 2 first. Would you prefer separate issues to discuss in more detail (and close this one)?
Probably better off as separate issues. It might also be good to move these five ideas into the TODO file (which I can do).
Hi Jan,
Jason and I have now completed our project for the Energy Modelling Initiative. We used the Workbench extensively, but also added some stuff to it. Looking at our code, we've identified a few things that might be interesting to incorporate into the Workbench. If you could take a look and let us know which fit your vision and ideas, then we will submit PRs (and separate issues?) for those.
1. Saving

We developed a standardized file naming scheme, "date_samplingmethod_numberofruns_numberofoutcomes" (e.g. `2021-02-21T14-29_lhs_600000x34.tar.gz`), that we used together with `save_results`. We found it useful to have standardized names for the different data sets we had generated. This could be added to EMA as a function `make_tar_name` in `utilities.py` and used roughly as sketched below, or maybe even as the default in `save_results` if no `file_name` string is given. Potential criticism: are file names the right place to store metadata?
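A minimal sketch of what this could look like (the name `make_tar_name` is from our code, but the signature and implementation here are just an illustration):

```python
from datetime import datetime


def make_tar_name(sampling_method, n_runs, n_outcomes, timestamp=None):
    """Build an archive name of the form date_samplingmethod_runsxoutcomes.tar.gz."""
    timestamp = timestamp or datetime.now()
    return (f"{timestamp.strftime('%Y-%m-%dT%H-%M')}_"
            f"{sampling_method}_{n_runs}x{n_outcomes}.tar.gz")


# make_tar_name("lhs", 600000, 34, datetime(2021, 2, 21, 14, 29))
# -> "2021-02-21T14-29_lhs_600000x34.tar.gz"
```

It would then be used as `save_results(results, make_tar_name("lhs", 600000, 34))`.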
2. Experiment stats

We added some code to `perform_experiments` that printed various stats once the experiments had completed (roughly the kind of thing sketched below). This was useful because it allowed us to compare different cluster configurations and get an idea of how long future experiments might take. This could be added to `perform_experiments`. We just had it as a print statement, but it could probably integrate with `ema_logging`.
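As an illustration only (this wrapper is not our actual code, and the specific stats shown are assumptions), the idea is something like:

```python
import time

from ema_workbench import perform_experiments


def timed_experiments(model, n_scenarios):
    """Run perform_experiments and report simple timing stats afterwards."""
    start = time.perf_counter()
    experiments, outcomes = perform_experiments(model, scenarios=n_scenarios)
    elapsed = time.perf_counter() - start

    n_runs = len(experiments)
    print(f"completed {n_runs} runs in {elapsed:.1f} s "
          f"({n_runs / elapsed:.2f} runs/s)")
    return experiments, outcomes
```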
3. Outcomes dict to dataframe

For analysis, we found it convenient to convert the outcomes dict to a Pandas dataframe. There is a question of dimensionality here, because EMA outcomes can have different dimensions depending on the type of model and the outcomes of interest, but it might be useful to explore this. It could be added to EMA as a function `outcomes_to_df` in `utilities.py` (a rough sketch is given below). @jasonrwang can probably say more about this if necessary.
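A minimal sketch, assuming scalar outcomes only (one value per experiment); time-series outcomes would need a different layout:

```python
import numpy as np
import pandas as pd


def outcomes_to_df(outcomes):
    """Turn an outcomes dict of 1-D arrays into a dataframe, one column per outcome."""
    scalar = {name: np.asarray(values)
              for name, values in outcomes.items()
              if np.asarray(values).ndim == 1}
    return pd.DataFrame(scalar)


# experiments, outcomes = perform_experiments(model, scenarios=1000)
# df = outcomes_to_df(outcomes)
```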
4. Splitting results

Since some parameter combinations caused integration errors with our model, we had to parse the results after the fact and identify/remove runs with integration errors. For this, we wrote a small utility called `split_results` that takes one results object and splits it into two objects based on e.g. a dict with keys = ("A", "B") and values containing lists of run numbers, or a list containing a set of run numbers. This is basically the reverse operation of EMA's `merge_results`, and could also be added to `utilities.py`. A sketch of the list-based variant is given below.
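A simplified sketch of the list-based variant, assuming `results` is an `(experiments, outcomes)` tuple with the experiments as a pandas dataframe (our actual utility differs in details):

```python
import numpy as np


def split_results(results, run_numbers):
    """Split (experiments, outcomes) into the selected runs and the remaining runs."""
    experiments, outcomes = results

    mask = np.zeros(len(experiments), dtype=bool)
    mask[list(run_numbers)] = True

    selected = (experiments[mask].reset_index(drop=True),
                {name: values[mask] for name, values in outcomes.items()})
    remaining = (experiments[~mask].reset_index(drop=True),
                 {name: values[~mask] for name, values in outcomes.items()})
    return selected, remaining
```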
5. External parameters file

This is more of a conceptual thing, but we found it super useful to define our parameters in an external .py file and then import them into the notebook or script for experiments or analysis. This saved us a lot of copy-pasting and made version management of parameter ranges much easier. We're not sure if this is really something that could be "added" to EMA, since it is already doable; we just thought we would share it here in case it sparks ideas (a small example of the pattern follows below).
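For illustration, the pattern looks roughly like this (the file name, parameter names, and ranges are made up):

```python
# parameters.py -- single place where the uncertainty space is defined
from ema_workbench import IntegerParameter, RealParameter

UNCERTAINTIES = [
    RealParameter("growth_rate", 0.01, 0.05),
    IntegerParameter("initial_stock", 1000, 5000),
]
```

The experiment script or notebook then just does `from parameters import UNCERTAINTIES` and assigns `model.uncertainties = UNCERTAINTIES`, so the parameter definitions live in one version-controlled place.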
Happy to hear whether any of this would be useful within EMA, in the presented or a modified form. We can also share our codebase with you if that's useful. Let us know, happy to contribute where possible.