quaquel / EMAworkbench

workbench for performing exploratory modeling and analysis
BSD 3-Clause "New" or "Revised" License

Zipping over scenarios and policies #47

Open jpn-- opened 5 years ago

jpn-- commented 5 years ago

For generating a design of experiments, the workbench currently iterates over scenarios and policies, creating a set of runs from the itertools.product of these two collections. So if there are 5 policies and 10 scenarios, there are 50 experiments. This is a reasonable approach for most EMA applications.
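A minimal sketch of this full-factorial pairing (illustrative names, not the workbench API):

```python
import itertools

# 10 scenarios and 5 policies, named for illustration only
scenarios = [f"scenario_{i}" for i in range(10)]
policies = [f"policy_{j}" for j in range(5)]

# itertools.product pairs every scenario with every policy
experiments = list(itertools.product(scenarios, policies))
print(len(experiments))  # 50 experiments from 10 scenarios x 5 policies
```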

However, for running experiments to generate a meta-model, it is more efficient to "zip" these: make 50 draws for both policies and scenarios, pair them up, and run 50 experiments.

From the perspective of meta-model development, there is no difference at all between uncertainties and levers -- we merely seek to build a replacement black box for each model structure that converts inputs (both scenarios and policies) to outputs. To build the meta-model, it is often more efficient to have maximum variation in each of the input parameters. By providing a zip-over process, we can increase the variability in the inputs for the same number of experiments.
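The proposed zip-over behaviour, sketched with the same illustrative names: draw equally many points from each space and pair them one-to-one, so every parameter varies in every run.

```python
# 50 draws from each space, then paired one-to-one rather than crossed
scenarios = [f"scenario_{i}" for i in range(50)]
policies = [f"policy_{j}" for j in range(50)]

experiments = list(zip(scenarios, policies))
print(len(experiments))  # 50 experiments, each with a distinct scenario AND policy
```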

quaquel commented 5 years ago

I agree that something like this is useful. I have had a few cases where we were doing something similar ourselves.

However, there might be some ambiguity here. I see three (or two) possible cases

  1. Sample over lever space and uncertainty space independently, next combine them in a full factorial manner (the current behaviour)
  2. Sample over lever space and uncertainty space independently, next combine them randomly (i.e. pick a random policy for each scenario, or vice versa, depending on which is sampled more extensively)
  3. Zip them up as you suggest. But this only works if n_scenarios == n_policies. One might argue that this is a corner case of option 2.

The reason for maintaining independent sampling of scenarios and policies for cases 2 and 3 as well is that it gives you more flexibility. For example, you can have predefined policies that you want to test over some scenarios. If you maintain independent sampling, this is easy to do.

Let me know what you think.

jpn-- commented 5 years ago

I had not contemplated your option 2, but now that you point it out it's clearly a reasonable use case, and I think it's a minor tweak to include it. As I proposed it previously, zip_over would throw an exception if n_scenarios != n_policies. However, if we assume that the input collections of scenarios and policies are shuffled before zipping (and/or explicitly shuffle them), and we itertools.cycle the shorter one, we can achieve this result without any other changes. This ensures a balanced distribution of draws from the shorter collection across the larger one, which is almost certainly what the user wants.

If this seems reasonable (please amend if I have overlooked something), I can do this and submit a separate self-contained zip_over PR.
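A hedged sketch of the shuffle-and-cycle idea described above (the `zip_over` name and signature are just a proposal, not existing workbench API):

```python
import itertools
import random

def zip_over(scenarios, policies, rng=random):
    """Pair scenarios and policies one-to-one after shuffling both.

    The shorter collection is cycled, so its draws spread roughly
    evenly across the longer one.
    """
    scenarios = list(scenarios)
    policies = list(policies)
    rng.shuffle(scenarios)
    rng.shuffle(policies)
    n = max(len(scenarios), len(policies))
    # cycle both; islice stops at the length of the longer collection
    return list(itertools.islice(
        zip(itertools.cycle(scenarios), itertools.cycle(policies)), n))

experiments = zip_over(range(10), range(4))
print(len(experiments))  # 10 experiments; each of the 4 policies appears 2 or 3 times
```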

quaquel commented 5 years ago

This sounds exactly like what I had in mind. If you have time, please go ahead and make the change and submit a PR.

quaquel commented 5 years ago

This is still very much on my to-do list, but I have some further thoughts based on recent work. The underlying issue is how to combine points sampled from uncertainty space with points sampled from lever space. The default is a full factorial. Alternatively, you can do the zip_over trick suggested above. However, sometimes you don't want to sample the two spaces separately but rather jointly (e.g. in order to do Sobol over the combined space).
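To illustrate the joint-sampling idea: treat the uncertainties and levers as one combined input space and draw each experiment's point over all parameters at once. This is only a toy sketch with uniform random draws and made-up parameter names; a real Sobol design would sample the combined space with a dedicated sequence instead.

```python
import random

# Illustrative parameter bounds, not workbench objects
uncertainties = {"x1": (0.0, 1.0), "x2": (10.0, 20.0)}
levers = {"l1": (0.0, 5.0)}

# Merge both spaces into a single input space
combined = {**uncertainties, **levers}

def sample_jointly(space, n, rng=random):
    """Draw n points, each covering every parameter in the combined space."""
    return [
        {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}
        for _ in range(n)
    ]

experiments = sample_jointly(combined, 8)
print(len(experiments))  # 8 points, each assigning all three parameters
```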