How to derive MonteVar's from other MonteVar's in the same RUN?

dbankieris commented 5 years ago

@garyt2 would like to distribute a MonteVarRandom and derive an additional MonteVar from the first variable's value. In pseudo code:

foo = random()
bar = foo + 1

This does not appear to be possible. Here are some approaches I tried:

Specify only a MonteVarRandom, and use a monte_slave_pre job to directly set a derived variable in the slave. This doesn't work when that run is executed standalone because Monte Carlo jobs aren't run. Even if they were, since the derived variable is not a proper MonteVar, there would be no record of it in monte_input.
Specify only a MonteVarRandom, and use an initialization job to directly set a derived variable in the slave. This works in both Monte Carlo and standalone mode. However, since the derived variable is not a proper MonteVar, there is no record of it in monte_input. Furthermore, because this is done in an initialization job, this approach forces a correlation in all modes: Monte Carlo, standalone "rerun", and regular non-MC.
Specify both a MonteVarRandom and a MonteVarCalculated, and use a monte_master_pre job to set the derived variable in the master before sending both to the slave. This doesn't work because the MonteVarRandom doesn't actually set the corresponding variable in the master's memory. But that wouldn't be enough anyway. All MonteVar values are assigned in a single for loop, so there is no opportunity to change the variable to which the MonteVarCalculated refers before its value is fetched but after the MonteVarRandom variable is fetched. This approach would actually require that all non-MonteVarCalculated variables be fetched first, then the monte_master_pre jobs run, and finally the MonteVarCalculated variables fetched. But wait! This won't work if a MonteVarCalculated depends on another MonteVarCalculated.
Make both variables MonteVarCalculated, and use a monte_master_pre job to manually randomize the first variable and then calculate the derived variable. This works, and both variables are present in monte_input, but you lose all the fanciness of MonteVarRandom.
Gary's solution is in #706. I would go further and have the class execute arbitrary Python code directly rather than force it to reside in a file.

We had a lively discussion about this in the Trick lab and are mulling it over before we make any design choices. @ddj116 perhaps a redesign is more likely than I thought!

In the meantime, (2) seems like the least-objectionable workaround.

@alexlin0 @jmpenn

ddj116 commented 5 years ago

@ddj116 perhaps a redesign is more likely than I thought!

Image of Yaktocat

I'm biased but I'd prefer (5), even if it's a temporary workaround until a redesign is implemented.

garyt2 commented 5 years ago

Note - with option 2 it is possible to block the execution of the init-class job during non-MC runs by using a MonteVarFixed instance as a control flag. Then the init-class job looks like:

if (my_mc_dispersions) {
  <apply dependent dispersions>
}

where my_mc_dispersions constructs to 0 and is reset to 1 by the MonteVarFixed.

The main problem I ran into with this method was how to handle autocoded models which have no mechanism for adding a MC-dispersion method and control flag. In these cases, the init-class method definition and its accompanying flag had to reside elsewhere. I started collecting these into a single common location, but the number of control flags started exceeding my comfort level (driven by considering whether end-users are going to be able to figure out how it works).

Add to that the separation of the MonteVarFixed setting and the code that needed it, and the whole configuration started to look hacky and fragile. But it does work.

garyt2 commented 5 years ago

On option 5, I agree that providing code directly rather than a filename is an improvement, with the caveat that execfile("file.py") should be a legitimate instruction. For complex processing, it would be better to hide the logic in another file, but for simple applications (like 1-liners) it would be better to drop the instruction-set in directly.

Any updates on when there might be a decision on these MC enhancements?

dbankieris commented 5 years ago

Thinking through this a little more, it doesn't fit our paradigm very well. The existing MonteVar-derivatives assign to their value field in get_next_value and then return a string making an assignment using name, value, and units (this common code should probably be in the superclass!). MONTE_RUN_*/monte_runs lists the values each variable was assigned for each run by recording MonteVar::value, and MONTE_RUN_*/monte_header recreates each variable as a MonteVarFile that references the values in monte_runs. MonteVarExecCode doesn't use name, units, or a value in a way that makes sense in our current framework. What do we put in its monte_runs column? What do we put in monte_header for it?

I think the MonteVar framework will need to be redesigned if we want to accommodate an arbitrary code execution MonteVar while still producing a sensible monte_header and monte_runs. However, we should first answer the question of whether or not a MonteVar should continue to represent a single assignment to an actual sim variable, as the existing classes do. If so, arbitrary code execution isn't appropriate, and we might prefer to add a way to explicitly express that one MonteVar's value is derived from another's, which is the initial problem we were trying to solve.

garyt2 commented 5 years ago

I had not considered the impact on monte_header and monte_runs (never used them, if I want to re-run, I'll just use the monte_input file directly or re-run the batch). I see only 2 ways of getting this type of logic injected into the monte-header/monte-runs framework:

add the MonteVarExecCode injection points directly to monte_header as they are encountered (rather than converting to MonteVarFile as with other MonteVar* instances)
execute the logic as the variables are evaluated and add these values to monte_runs.

While the first solution would (theoretically) support re-runs from monte_header, it still doesn't provide the values of the variables so still doesn't really fit the paradigm. The second would run into problems with identifying which variables actually need recording; the necessary syntax modifications would probably make the process of adding this logic as Python files too complicated to be useful.

If we are looking at a refactor of how MonteVar* works, do you have any idea of when something might be available to support these requirements?

garyt2 commented 5 years ago

Additional note - the current paradigm doesn't really fit the current paradigm either, it is self-defeating. Users are dispersing and modifying variables without them being recorded in monte_runs or monte_header, making re-runs impossible. That is the real driver behind this work. We have to get re-runs to reproduce the data produced by first-runs.

nasa / trick

How to derive MonteVar's from other MonteVar's in the same RUN? #709