cyclic foreign vars - Githubissues

feefladder commented 3 years ago

[x] Closes #164 #120
[ ] Tests added
[x] Passes black . && flake8
[ ] Fully documented, including whats-new.rst for all changes and api.rst for new API

feefladder commented 3 years ago

proof-of-concept PR for allowing user-defined custom dependencies, still needs:

tests,
documentation
solution for on_demand variables
(optionally) a way to quickly add chains of succesive processes. (this could be a wrapper that builds a dependency dict, an open question is how to find corresponding variable names)

the arrows from inout still also show up if the process that uses the inout var is set to explicitly depend on it

feefladder commented 3 years ago

So, we probably still have to move the check to a place where we can actually view the model.

For the checking, I used a dictionary of {p_name:{dep1,de2,dep3}}, that is also very useful for any other graph inspection. Do you think it is a small enough datastructure to be kept? (this allows for very concise/faster code in other functions.

benbovy commented 3 years ago

graph reduction (now) heavily depends on the deps_dict (and is really just a few lines), so I don't think that shoudl be split.

Besides the implementation, we still have to decide how to expose this feature, whether to enable it or not by default, etc.

Also, it's not yet very clear to me what is the motivation behind the dashed arrows for inout variables.

Honestly, it's too much for me to follow and discuss in a single PR.

Adding user-defined dependencies is a already a big feature that has a high potential to make the framework more flexible, so I'd really appreciate if we could progress incrementally to make sure that we get the design right and the feature robust.

jvail commented 3 years ago

Also, it's not yet very clear to me what is the motivation behind the dashed arrows for inout variables.

I feel guilty having started this particular feature discussion :) So let me give a practical example from the model I am working on (it's a bit messy right now - but work-in-progress):

model-graph

The process topology (and others) depend on the result of the arch_dev process (t minus one). And imho it would be a good improvement if that could be visualized somehow.

feefladder commented 3 years ago

graph reduction (now) heavily depends on the deps_dict (and is really just a few lines), so I don't think that shoudl be split.

Besides the implementation, we still have to decide how to expose this feature, whether to enable it or not by default, etc.

Also, it's not yet very clear to me what is the motivation behind the dashed arrows for inout variables.

Honestly, it's too much for me to follow and discuss in a single PR.

Adding user-defined dependencies is a already a big feature that has a high potential to make the framework more flexible, so I'd really appreciate if we could progress incrementally to make sure that we get the design right and the feature robust.

Ok. I get that, I can move the reduction to another PR, as well as maybe adding custom dependencies. Then we have:

custom dependencies
dependent on the deps_dict:
- strict checking
- graph reduction
maybe: dotted arrows

This will quite test my Git skills....

@jvail no worries, the main part of this PR is about enabling (and verifying) inout variables, that was already an issue. The dotted arrows part will need some more work and is actually quite complex, not sure if I'll find time for that

benbovy commented 3 years ago

Thanks for the illustration @jvail.

I was getting confused because we (I feel guilty too) involve two separate problems in the discussion here and in #164:

we have a quantity that we want to use and possibly update during the same timestep (or simulation stage) in one or more processes executed with some specific constraints on the workflow.
we have a quantity a which depends on quantity b, but b depends on a computed at the previous timestep or simulation stage.

Problem 1 is what I thought @Joeperdefloep you describe in your https://github.com/xarray-contrib/xarray-simlab/issues/164#issuecomment-802873790 (as you mentioned Fastscape's SurfaceToErode which is a workaround to that specific problem). The introduction of user-defined process dependencies (+ allow intent='inout' for foreign variables) would solve this problem.

Problem 2 is what @jvail you initially describe in #164. The current workaround is to declare an inout variable for a, define an input value as the initial value at t = -1, and make sure that it is not updated during the same step/stage than when it is accessed to compute b. However, this approach is a bit fragile. With the introduction of user-defined process dependencies, you will instead need to provide an explicit dependency when building your model so that you can safely compute b before updating a during the same step/stage.

That said, for problem 2 you'll still need an inout variable (+ input value) for a. If you want to define a with intent='out', then the cleanest way to do this is to introduce a whole new concept of a 3rd dimension (or some kind of step/stage unroll) for the graph of processes in xarray-simlab (cf. your https://github.com/xarray-contrib/xarray-simlab/issues/164#issuecomment-789414007). I don't think it's worth the extra complexity, though (the intent concept is also invariant of the simulation stage). Alternatively, we could add an exception to the variable intent heuristics (cf. declare a variable as seemingly_cyclic) but I don't really see much advantage of this approach compared to the inout + input value workaround.

feefladder commented 3 years ago

@jvail the difficulty: imagine the graph:

  inout1->inout2->inout3
   /   \ /         \
 in1   in2          in3

then, we want arrows only from inout3 to in1, so we need some sort of sorting algorithm for that. For all inout variables. That is quite the algorithm.

feefladder commented 3 years ago

@benbovy I think I will close this PR until I have properly split it

benbovy commented 3 years ago

Yeah sorry @Joeperdefloep a fresh start in new branches is probably the best thing to do.

What we need first is a PR that

extends the Model API so that we can provide explicit process dependencies
implements the logic in _ModelBuilder to merge those dependencies with the implicit depencies (i.e., those retrieved from variable intents)

In the same PR or the next one:

allow intent='inout' for foreign variables
update graph validation in _ModelBuilder: ensure that explicit dependencies exist for all ambiguous cases

Then in follow-up PRs:

implement graph reduction/optimization/simplification and expose that through some API (if better graph visualization is the only motivation I would even suggest to expose that only as an option to Model.visualize for now)
revisit how model output snapshots are saved
update the docs: we really need better docs about how process ordering is done in xarray-simlab
better handling of explicit deps propagation: maybe some smart dependencies reconstructions when explicit deps are broken after model.update_processes or model.drop_processes

xarray-contrib / xarray-simlab

cyclic foreign vars #177