Explicit cyclic foreign vars

xarray-contrib / xarray-simlab

Xarray extension and framework for computer model simulations

http://xarray-simlab.readthedocs.io

BSD 3-Clause "New" or "Revised" License

73 stars 9 forks source link

Explicit cyclic foreign vars #164

Open jvail opened 3 years ago

jvail commented 3 years ago

Sorry to open up a new 'battleground' - the last thing that haunts me for now:

What do you, @benbovy, think about the following idea, use case:

Let's say we have two processes A and B. A has a var x that is a foreign variable in B and B has a variable y that is a foreign variable in A. Wont work because you can not derive the DAG in that case.

What if I annotate the foreign variable B.y in A with something like is_cyclic=True. That would mean:

I know this introduces a cycle but I don't care
Just ignore me, A.y, when building the process graph, the ordering
But please still draw the B -> y -> A relation in the visualization (maybe with a step - 1 label, dotted line...)
And when executing A just pass in for y whatever is there (would be a NaN in step = 0 if y is not inout in B or the value set in B.initialize

It just happens that we have a lot of these "cycles" but typically it is just the value from step time - 1. So only quasi-cyclic.

benbovy commented 3 years ago

Could you provide a concrete example where such quasi-cyclic situation happens?

I wonder if you could refactor your problem into 3 processes A, B, C, or if user-defined process ordering in a Model (thus relaxing DAG constraints) would help here.

I think that foreign vars with something like is_cyclic=True would open a big can of worms. It is quite confusing and would complicate further xarray-simlab's internal logic for process sorting (it is already complex).

jvail commented 3 years ago

I think that foreign vars with something like is_cyclic=True would open a big can of worms. It is quite confusing and would complicate further xarray-simlab's internal logic for process sorting (it is already complex).

I don't know the internals but I thought - maybe naively - you could probably just ignore the variables with is_cyclic=True in the process ordering.

Could you provide a concrete example where such quasi-cyclic situation happens?

Fair enough

A - A process for a tree (here leaf) architecture - in practice all sorts of properties B - Photosynthesis C - Carbon allocation

A.leaf_area (out) --> B B.assimilates (out) --> C C.leaf_dry_matter (out) - - > A

In order to compute A.leaf_area A needs C.leaf_dry_matter from t-1 (or the initial value at t = 0). If there is a way to model that in xarray-simlab without cycle then I apologize for not having read the documentation more carefully or missing the obvious.

Of course we could just merge it all together in one process but that is where we come from and we would like to split it up in modules/processes.

benbovy commented 3 years ago

Would it be an option to declare C.leaf_dry_matter with intent='inout' and with some default value (i.e., some placeholder for the value at t = -1)? This way you could avoid the C -> A dependency thus remove cyclic dependencies.

jvail commented 3 years ago

Ok, a fat apology

I forgot to mention that circular imports where in my way when I first tried to get it working ;) This time I tried a combination of xs.group in A (avoid explicit linking and circular import of C) and an 'inout' in C with the same group.

So, moving everything into groups plus 'inout' is a solution but I loose a bit on the side of your nice model visualization/inspection features since I obscure where the variable comes from. But that's worth it!

Thank you for your outstanding support!

benbovy commented 3 years ago

Ok, a fat apology

No worries at all!

Thank you for your outstanding support!

You're welcome!

jvail commented 3 years ago

Sorry, need to bring this up again. I am still thinking about another way to get this "cyclic" dependencies working (this is indeed one of our three pain points apart from sparse and growing index variables):

I sketched a little prove of concept. Here is a notebook: https://github.com/jvail/xarray-simlab/blob/seemingly-cyclic/notebooks/seemingly_cyclic.ipynb

I am not entirely happy with the current work-around:

Declare an 'inout' rather than 'out' variable if you want to let previously run processes to use it) because it requires to deliver an input for each of them although initializing them in the initialize func would be sufficient; i.e. I can not just use intent 'out'.

Also, as far as I can see from the code where you compute the dependencies that you are filtering for 'out' variables to determine process deps and sorting. So simply allowing a "pseudo cyclic" variable with intent 'out' wont work because process ordering will be ambiguous. Therefore, I think, another property is required (for the sake of clarity I called it "seemingly_cyclic") to explicitly exclude this dependency from being evaluated.

It seems clear from my tests that if there is such a dependency the producer must declare a global name (or group - that's a bit confusing I have to say - a group is always global I guess) and the consumer use a global ref (group) in any case to avoid cycles at module import level. Therefore I added the property "seemingly_cyclic" only to "global_ref".

I also added a dashed arrow to the dot graph to visualize this dependency. I can not see why this might open a "pandora's box" and introduce chaos - but I may - most certainly - lack knowledge about the mechanics of your lib.

P.S.: I will try to make a suggestion for #163 as well but that one is really tricky - bear with me.

jvail commented 3 years ago

I kept playing a bit with this idea and tested if we could use it with xs.foreign as well. If we split it into base classes that implement the different aspects of a process I can get around the cyclic import issue and we could arrive at a pretty "plugable", modular model design (in our use case): i.e. just replace a process with some different math as long as it provides the same output.

https://github.com/jvail/xarray-simlab/blob/seemingly-cyclic/notebooks/seemingly_cyclic_foreign.ipynb

The advantage of xs.foreign is that the dependency is explicit - "global_ref" hides it. Makes the model more readable, I think.

benbovy commented 3 years ago

Thanks for sharing your notebooks, it gives a clear overview of the problem.

I admit that the 'inout' variable with a default value workaround is not ideal in the case where you just would use intialize. That said, conceptually it makes some sense IMO: in your example, I see b_v as a variable that pre-exists with some state before the simulation. If you need to set more advanced values as default, you could use a factory (although I'm not sure that the takes_self=True option would work out of the box in xarray-simlab).

TBH, I'm reluctant to implement "pseudo cyclic" variables in xarray-simlab for the main reason that it breaks a fundamental concept of this framework, i.e., that the process workflow may be represented as DAG. Allowing one exception like this is opening the door to more exceptions, which may end up with something conceptually very hard to understand. It also adds a parameter to xs.variable, xs.global_ref, etc., which have already (too) many options. In your notebooks, you need to rely on either global_ref or foreign with some base process classes, which both IMO are not simpler workarounds than 'inout' variables with a default value.

jvail commented 3 years ago

TBH, I'm reluctant to implement "pseudo cyclic" variables in xarray-simlab

Sure, understood and accepted - wont bring it up again - however :) , out of curiosity...

for the main reason that it breaks a fundamental concept of this framework

I don't understand why it would break it because you have it already (global/group + 'inout'). I think such a change would just bring this option to the surface and improve it because an input is no longer required. Is there a fundamental difference between input_vars values and values set in initialize? Isn't it both just the state at step -1?

It also adds a parameter to xs.variable, xs.global_ref, etc., which have already (too) many options

No, only to xs.global_refand xs.foreign. Well, I could imagine to have an xs.cyclic(intent='in') instead ... But I admit, there are already quite a lot of different variables and options which could become even more confusing.

benbovy commented 3 years ago

I don't understand why it would break it because you have it already (global/group + 'inout'). I think such a change would just bring this option to the surface and improve it because an input is no longer required. Is there a fundamental difference between input_vars values and values set in initialize? Isn't it both just the state at step -1?

Ah I see, you're right. Probably it's not clear enough that the value of a 'inout' variable declared in a process should be updated in a different simulation stage in that case (that was clear only in my mind so far :-) ).

An 'inout' variable declared in a process A and that is accessed by a 'in' group or global_ref declared in process B would not be affected by process ordering (i.e., either A -> B or B -> A is correct) as long as the value of this variable is not updated in A and read in B during the same simulation stage.

This implicit condition is a bit unfortunate, actually, and I don't see how we could properly inspect the process classes to check this condition when building a new model :-/.

Your suggestion of a "seemingly_cyclic" reference at least makes the intention a bit clearer, but it doesn't solve this implicit condition which still applies here. I'm not very comfortable with adding another way to allow this.

jvail commented 3 years ago

I'm not very comfortable with adding another way to allow this.

D'accord. Fair enough.

Ah I see, you're right. Probably it's not clear enough that the value of a 'inout' variable declared in a process should be updated in a different simulation stage in that case (that was clear only in my mind so far :-) ).

An 'inout' variable declared in a process A and that is accessed by a 'in' group or global_ref declared in process B would not be affected by process ordering (i.e., either A -> B or B -> A is correct) as long as the value of this variable is not updated in A and read in B during the same simulation stage.

Wow - that's subtle. A bit hard to follow, I find. I think you could put a bit more faith in the modeler who is supposed to know what he is doing. Nevertheless - I regard this as settled and remain quite now and hope you keep up your excellent work! :)

benbovy commented 3 years ago

Nevertheless - I regard this as settled and remain quite now and hope you keep up your excellent work! :)

Thanks! Please continue to give helpful feedback, it's very much appreciated! In this case it made me realize that we can actually break the DAG in the current version!

jvail commented 3 years ago

Please continue to give helpful feedback, it's very much appreciated!

:) All right. Though no guarantees on the "helpfulness". Why not think about the DAG this way:

t1 A->B->C
     ^
       \
         \ 
t0 A->B->C

Taking the time dimension into consideration. If you then go from C to A the cycle disappears. It is just offset in time. The interpretation of the DAG might be too "flat", static. This is what actually happens with 'inout' - it is just not transparent in the visualization.

feefladder commented 3 years ago

I remain a bit confused by this, I think there should be some clear documentation as to what is the best way to have this quasi-cyclic behaviour.

Like: what is the correct way to have a variable that is updated on every timestep? In fastscape, this is done by refactoring : Surf2erode is the surface that all processes get their data from. However, with multiple different processes that need to be updated (e.g. have this cyclic behaviour), this greatly increases the number of processes, just to implement updating a variable. but the most simple way to work around this is basically the hack of changing some variable in the cycle from 'out' to 'inout'? and then make sure that that process:

is the only process that uses finalize_step for these variables
only assigns values to these variables in that step.

However, AFAIK this is not checked (and indeed I could assign values in the run_step in a cyclic case). One way would be to base the DAG on variables in the run_step, but this becomes really hairy (how do we know which variables are called in that step?)

Maybe this should be for the model designer, and proper documentation is just what is needed?

benbovy commented 3 years ago

However, with multiple different processes that need to be updated (e.g. have this cyclic behaviour), this greatly increases the number of processes, just to implement updating a variable.

I agree this is a major limitation in xarray-simlab. I currently use other kinds of hacks in fastscape to circumvent this because computing all uplift and/or erosion processes from the topography at time t may not be the best / most stable solution compared to the chained application of those processes, one after another, each updating the topography. Also, I've never been 100% happy with how output snapshots are saved between run_step and finalize_step. It might be simpler to save them just after run_step is executed for their corresponding process and maybe get rid of finalize_step.

I think that the cleanest way to overcome this limitation would be to allow user-defined process ordering and relax some constraints on variable intent, e.g., allow 'inout' in multiple processes for the same variable.

This would be highly welcome, but this would require a big refactoring and design effort.

An open question is also how can we support both automated (any DAG) and user-defined (single-chain DAG) process ordering? Combining both approaches in the same model would be challenging, but I still like the possibility to just add a process to a model and let xarray-simlab figure out where it could be inserted in the DAG. That's very convenient when we want to extend a model "at the edges" (e.g., for fastscape adding processes that simulate limit conditions such as climate or tectonics).

feefladder commented 3 years ago

I agree this is a major limitation in xarray-simlab. I currently use other kinds of hacks in fastscape to circumvent this because computing all uplift and/or erosion processes from the topography at time t may not be the best / most stable solution compared to the chained application of those processes, one after another, each updating the topography.

Aah, now that explains all the on_demand variables in fastscape?! :)

Also, I've never been 100% happy with how output snapshots are saved between run_step and finalize_step. It might be simpler to save them just after run_step is executed for their corresponding process and maybe get rid of finalize_step.

So, the conclusion here is that the finalize_step is actually not really necessary, since the order of calculation is determined from the DAG anyways? Or it is necessary because of parallel execution? and therefore, the inout variable will be calculated last, and it's value, when used will be from the beginning of the timestep.

I think that the cleanest way to overcome this limitation would be to allow user-defined process ordering and relax some constraints on variable intent, e.g., allow 'inout' in multiple processes for the same variable.

This would be highly welcome, but this would require a big refactoring and design effort.

Oof, that indeed sounds like a lot of work... Honestly, I think this limitation is already overcome with proper use of the inout intent right? Since that is currently the only place where the cycle is 'broken'. Having multiple 'inout' variables would also make the model very complex for the model maintainer I presume? Maybe, just allowing the intent of variables to be changed after process declaration can accomplish this? But then again, this requires proper documentation, because it is far from trivial from a user perspective.

An open question is also how can we support both automated (any DAG) and user-defined (single-chain DAG) process ordering? Combining both approaches in the same model would be challenging, but I still like the possibility to just add a process to a model and let xarray-simlab figure out where it could be inserted in the DAG. That's very convenient when we want to extend a model "at the edges" (e.g., for fastscape adding processes that simulate limit conditions such as climate or tectonics).

So this could also be accomplished by allowing changing intent of xs.variables, or allowing changing the location of the 'inout' intent? something like:

xs.change_inout_to_process(before_process, after_process)

That then puts the after_process last in the calculation chain.

hmm... on second thought, this may become more complicated. Also, I'm not sure if I completely understand what you want to accopmlish.. from looking at fastscape, it seems that you want to be able to change whether tectonics and erosion are done concurrently or simultaneously, and for that change the ordering in the graphs. This should be a (good) other issue I think?

feefladder commented 3 years ago

On another note, I do really like what @jvail did with the dotted arrows. It may be nice to have references to an inout variable as input to another process indicated with a dotted arrow? possibly with an option for that, or as default when the show_only_variable is active?

benbovy commented 3 years ago

I'm not sure if I completely understand what you want to accomplish

Current problem

Let me show a simple example:

@xs.process
class ProcessA:
    foo = xs.variable(intent='inout')

@xs.process
class ProcessB:
    foo = xs.foreign(ProcessA, 'foo', intent='in')

Currently it is possible to do:

model = xs.Model({'a': ProcessA, 'b': ProcessB})

Let's say that both a and b implement run_step and that we want to make sure that b.foo returns the value before it is updated in a. Unfortunately, we can't (*). A current workaround is to update the foo variable in a.finalize_step() that is run after b.run_step().

What we may also want to do:

@xs.process
class ProcessC:
    foo = xs.foreign(ProcessA, 'foo', intent='inout')

But this is not possible either: xs.foreign doesn't support intent='inout' (for the obvious reason that no process order can be determined from the variable intents unless it this order is explicitly defined by the user).

Proposal: user-defined dependencies

This could look like:

model = xs.Model(
    {'a': ProcessA, 'b': ProcessB, 'c': ProcessC},
    dependencies=[('a', 'b'), ('c', 'a')]
)

Where the dependencies argument here means that we explicitly want a be executed after b and c be executed after a (i.e., for each tuple: 1st item is the name of a process that depends on the process named by the 2nd item)

When we know the order of all processes in the model (here b -> a -> c), instead of specifying the whole chain of process dependencies as a list of 2-items tuples we could have the following options for more convenience (all these options have the same semantics than the code snippet here above):

a different syntax for the processes argument:

model = xs.Model([('b', ProcessB), ('a', ProcessA), ('c', ProcessC)])

a class method:

model = xs.Model.from_ordered_collection({'b': ProcessB, 'a': ProcessA, 'c': ProcessC})

a subclass of xs.Model (if we want to always have user-defined process order):

model = xs.SequentialModel({'b': ProcessB, 'a': ProcessA, 'c': ProcessC})

(*) actually, the example used here would just work with a dictionary for processes, but it's more an implementation detail so I'm not sure it's a good idea to rely on it. I'd prefer a more explicit API like:

model = xs.Model({'b': ProcessB, 'a': ProcessA, 'c': ProcessC}, dependencies='strict')

With user-defined dependencies, we could allow intent='inout' for foreign variables, and therefore update a variable successively in multiple processes during the same simulation stage. This would be a major improvement!

Unless one needs to clear some resources a each time step, we could get rid of finalize_step. Time-dependent output snapshots of a variable would then be saved just after the execution of run_step of its corresponding process. If we want to save the values of the same variable that is updated successively in different processes, we could just do:

xs.create_setup(
    model=model,
    clocks={'clock': ...},
    input_vars={'a__foo': ...},
    output_vars={'b__foo': 'clock', 'a__foo': 'clock', 'c__foo': 'clock'}
)

There's some potential challenges, though:

we need to return meaningful error messages when a variable with intent='inout' is used in multiple processes (with any intent) and when no explicit dependencies is given for such processes
we need to check that user-defined dependencies are consistent with the DAG defined from the intent of the variables defined in the model
it might not be straightforward to update the explicit dependencies when creating new models from existing ones using model.update_processes() and model.drop_processes()

benbovy commented 3 years ago

Aah, now that explains all the on_demand variables in fastscape?! :)

Not exactly, on_demand variables are more for some model diagnostics (variables that are not useful in other processes but useful for post-processing, visualization, etc).

A better illustration is the SurfaceToErode process and the SurfaceAfterTectonics process that inherits form it. This creates additional arrays on the grid that could be avoided with user-defined process dependencies.

jvail commented 3 years ago

Great to see the discussion ongoing here. Just some observations while we are porting our model to simlab:

it would indeed be helpful if we could add e.g. a dotted line in case of an all-inout-var-process. Otherwise it sort of sits there like a dead-end with no successor (but it has one - in the next timestep ... :| )
I think adding a sort of 'ordering-hint' (as suggested) would be nice. I introduced anintent='out' variable to the all-inout-var-process in order to force ordering (I needed it to run before another process that was consuming one of the 'inout' vars).

With user-defined dependencies, we could allow intent='inout' for foreign variables, and therefore update a variable successively in multiple processes during the same simulation stage. This would be a major improvement!

Not so sure about this. You could maybe create a new xs.shared variable. It seems very confusing that a xs.foreign can be changed in a consuming process. It might also be difficult to debug since it it can not be deduced anymore how the variable changed in each step because you might have several sub-steps now in a single model step. I think it is a good feature that xs.foreign are read-only outside the process.

feefladder commented 3 years ago

dotted arrows

So I added an option show_inout_arrows in dot.py, that adds dotted arrows as in this example notebook (sorry it is not as concise). Now it shows arrows by default, only in the most basic view. I also enhanced the inout look a bit for input variables.

the other discussion (which is not really the thread title)

this becaome a bit too long....

So I can for now see two examples, that would be solved by allowing the user to order processes: #### having a chain of processes that are executed sequentially, in a user-defined order. -> `xs.foreign(intent='inout')` As in Surf2Erode - SurfaceAfterTectonics in Fastscape. - solved now by subclassing. This also allows for completely different calculation Basically the layout we have (arrows indicate calculation order, so as in the DAG): Normal execution (tectonics (T) and erosion (E) calculated independently and added to topology) ``` T->C ^ S2E ->E/ ``` By subclassing, SurfaceToErode (S2E) no also determines tectonics before. enforced calculation order Erosion calculated after Tectonics. ``` T->S2E->E->C ``` this will also be solved in the following case, since `elevation` can be an `inout` variable, with the `option_to_add_this_as_a_dependency_to_the_DAG` set for erosion processes? However, this only solves for _one_ process that can be put somewhere in a chain, whereas the suggestion by @benbovy of foreign vars with `inout` intent allows for any chain length Maybe, this is better served by another variable class, where the explicit intention is that multiple processes can update it at . #### using an `inout` variable as input to another process, but enforcing it to be calculated first. As suggested by @jvail solved now by transferring that to an `out` variable. (in my notebook as otherclass) This has a drawback that it will introduce a lot of variables pretty quickly. #### possible solution ``` @xs.process class ProcessA: a_v = xs.variable(intent='inout') @xs.process class ProcessB a_v = xs.foreign(ProcessA, intent='in', option_to_add_this_dependency_to_the_DAG=True) ``` I think the benefit of this would be, that only the [`_get_process_dependecies`](https://github.com/Joeperdefloep/xarray-simlab/blob/7cbf487172ee703a641e6bc40ed5bcfb6ceb7908/xsimlab/model.py#L425) has to be updated with a more enhanced algorithm that also checks for this option.

TL;DR I made a proof-of-concept PR, that is exhibited in the example notebook. Actually, both cases are basically the same, if an inout variable is declared, it does not have automatic dependencies. Therefore, the user can add them. But to still allow for automatic dependency sorting, we use a dictionary such as: {'process_name__var_name':'dependent_process_name'} for determining dependencies. Then, only the dependency for that variable is removed (so other processes can be dependencies as well). Strictly speaking, this is not really necessary, and this removing can actually made redundant when graph reduction is implemented #120

What now still can be implemented is the linear model, or a quicker syntax as @benbovy mentioned, that is then converted to the custom_dependencies dict format.

feefladder commented 3 years ago

What we may also want to do:
@xs.process
class ProcessC:
    foo = xs.foreign(ProcessA, 'foo', intent='inout')
But this is not possible either: xs.foreign doesn't support intent='inout' (for the obvious reason that no process order can be determined from the variable intents unless it this order is explicitly defined by the user).

This is implemented in #177, but as of now, it does not check if xs.foreign(intent='inout') variables have a dependency. This may need some more work.

* we need to return meaningful error messages when a variable with `intent='inout'` is used in multiple processes (with any intent) and when no explicit dependencies is given for such processes

Actually, this is even more subtle: a normal variable with intent='inout' does not cause significant problems (assuming that it's calculation is last, except for explicit dependent variables). However, for xs.foreign(intent='inout') variables, it all becomes quite complex. For example in the following configuration:

bad:
  in
  /
out->inout1->in
  \
  inout2->in

both inout1 and inout2 mutate the same variable, with unknown order of calculation. So we would have to ensure that all inout variables are chained linearly Also, AFAIK, the order of calculation in these cases is not known:

good:
out->inout1->inout2
 \     \       \
 in1   in2     in3

apart from implementation, how do we ensure even in the good case that in1 is used before inout1 (or even inout2) does its calculation and then mutates the variable? Because the DAG only ensures that processes are calculated after proper execution of the variables. This would need some kind of lock in execution of xs.foreign(intent='inout') variables, or a DAG that becomes as complex as:

better:
out->inout1->inout2
 \   ^    \   ^ \
  \ /      \ /   \
  in1      in2   in3

where an inout variable depends on all input variables before that. I think this really complicates things, that should be solved in another PR possibly?

* we need to check that user-defined dependencies are consistent with the DAG defined from the intent of the variables defined in the model

Yes, in the current implementation, where the user specifies dependencies as a {'p_name__var_name':'dep_p_name'} dictionary. Now I'm stuck at getting the variable of dep_p from the key, or p_name__var_name

* it might not be straightforward to update the explicit dependencies when creating new models from existing ones using `model.update_processes()` and `model.drop_processes()`

I let the custom_dependencies be copied over (that should work). However, maybe write a function that filters for processes that are not in the model anymore...

benbovy commented 3 years ago

if an inout variable is declared, it does not have automatic dependencies. Therefore, the user can add them.

Yes, exactly! The user must add them, except in the two following cases where this isn't required:

the inout variable has no other reference in the model
the inout variable (resp. foreign) declared in process A has one reference (resp. is the only inout reference to an) out variable declared in process B. In this case I think it's ok to consider that out takes precedence over inout, i.e., B must be executed before A.

But to still allow for automatic dependency sorting, we use a dictionary such as: {'process_name__var_name':'dependent_process_name'} for determining dependencies.

I like the idea of using a dictionary for user-defined dependencies. We could even allow lists of process names as dict values.

However I don't understand why we need the variable name in dict keys. Could you expand on why you skip those variables when retrieving the process dependencies in #177 please?

I let the custom_dependencies be copied over (that should work). However, maybe write a function that filters for processes that are not in the model anymore...

I'm afraid this won't be enough for all cases. Suppose we have a model with a -> b -> c custom order and we want to create new_model = model.drop_processes('b'), then we need some logic so that new_model is built with the custom dependency ('c', 'a') that may not be explicitly stated in model.

You could maybe create a new xs.shared variable. It seems very confusing that a xs.foreign can be changed in a consuming process.

Hmm how xs.shared would be different than xs.foreign? It seems to me that there wouldn't be much difference to justify having two kinds of variables. xs.shared looks a bit confusing to me while xs.foreign makes it clear that the variable is declared in another process (declaration != value).

It might also be difficult to debug since it it can not be deduced anymore how the variable changed in each step because you might have several sub-steps now in a single model step.

This should not be harder to debug if we change how output variable snapshots are saved, like I suggest just before and/or after the execution of every process during the run_step stage and not anymore once between the run_step and finalize_step stages, so that we can save the values for each "sub-step" (i.e. process).

feefladder commented 3 years ago

However I don't understand why we need the variable name in dict keys. Could you expand on why you skip those variables when retrieving the process dependencies in #177 please?

We need the variable name, since a process can depend on multiple other processes (with different variables). The variable name is used to create a dictionary of skip_deps = {'p_name',key} that is then checked (and skipped if present) during normal dependency building.

e.g.:

xs.model({'A':A,'B':B,'C':C,'D',D},
    custom_dependencies={'C__b_var':'B'})

A->B->C
     /
    D

where we find that b_var is an inout var in B, so we have to add it manually. However, normally during normal dependency building, A->C would be there, so we have to skip that dependency using skip_deps. Now, we still use normal dependency building, so C is still added automatically.

I preferred this method over a {p_name:dep_p_name} dictionary, because it also allows the dependency building that is already there.

benbovy commented 3 years ago

However, normally during normal dependency building, A->C would be there, so we have to skip that dependency using skip_deps.

I don't think it's necessary to skip it. The topological sorting algorithm will yield consistent process ordering with or without A->C. Also, I think it's better to keep both user-defined and automatically inferred dependencies instead of doing any graph "optimization" here, even though it results in more edges (dependencies).

feefladder commented 3 years ago

The checking algortihm works now! (I added a transitive reduce, since it was easy) Both require a descendants/deps dict to work. That would also be very useful in drawing the dotted arrows. @benbovy what are your thoughts on adding that to the model class? EDIT: I have kept it in the _ModelBuilder for now, since that seems to be the design pattern.

* the `inout` variable has no other reference in the model

* the `inout` variable (resp. foreign) declared in process `A` has one reference (resp. is the only `inout` reference to an) `out` variable declared in process `B`. In this case I think it's ok to consider that `out` takes precedence over `inout`, i.e., `B` must be executed before `A`.

in fact, we never have to check for out variables, since they always come first. So we have two cases: inout variables with or without corresponding in variables.

I like the idea of using a dictionary for user-defined dependencies. We could even allow lists of process names as dict values.

done that, maybe a set is better? it needs one conversion less here, but maybe users are not as familiar with sets?

I'm afraid this won't be enough for all cases. Suppose we have a model with a -> b -> c custom order and we want to create new_model = model.drop_processes('b'), then we need some logic so that new_model is built with the custom dependency ('c', 'a') that may not be explicitly stated in model.

added that

benbovy commented 3 years ago

Great! I commented on #177.

done that, maybe a set is better? it needs one conversion less here, but maybe users are not as familiar with sets?

Yep internally coercing into a set a sequence (tuple, list, etc) that passed through the API seems good to me.