pyiron / pyiron_workflow

Graph-and-node based workflows
BSD 3-Clause "New" or "Revised" License
10 stars 1 forks source link

How to access the child node of a macro node in a parent macro node #242

Closed samwaseda closed 1 month ago

samwaseda commented 3 months ago

Imagine the following situation: I have a macro node which does: y1 = (a + b) * c, where I have two functions addition and multiplication. On top of this macro, I would like to make another macro to apply y2 = y1 + a + b, meaning I simply add the result of the first macro and add the result of addition. In code it would basically look like this:

@single_value_node("sum")
def addition(a, b):
    return a + b

@single_value_node("product")
def multiplication(a, b):
    return a * b

@macro_node("result")
def add_and_multiply(macro):
    macro.addition = addition(1, 2)
    macro.multiplication = multiplication(macro.addition, 5)
    return macro.multiplication

@macro_node("result")
def add_and_multiply_and_add(macro):
    macro.add_and_multiply = add_and_multiply()
    macro.addition = addition(macro.add_and_multiply.outputs.result, macro.add_and_multiply.addition.outputs.sum)
    return macro.addition

This code does not work, because macro.add_and_multiply.addition.outputs.sum in add_and_multiply_and_add raises an error.

In practice, this happened when I created a macro node, in which I had a macro node to obtain interstitial sites and a function node to calculate their energies, and I needed both the interstitial positions and energy values in a different macro.

liamhuber commented 3 months ago

This code does not work, because macro.add_and_multiply.addition.outputs.sum in add_and_multiply_and_add raises an error.

As it should, so I'm happy about that. But I just copied and executed your code and I agree that in this case the resulting message is not helpful at all.

The reason this is disallowed is that macros are "walled gardens", which is to say that there are no channel connections (neither signal nor data) from outside the macro into it or vice versa. Of course the user is still free to follow the semantic path and interact with each child node arbitrarily deep, it's just the actual connections that are forced to go through the macro IO interface.

The root of this is that we want to allow cyclic graphs. I can't prove it, but I am extremely confident that it is impossible to accomplish that with data connections alone, you need something equivalent to signal channels to get the job done. Of course signal channels are annoying and for DAGs we can rely soley on the data connections, so we want a way to enclose the cyclic subgraph (where manual signal connections are necessary) to be as small as possible, so it can be embedded in a larger, automated execution DAG. Now here I have less confidence that how I do it is the "only" way, but I spent a long time thinking about it and couldn't find anything better, so I'm not inclined to alter it without significant forethought -- the Macro unit is what acts as a wall for connections that allows us to encapsulate subgraphs and run cyclicity analysis on graph connections. I'm happy to talk about it more, but this is already verbose so I'll leave it there unless you want to dig deeper.

Now for your use case, we still have two practical solutions. Starting with your same base...

from pyiron_workflow import Workflow

@Workflow.wrap_as.single_value_node("sum")
def addition(a, b):
    return a + b

@Workflow.wrap_as.single_value_node("product")
def multiplication(a, b):
    return a * b

1) In a very simple example like you showed here, we can leverage the signature-parsing for a pretty elegant (IMO) solution:

@Workflow.wrap_as.macro_node("result")
def add_and_multiply(macro):
    macro.addition = addition(1, 2)
    macro.multiplication = multiplication(macro.addition, 5)
    return macro.multiplication

@Workflow.wrap_as.macro_node("result")
def add_and_multiply_and_add(macro):
    macro.add_and_multiply = add_and_multiply()
    macro.addition = addition(
        macro.add_and_multiply.outputs.result, 
        macro.add_and_multiply.addition.outputs.sum
    )
    # EDIT: THIS HAS AN ERROR. It doesn't even compile and run
    # This is exactly the issue Sam was running into to start with!
    # One needs to go back and expose addition and multiplication results directly
    # to have something like an `addition_result` output
    # Then invoke `macro.add_and_multiply.outputs.addition_result
    # instead of `macro.add_and_multiply.addition.outputs.sum`
    # Which erroneously tries to make a connection with a child node's channel
    return macro.addition

n = add_and_multiply_and_add()
n()
>>> {'result': 18}

2) In a more realistic case, we can use the output map to modify the macro's IO so it can be used by siblings. This is not the sexiest syntax, I admit, but it shows how power-users retain an awful lot of flexibility in our setup:

@Workflow.wrap_as.macro_node("result")
def add_and_multiply(macro, a=1, b=2, c=5):
    macro.addition = addition(a, b)
    macro.multiplication = multiplication(macro.addition, c)
    return macro.multiplication

@Workflow.wrap_as.macro_node("result")
def add_and_multiply_and_add(macro, a=1, b=2, c=5):
    macro.add_and_multiply = add_and_multiply(
        a=a, 
        b=b,
        outputs_map={"addition__sum": "addition_result"}
    )
    macro.addition = addition(
        macro.add_and_multiply.outputs.result, 
        macro.add_and_multiply.outputs.addition_result,
    )
    return macro.addition

n = add_and_multiply_and_add()
n()
>>> {'result': 18}

(Note that we can do this without using a signature for the macros, I just thought it's nicer with so that we can succinctly modify the input like n(a=2, b=42, c=6). Without it still looks like:

@Workflow.wrap_as.macro_node("result")
def add_and_multiply(macro):
    macro.addition = addition(1, 2)
    macro.multiplication = multiplication(macro.addition, 5)
    return macro.multiplication

@Workflow.wrap_as.macro_node("result")
def add_and_multiply_and_add(macro):
    macro.add_and_multiply = add_and_multiply(
        outputs_map={"addition__sum": "addition_result"}
    )
    macro.addition = addition(
        macro.add_and_multiply.outputs.result, 
        macro.add_and_multiply.outputs.addition_result,
    )
    return macro.addition

n = add_and_multiply_and_add()
n()
>>> {'result': 18}

)

I'm very happy to talk about syntactic improvements for (2), but I'm quite pleased with the power we have, and it would really take some convincing to get me to go back and change the macro IO paradigm being used.