Closed liamhuber closed 10 months ago
So there's a number of things going on here. For me the first and most important is that when you have a macro where two (or more) children are receiving the same input, there is no convenient way to synchronize these; as shown in your top example, a UserInput
node needs to be explicitly added to the macro, which is a pass-through node for the shared input, and allows that input to be branched to an arbitrary number of children with regular connections. I totally agree that it's a pain in the butt to write all this, and I like the idea of using additional args (or kwargs if you want a default value) as a shorthand for this.
How do we implement such a thing? Under the hood, there is actually some formalism for how all this is happening. In this case, the relevant parts are that "connections" are only formed between input and output channel pairs, IO or OI, never II or OO. In addition to this, two channels of the same category (II or OO) can be "linked" so that value updates to one are automatically sent to the other. Currently, channels can have an arbitrary number of connections but send their value updates to at most one other channel. These features allow us to "pull" on a node and automatically construct its upstream dependencies and give us the freedom to choose whether that pulling stays inside the scope of a given macro, or punches through macro walls to get the entire dependency stack. I'm not claiming it's the only set of properties/implementation that can give us these features, just that whatever choices we make need to continue to support that functionality.
So one choice would be to soften the linking restriction so a channel can synchronize its value
with multiple same-type partners. Then when we parse something like def internal_macro(wf: Macro, instance)
, we modify the resulting macro input so it has an extra instance
channel. I really dislike this for two reasons:
wf.generate_structures = generate_structures(instance=instance)
is a "connection" creation, but now we need to actually make a "link" with that insteadAnother choice would be to interpret the macro creating function in two passes: first, scrape off all args/kwargs beyond the first one and use them to create UserInput
nodes with the corresponding name (and type hints/default values, if those are provided), then pass that pre-populated Macro
instance to the provided function with the nodes as arguments and parse as normal. To the user this looks the same as the first attack, but
SingleValueNode
into the original macro-defining function, lines like wf.generate_structures = generate_structures(instance=instance)
work nativelySo I think this can be done and I like it as a short-hand so that users don't need to define their own UserInput
nodes when they want to fork input to multiple children. The "one choice" is really here for the education of any readers that come by, but I'm being so verbose about "Another choice" also so I've got a reminder on how I want to implement it.
Then the second thing is restricting macro output with the return value, i.e. return wf.fit.fit_dict
in your example. TBH, I am really not a fan of this at all. Once #84 is implemented, it doesn't really make things easier because wf.fit.fit_dict
would already be getting exposed on macro.outputs.fit_dict
anyhow (where macro
is just an instance of the macro`), so I guess the only thing this would be doing is implicitly disabling all other output that would otherwise be scraped (i.e. other disconnected outputs of children).
UserInput
nodes, but here I'm not at all convinced it's a worthwhile tradeoff.With that said, while I am happy with the existence of inputs_map
and outputs_map
and pleased with their simplicity and the level of customization they provide, I don't like that they are all string-based. I think this is yet again a separate issue to raise, but it would be nice to leverage tab-completable keys like wf.outputs_map = {wf.fit.fit_dict: "my_custom_label_at_the_macro_level"}
@macro_node(output_labels="fit_dict")
def internal_macro(wf: Macro, instance, calculator) -> None:
wf.generate_structures = generate_structures(instance=instance)
wf.calc_with_calculator = calc_with_calculator(calculator=calculator, task_dict=wf.generate_structures)
wf.fit = analyse_structures(
instance=instance, output_dict=wf.calc_with_calculator
)
return wf.fit.fit_dict
Pros:
Cons:
calculator
is redundant, since it would get exposed nicely after #84 anyhow -- we only need the macro args for IO that needs to be forked@macro_node()
def internal_macro(wf: Macro, instance) -> None:
wf.generate_structures = generate_structures(instance=instance)
wf.calc_with_calculator = calc_with_calculator(task_dict=wf.generate_structures)
wf.fit = analyse_structures(
instance=instance,
output_dict=wf.calc_with_calculator
)
# And supposing there was some other output we wanted to explicitly silence
wf.outputs_map = {"generate_structures__imagined": None}
# Or even cooler, but is a separate issue:
# wf.outputs_map = {wf.generate_structures.outputs.imagined: None}
Pros:
macro.inputs.calculator
and macro.outputs.fit_dict
get cleanly exposed without any additional workCons:
UserInput
) node(s) in the macro (explicit is better than implicit)
@jan-janssen wrote