Alternative interfaces for macro definition

So there's a number of things going on here. For me the first and most important is that when you have a macro where two (or more) children are receiving the same input, there is no convenient way to synchronize these; as shown in your top example, a UserInput node needs to be explicitly added to the macro, which is a pass-through node for the shared input, and allows that input to be branched to an arbitrary number of children with regular connections. I totally agree that it's a pain in the butt to write all this, and I like the idea of using additional args (or kwargs if you want a default value) as a shorthand for this.

How do we implement such a thing? Under the hood, there is actually some formalism for how all this is happening. In this case, the relevant parts are that "connections" are only formed between input and output channel pairs, IO or OI, never II or OO. In addition to this, two channels of the same category (II or OO) can be "linked" so that value updates to one are automatically sent to the other. Currently, channels can have an arbitrary number of connections but send their value updates to at most one other channel. These features allow us to "pull" on a node and automatically construct its upstream dependencies and give us the freedom to choose whether that pulling stays inside the scope of a given macro, or punches through macro walls to get the entire dependency stack. I'm not claiming it's the only set of properties/implementation that can give us these features, just that whatever choices we make need to continue to support that functionality.

So one choice would be to soften the linking restriction so a channel can synchronize its value with multiple same-type partners. Then when we parse something like def internal_macro(wf: Macro, instance), we modify the resulting macro input so it has an extra instance channel. I really dislike this for two reasons:

It bastardizes macros and function nodes; currently macro input (like workflows) is exclusively constructed as a function of its children. IMO that is a clean and beautiful thing.
It complicates the syntax, since wf.generate_structures = generate_structures(instance=instance) is a "connection" creation, but now we need to actually make a "link" with that instead

Another choice would be to interpret the macro creating function in two passes: first, scrape off all args/kwargs beyond the first one and use them to create UserInput nodes with the corresponding name (and type hints/default values, if those are provided), then pass that pre-populated Macro instance to the provided function with the nodes as arguments and parse as normal. To the user this looks the same as the first attack, but

Macros are still just macros, and their IO is still just build from the IO of their children
Since we are now passing a SingleValueNode into the original macro-defining function, lines like wf.generate_structures = generate_structures(instance=instance) work natively

So I think this can be done and I like it as a short-hand so that users don't need to define their own UserInput nodes when they want to fork input to multiple children. The "one choice" is really here for the education of any readers that come by, but I'm being so verbose about "Another choice" also so I've got a reminder on how I want to implement it.

Then the second thing is restricting macro output with the return value, i.e. return wf.fit.fit_dict in your example. TBH, I am really not a fan of this at all. Once #84 is implemented, it doesn't really make things easier because wf.fit.fit_dict would already be getting exposed on macro.outputs.fit_dict anyhow (where macro is just an instance of the macro`), so I guess the only thing this would be doing is implicitly disabling all other output that would otherwise be scraped (i.e. other disconnected outputs of children).

Implicit is better than explicit. Sometimes implicit is worth it for the convenience, e.g. as above with implicitly creating UserInput nodes, but here I'm not at all convinced it's a worthwhile tradeoff.
Parsing input args is super easy. Parsing returned values is a gigantic pain in the ass. I don't want to try to implement this.

With that said, while I am happy with the existence of inputs_map and outputs_map and pleased with their simplicity and the level of customization they provide, I don't like that they are all string-based. I think this is yet again a separate issue to raise, but it would be nice to leverage tab-completable keys like wf.outputs_map = {wf.fit.fit_dict: "my_custom_label_at_the_macro_level"}

Original wish

@macro_node(output_labels="fit_dict")
def internal_macro(wf: Macro, instance, calculator) -> None:
    wf.generate_structures = generate_structures(instance=instance)
    wf.calc_with_calculator = calc_with_calculator(calculator=calculator, task_dict=wf.generate_structures)
    wf.fit = analyse_structures(
        instance=instance, output_dict=wf.calc_with_calculator
    )
    return wf.fit.fit_dict

Pros:

Automatic forking of input to multiple children
Output filtering is very concise and code-based (rather than string based)

Cons:

Macro nodes have their own IO independent of children, bastardizing the concept of macro and function nodes.
IO attribute setting/kwargs need to be overloaded even more to allow "links" in addition to value updates and connection formation
Return signature parsing is a pain to implement
Including calculator is redundant, since it would get exposed nicely after #84 anyhow -- we only need the macro args for IO that needs to be forked
Output needs labels again (might be able to get away without this if we parse the return values cleverly enough, but as stated I don't enjoy parsing return signatures)

Proposed alternative

@macro_node()
def internal_macro(wf: Macro, instance) -> None:
    wf.generate_structures = generate_structures(instance=instance)
    wf.calc_with_calculator = calc_with_calculator(task_dict=wf.generate_structures)
    wf.fit = analyse_structures(
        instance=instance, 
        output_dict=wf.calc_with_calculator
    )
    # And supposing there was some other output we wanted to explicitly silence
    wf.outputs_map = {"generate_structures__imagined": None}
    # Or even cooler, but is a separate issue:
    # wf.outputs_map = {wf.generate_structures.outputs.imagined: None}

Pros:

Easy to define shared input that gets forked to multiple children, as above (implicit is better than inconvenient)
Macros keep their promises: all IO is just child IO
After #84 macro.inputs.calculator and macro.outputs.fit_dict get cleanly exposed without any additional work

Cons:

Implicitly creating extra (UserInput) node(s) in the macro (explicit is better than implicit)
Explicit string-based mapping is still required to disable undesired return values, which might get verbose

pyiron / pyiron_workflow

Alternative interfaces for macro definition #85

Original wish

Proposed alternative