python-bonobo / bonobo

Extract Transform Load for Python 3.5+
https://www.bonobo-project.org/
Apache License 2.0
1.58k stars 143 forks source link

resolve start and end of named chain are identical as _input and _output #379

Closed xunxky closed 4 years ago

xunxky commented 4 years ago

Not sure if this is a bug or should be addressed with a feature request. But basically it is just not intuitive and adds overhead where no overhead would be necessary!

example

Graph-Example:


   graph.add_chain(
        DATAIN,
        FILTERDATA,
        _name="in"
    )

    graph.add_chain(
        DATAOUT,
        _input=None,
        _name="load"
    )

    graph.add_chain(
        TRANSFORMATION_NEEDING_FILTERED_DATA,
        _input="in",
        _output="load"
        _name="other"
    )

This will try to resolve the input for the "other" chain to be the first node in the "in" chain. which is wrongly "DATAIN", but actually it would be required to have "FILTERDATA" to be the resolved element here.

This issue arises, due to the code in Graph.add_chain only referencing the first element in a chain to the "named" dict.

Ideas

there would be some ways to resolve this:

direction based resolution

the most simple way to resolve this would be to have a second "named" dict ("e.g. "named_end") for the "final" element of the index.

the resolve would then be either a second function or a further param to indicate which dict to use.

workaround and feature request

provide nodes as tuple or dict with a node name and function when adding them ... then one can work around this issue ...

HINT

in contrast to the code self.named[_name] = _last suggesting the _last element is stored here the condition holds only at the first element of a chain. So one can see how that might slip.

Versions

xunxky commented 4 years ago

after looking at your rewrite for version 0.7 this ill behavior is still present in https://github.com/python-bonobo/bonobo/blob/develop/bonobo/structs/graphs.py but it seems much harder to come by then

hartym commented 4 years ago

I think we can consider that named chains will be soon deprecated and not necessary starting with 0.7, not sure we need to put any effort on this side.

Starting from 0.7, you can just use real references to graph points and there is no need anymore for string based aliases.

xunxky commented 4 years ago

@hartym can you elaborate? (maybe it is something i have not explored yet)

if you mean that i "can" create variables containing generators and cut them into the chains then i can tell you this does not agree with my sense of neatness. I only had to do it once now and it was a "mess" already. defining the nodes upfront and using them all over the place will confuse newcomers. take it as feedback, that depending on how well you explain it in the documentation it might raise demands on the learning curve

hartym commented 4 years ago

No, that's not what I meant. I agree that this way can bring a mess, and although in 0.6 it was basically the only way (along with named chains) to do it, I had the same feeling as you.

The 0.7 API introduces chains as first class citizens, that you can use for this purpose.

You should find the informations on http://docs.bonobo-project.org/en/develop/guide/graphs.html