python-bonobo / bonobo

Extract Transform Load for Python 3.5+
https://www.bonobo-project.org/
Apache License 2.0
1.58k stars 143 forks source link

Change Graph.__getitem__ to shorthand for get_cursor #380

Open oatmealdealer opened 4 years ago

oatmealdealer commented 4 years ago

currently Graph.__getitem__() is implemented as:

def __getitem__(self, key):
    return self.nodes[key]

which makes sense... but it only saves about 6 characters of typing graph[a] vs graph.nodes[a]. However... one thing that's kind of clunky is:

graph >> a >> b >> c
graph.get_cursor(a) >> d >> e    # adds nodes that receive data from a

If Graph.__getitem__() was reimplemented as a shorthand for Graph.get_cursor(), the above example could be reduced to:

graph >> a >> b >> c
graph[a] >> d >> e

which I think is much much cleaner and more intuitive. It saves time typing get_cursor() (which users will be doing much more often than retrieving nodes by their numerical index) and is also easier to read. It may even be possible this way to hide the existence of the GraphCursor class from top-level API's entirely. Obviously this would be a breaking change, but it would be a very simple matter to regex find and replace graph\[(.+)\] with graph\.nodes\[$1\].

As a bonus, this could also eliminate the need for the Graph.orphan() method, since one could simply call graph[None] to achieve the same functionality.

hartym commented 4 years ago

Trying to think about implications first but the api looks indeed nicer the way you propose it.

hartym commented 4 years ago

Tests are broken, can you have a look ?

oatmealdealer commented 4 years ago

Looks like tests are passing now, there was one test that was using the __getitem__ behavior in its assertions that needed to be updated.

oatmealdealer commented 4 years ago

Did a couple more fixes and it looks like all tests are passing now. Let me know if I should check on anything else.

hartym commented 4 years ago

I'm starting to test your branch on some transformations I run to try to find BC breaks. Would you be kind enough to add a few documentation patches to explain the [] operator usage ? I think the doc should explain it calls the lower-level API "get_cursor" but encourage usage of [] instead of the method call. Thanks

oatmealdealer commented 4 years ago

I'm starting to test your branch on some transformations I run to try to find BC breaks. Would you be kind enough to add a few documentation patches to explain the [] operator usage ? I think the doc should explain it calls the lower-level API "get_cursor" but encourage usage of [] instead of the method call. Thanks

sure. I'll take a look and see where a good place would be to update it.

MChrys commented 4 years ago

I would add my contribution, because I was actually building my own pipeline library but I have better to contribute to your, as bonobo is more far than I am .

This is how I implemented my graph builder

# parser  & add edges examples
#    T = graph()
#    T >> s1 >>  (s2 >> s5) + s3 
#>>>      s1
#>>>     /  \
#>>>    s2  s3
#>>>   /     |
#>>>  s5     |
#>>>  |      |
#>>>  O      O
#
#    T[[:s5>]] >> s6
#
#>>>      s1
#>>>     /  \
#>>>    s2  s3
#>>>   /     |
#>>>  s5     |
#>>>  |      |
#>>>  s6     |
#>>>  |      |
#>>>  O      O
# 
#OR 
#    T |  s5 >> s6
#
#>>>      s1
#>>>     /  \
#>>>    s2  s3
#>>>   /     |
#>>>  s5     |
#>>>  |      |
#>>>  s6     |
#>>>  |      |
#>>>  O      O
#
#   a = T[:s5].clone()
#   a
#>>>      s1
#>>>     /  
#>>>    s2  
#>>>   /     
#>>>  s5     
#>>>  |      
#>>>  O  
#Try pipe fast in notebook with Input test
#   T[[:s5>]].test(Input)
#
#
# 1 er : T.start >> s1 >>  (s2 >> s5) + s3  | (s2 + s3) << s7 
#
#>>>      s1
#>>>     /  \
#>>>    s2  s3
#>>>   / \ / |
#>>>  s5 s7  |
#>>>  |   |  |
#>>>  O   O  O
#start property clean the graph for rebuild it fast in notebook
# a = T[[:s5],[:s7]]
#>>>a
#>>>      s1
#>>>     /  
#>>>    s2 
#>>>   /  \ 
#>>>  s5  s7  
#>>>  |    |  
#>>>  O    O