mara / mara-pipelines

A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
MIT License
2.07k stars 100 forks source link

Pipeline.add upstreams parameter default should not be a list #2

Closed bjarnagin closed 6 years ago

bjarnagin commented 6 years ago

https://github.com/mara/data-integration/blob/master/data_integration/pipelines.py#L162 def add(self, node: Node, upstreams: [typing.Union[str, Node]] = []) -> 'Pipeline':

This code creates a new list when the code is compiled and sets it as the default for the upstreams parameter. All calls to this function will use that list. Here's a short illusatrative example:

>>> def append(value, lst=[]):
...     lst.append(value)
...     return lst
...
>>> append(1)
[1]
>>> append(2)
[1, 2]

You probably want to set the default parameter to None and then initialize a new list inside the method if upstreams is None, otherwise all pipelines which don't set upstreams when this method is called will share the same upstreams.

martin-loetzsch commented 6 years ago

You are absolutely right, that's not good style. Luckily in this case it was not doing any harm because the upstreams parameter was only used in a loop.