kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0
10.03k stars 906 forks source link

dict unpacking with "**params:foo.bar" in `node(inputs=)` #2749

Open m-gris opened 1 year ago

m-gris commented 1 year ago

When defining a node, it would be really nice and convenient if the params dictionary could be unpacked with something like:

node(foo, inputs=['dataset', '**params:bar'] ...

This would not only make the node definition more concise than

inputs=['dataset', "params:bar.a", "params:bar.b", "params:bar.c" ... 

But it would also result in a more dynamic and flexible node definition, "immune" to changes in the function's signature. (in which case all that would require an update would be the params file, as opposed to both the node def. and the params file in the other case)

I hope that this request / suggestion does not go against Kedro philosophy :)

Thx & regards M.

astrojuanlu commented 1 year ago

Thanks @m-gris for opening this feature request! We'll discuss about it soon.

m-gris commented 1 year ago

Thanks @m-gris for opening this feature request! We'll discuss about it soon.

Thanks for considering it ! :)

noklam commented 1 year ago

I have evidence this could be useful too. But it may requires something more than just unpacking

# model_predict/functions.py
def make_prediction(model, a, b, c, d, e):
    ...

# model_predict_pipeline/nodes.py
def make_prediction_node(model, params: Dict):
    make_prediction(
        model,
        params.get("something", None),
        params.get("something", None),
        params.get("something", []),
        params.get("something", []),
    )

Noted that here two things are done:

  1. Dict unpacking
  2. Assign Default

Or maybe 2. is bad and we shouldn't support it because passing a list into function argument is anti-pattern and None should be default in make_prediction already. Is there use case where 2. is also valid?

edit(deepyaman): format code example to make it easier to read

deepyaman commented 1 year ago

In short, I think it makes sense to pass a parameters dictionary to a node (i.e. the current supported behavior).

While I agree that dict unpacking sounds nice, I wonder how often people would mess up because configuration keys don't match parameter names. Also, how often would the value of supporting this syntax be realized? I feel like it's most beneficial when you have a node (say, def make_prediction(model, a, b, c, d, e): ... from the example posted by @noklam) that you want to call both by passing a parameter dictionary and ALSO in cases by explicitly passing different parameters to the various arguments. Does this happen often in a codebase? One could argue that Python supports the unpacking syntax, but I don't think it's fair to think Kedro nodes should be near as flexible.

On the last example of assigning a default, I tend to agree that you could just use keyword arguments with default.

noklam commented 1 year ago

I tend to think we shouldn't add arbitrary constrains unless there are good reason. And often users will try to find their way to do it anyway (e.g. Using environment variable for everything).

@Nok I agree about using keyword arguments, and think you could then simplify it to:

some_package/model_predict_pipeline/nodes.py

def make_prediction_node(model, params: Dict): make_prediction(model, **params)

I responded more fully on the linked issue.

I think this will be more common if you need to reuse the function, which is something Kedro values. This will work, but as a package it hides all the function signature and it becomes less clear what parameters are available.

On the note that you mentioned people may mess up. In that case, I expect it is the same when you use a python function that takes keyword arguments but you passed a wrong keyword. You should get an explicit error and I don't see any problem with that.

@m-gris since you created this issue, what's your thoughts on this?

datajoely commented 1 year ago

In general I believe we need this so that modular pipeline reuse can be dynamic

noklam commented 1 year ago

@datajoely do you have an example?

datajoely commented 1 year ago

See this thread https://github.com/kedro-org/kedro/issues/1228

noklam commented 1 year ago

Adding more evidence that this would be useful - https://linen-slack.kedro.org/t/13902185/if-kedro-supports-unpacking-parameters-dictionary-would-it-c#fe7ba273-2708-4764-8473-920b0636e78f

Markus Sagen and marcccin is in favor of this too

marrrcin commented 1 year ago

Imho, unpacking at the pipeline compilation level would be also beneficial from the perspective of pipeline validation. If I understand correctly -let's say we have parameters.yml:

some_params:
   a: 1
   b: 2
def node(data_input, a, b, c):
    # use c
    pass

Using the "unpacking" inputs=["data", "**params:some_params"] will actually validate the function signature vs the params (count and names) - raising immediately that c is missing,

whereas for node

def node(data_input, params: dict):
    # use params["c"]
    pass

with inputs=["data", "params:some_params"] it can result in a runtime error if c is missing. 🤔

noklam commented 1 year ago

good point!

m-gris commented 1 year ago

Hi
Great to see that this is sparking some interest / debate :)
@nok: Aside from what I said in my original message I do not have much to add... But I must say that marrrcin point is indeed a good / important one !

noklam commented 1 year ago

@m-gris Would you be able to open a PR on this? We have a brief discussion today and we think it would be a useful feature to add. It's not our priority immediately, but we are happy to accept PR as this is something useful to the community.

m-gris commented 1 year ago

@noklam I wouldn't be able to work on it immediately. But happy to try as soon as my workload will allow. Hopefully Soon :)

noklam commented 1 year ago

@m-gris Amazing! That's totally fine and looking forward to the PR 🚀

inigohidalgo commented 1 year ago

Just adding additional info where this could be useful. This isn't on the *kwargs unpacking but the args unpacking:

https://linen-slack.kedro.org/t/9703505/hey-all-simple-question-is-it-possible-to-pass-both-position https://github.com/kedro-org/kedro/issues/2408#issuecomment-1505527948

noklam commented 1 year ago

@inigohidalgo good point, I have raised the same issue in https://github.com/kedro-org/kedro/issues/2924#issuecomment-1688124435

More broadly I think we should look at this as how different a node is versus a Python function. I also not sure how hard is it to implement a non breaking change.