Open m-gris opened 1 year ago
Thanks @m-gris for opening this feature request! We'll discuss about it soon.
Thanks @m-gris for opening this feature request! We'll discuss about it soon.
Thanks for considering it ! :)
I have evidence this could be useful too. But it may requires something more than just unpacking
# model_predict/functions.py
def make_prediction(model, a, b, c, d, e):
...
# model_predict_pipeline/nodes.py
def make_prediction_node(model, params: Dict):
make_prediction(
model,
params.get("something", None),
params.get("something", None),
params.get("something", []),
params.get("something", []),
)
Noted that here two things are done:
Or maybe 2. is bad and we shouldn't support it because passing a list into function argument is anti-pattern and None
should be default in make_prediction
already. Is there use case where 2. is also valid?
edit(deepyaman): format code example to make it easier to read
In short, I think it makes sense to pass a parameters dictionary to a node (i.e. the current supported behavior).
While I agree that dict unpacking sounds nice, I wonder how often people would mess up because configuration keys don't match parameter names. Also, how often would the value of supporting this syntax be realized? I feel like it's most beneficial when you have a node (say, def make_prediction(model, a, b, c, d, e): ...
from the example posted by @noklam) that you want to call both by passing a parameter dictionary and ALSO in cases by explicitly passing different parameters to the various arguments. Does this happen often in a codebase? One could argue that Python supports the unpacking syntax, but I don't think it's fair to think Kedro nodes should be near as flexible.
On the last example of assigning a default, I tend to agree that you could just use keyword arguments with default.
I tend to think we shouldn't add arbitrary constrains unless there are good reason. And often users will try to find their way to do it anyway (e.g. Using environment variable for everything).
@Nok I agree about using keyword arguments, and think you could then simplify it to:
some_package/model_predict_pipeline/nodes.py
def make_prediction_node(model, params: Dict): make_prediction(model, **params)
I responded more fully on the linked issue.
I think this will be more common if you need to reuse the function, which is something Kedro values. This will work, but as a package it hides all the function signature and it becomes less clear what parameters are available.
On the note that you mentioned people may mess up. In that case, I expect it is the same when you use a python function that takes keyword arguments but you passed a wrong keyword. You should get an explicit error and I don't see any problem with that.
@m-gris since you created this issue, what's your thoughts on this?
In general I believe we need this so that modular pipeline reuse can be dynamic
@datajoely do you have an example?
See this thread https://github.com/kedro-org/kedro/issues/1228
Adding more evidence that this would be useful - https://linen-slack.kedro.org/t/13902185/if-kedro-supports-unpacking-parameters-dictionary-would-it-c#fe7ba273-2708-4764-8473-920b0636e78f
Markus Sagen and marcccin is in favor of this too
Imho, unpacking at the pipeline compilation level would be also beneficial from the perspective of pipeline validation. If I understand correctly -let's say we have parameters.yml
:
some_params:
a: 1
b: 2
def node(data_input, a, b, c):
# use c
pass
Using the "unpacking" inputs=["data", "**params:some_params"]
will actually validate the function signature vs the params (count and names) - raising immediately that c
is missing,
whereas for node
def node(data_input, params: dict):
# use params["c"]
pass
with inputs=["data", "params:some_params"]
it can result in a runtime error if c
is missing. 🤔
good point!
Hi
Great to see that this is sparking some interest / debate :)
@nok: Aside from what I said in my original message I do not have much to add... But I must say that marrrcin point is indeed a good / important one !
@m-gris Would you be able to open a PR on this? We have a brief discussion today and we think it would be a useful feature to add. It's not our priority immediately, but we are happy to accept PR as this is something useful to the community.
@noklam I wouldn't be able to work on it immediately. But happy to try as soon as my workload will allow. Hopefully Soon :)
@m-gris Amazing! That's totally fine and looking forward to the PR 🚀
Just adding additional info where this could be useful. This isn't on the *kwargs unpacking but the args unpacking:
https://linen-slack.kedro.org/t/9703505/hey-all-simple-question-is-it-possible-to-pass-both-position https://github.com/kedro-org/kedro/issues/2408#issuecomment-1505527948
@inigohidalgo good point, I have raised the same issue in https://github.com/kedro-org/kedro/issues/2924#issuecomment-1688124435
More broadly I think we should look at this as how different a node is versus a Python function. I also not sure how hard is it to implement a non breaking change.
When defining a node, it would be really nice and convenient if the params dictionary could be unpacked with something like:
This would not only make the node definition more concise than
But it would also result in a more dynamic and flexible node definition, "immune" to changes in the function's signature. (in which case all that would require an update would be the params file, as opposed to both the node def. and the params file in the other case)
I hope that this request / suggestion does not go against Kedro philosophy :)
Thx & regards M.