kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0
9.47k stars 875 forks source link

Add default for `env` in `KedroContext` #3958

Closed ankatiyar closed 1 week ago

ankatiyar commented 1 week ago

Description

Fix #3954

Development notes

I checked the test with Kedro 0.18.14 and it was still working. The test breaks from 0.19 with this change - https://github.com/kedro-org/kedro/pull/3300

Add a default value None to env.

Developer Certificate of Origin

We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a Signed-off-by line in the commit message. See our wiki for guidance.

If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.

Checklist

yury-fedotov commented 1 week ago

Thanks for looking into this!

ankatiyar commented 1 week ago

@yury-fedotov Thanks for reporting the issue :)

ankatiyar commented 1 week ago

Thanks! I overlooked this. Can we keep the change at the same line? This is changing the order of arguments, is it needed?

@noklam It was erroring out saying arguments with default values cannot be before ones without default values

merelcht commented 1 week ago

Thanks! I overlooked this. Can we keep the change at the same line? This is changing the order of arguments, is it needed?

@noklam It was erroring out saying arguments with default values cannot be before ones without default values

Unfortunately, changing the order of arguments is a breaking change. So in this case it's better to ignore the error about argument order instead.

ankatiyar commented 1 week ago

I've gone back to the original order but added default values to subsequent arguments. Not sure if this has any implications but the tests seem to be passing. The other way to fix the issue linked would be to change the example test in starters. WDYT? 🤔 @noklam @merelcht

noklam commented 1 week ago
    env: str | None = field(init=True, default=None)
    _package_name: str = field(init=True, default="")
    _hook_manager: PluginManager | None = field(init=True, default=None)

Unfortunately I think there is no nice solution here. ^ I don't think there should be a default for hook_manager, does it even work when it is None?

For package_name, I did a quick search and find no reference, I don't really know why it is needed. The problem is we cannot add a default for env without also adding default for all subsequent arguments. If we add default to all of them it's also a breaking change. (adding only to env is OK because I consider that's a bug fix, but it's not possible as mentioned).

My suggestion is not to fix anything in kedro, but remove the test. In fact, I don't think the KedroContext has any use in tests. In a standard kedro run, we should create the session and let the session to create context, runner, config_loader etc. However, in our test template, we created the config loader and runner manually. In addition, the context actually use the real hooks, but the tests that created run with no hook, so they are not consistent at all.

In our own testing docs, KedroContext is not mentioned at all.

ankatiyar commented 1 week ago

@noklam yeah, i'll add your comment to the issue and put that in backlog and close this PR for now!