kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0
9.47k stars 875 forks source link

Minimal `kedro new` project fails starter test at `KedroContext` initialization #3954

Open yury-fedotov opened 2 weeks ago

yury-fedotov commented 2 weeks ago

Description

This:

@pytest.fixture
def project_context(config_loader):
    return KedroContext(
        package_name="my_package_name",
        project_path=Path.cwd(),
        config_loader=config_loader,
        hook_manager=_create_hook_manager(),
    )

Fails the only unit test because a KedroContext requires env argument to be supplied.

Context

I resolved it for my project by specifying env=None there, but it should work out of the box from the starter.

Steps to Reproduce

  1. Initialize kedro new with all defaults except pyspark
  2. pip install -e .
  3. pytest

Your Environment

ankatiyar commented 1 week ago

Thanks @yury-fedotov for reporting this, I'll add it to the backlog since the solution in #3958 is not ideal.

Context

Changes in 0.19 PR #3300 changed the ordering of the arguments to KedroContext and removed the default value for env. The example test in the starters <project_root>/tests/test_run.py is failing as a result.

I tried adding a default value to env which requires reordering the arguments since arguments with default values can't precede arguments without default values in #3958 but this is a breaking change. Adding default values to all the subsequent arguments also doesn't make sense. (@noklam's comment https://github.com/kedro-org/kedro/pull/3958#issuecomment-2182313144)

The fix probably needs to be in the starters for now.

My suggestion is not to fix anything in kedro, but remove the test. In fact, I don't think the KedroContext has any use in tests. In a standard kedro run, we should create the session and let the session to create context, runner, config_loader etc. However, in our test template, we created the config loader and runner manually. In addition, the context actually use the real hooks, but the tests that created run with no hook, so they are not consistent at all.

In our own testing docs, KedroContext is not mentioned at all.

yury-fedotov commented 1 week ago

Thanks @ankatiyar !

I agree that the change should be in the starter, not in the framework.

However I'm curious why not keep the test and just specify env=None (as it wants) while initializing a context within this test?

In other words, change the code of the test fixture rather than the constructor of the context.

That would allow to keep the test and close this specific issue with minimal changes.

noklam commented 6 days ago

@yury-fedotov I think the change should go to starter, and I don't like the current test example. I just found another problem https://github.com/kedro-org/kedro/issues/3966 with the current test.

My point is I prefer fixing the test example altogether rather than just the specific KedroContext one.

@pytest.fixture
def config_loader():
    return OmegaConfigLoader(conf_source=str(Path.cwd()))

@pytest.fixture
def project_context(config_loader):
    return KedroContext(
        package_name="{{ cookiecutter.python_package }}",
        project_path=Path.cwd(),
        config_loader=config_loader,
        hook_manag

I find this weird, because usually I expect two common way to do this:

  1. Start from the highest level KedroSession
with KedroSession.create(...) as session:
   context = session.load_context()
   config_loader = context.config_loader
   ...

This way you have the KedroContext, ConfigLoader created for you, and it would respect to your settings.py in case you customised the project settings. This would also respect hooks (which can be useful, but you may also not want hook depending on the test you are writing, i.e. you don't want to log Mlflow for unit test)

  1. Create invidivual Component directly, i.e.
    @pytest.fixture
    def config_loader():
    return OmegaConfigLoader(conf_source="conf",base_env="base", default_run_env="local")

In this case, settings.py is not respected and this is more lower level. It could be useful for unit testing if you need fine grain control. In any case, I found the KedroContext fixture not useful at all. The current example create the KedroContext but then the ConfigLoader is not created from KedroContext, what's the point of having it in a user facing codebase? Do you have test relying on the KedroContext fixture?