kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0

Testing Hooks in Kedro #4161

Open lordsoffallen opened 1 month ago

lordsoffallen commented 1 month ago

Description

Testing Kedro hooks isn't straightforward, and I ran into some issues doing it. Also, defining a hook as def f(*args, **kwargs) doesn't work, but explicitly naming the argument I want to access does, e.g. def f(conf_creds).
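
Kedro dispatches hooks through pluggy, which inspects each implementation's signature and passes only the arguments the implementation names explicitly; *args and **kwargs are never filled. A minimal sketch using pluggy directly, assuming pluggy is installed (it is a Kedro dependency). The "demo" project name, CatalogSpecs, VarArgsHook and NamedArgHook are hypothetical stand-ins, with CatalogSpecs being a cut-down imitation of Kedro's after_catalog_created specification:

```python
import pluggy

# Hypothetical project name; Kedro defines its own markers in kedro.framework.hooks.
hookspec = pluggy.HookspecMarker("demo")
hookimpl = pluggy.HookimplMarker("demo")


class CatalogSpecs:
    """Cut-down stand-in for Kedro's after_catalog_created specification."""

    @hookspec
    def after_catalog_created(self, catalog, conf_creds):
        """Called after the catalog is built."""


class VarArgsHook:
    """Mirrors the EnvHook pattern: accepts anything, names nothing."""

    def __init__(self):
        self.received = None

    @hookimpl
    def after_catalog_created(self, *args, **kwargs):
        # pluggy matches arguments by parameter name, so args and kwargs stay empty.
        self.received = (args, kwargs)


class NamedArgHook:
    """Names the one argument it needs."""

    def __init__(self):
        self.received = None

    @hookimpl
    def after_catalog_created(self, conf_creds):
        self.received = conf_creds


manager = pluggy.PluginManager("demo")
manager.add_hookspecs(CatalogSpecs)
varargs_hook, named_hook = VarArgsHook(), NamedArgHook()
manager.register(varargs_hook)
manager.register(named_hook)

# Hook arguments must be passed by keyword.
manager.hook.after_catalog_created(catalog="catalog", conf_creds={"db": "secret"})

print(varargs_hook.received)  # ((), {})
print(named_hook.received)    # {'db': 'secret'}
```

Only the implementation that declares conf_creds receives it, which matches the behaviour described above.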

I think documentation with guidelines on how users can test their hook functions would be useful.
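
One possible guideline is to test a hook's own logic separately from Kedro's wiring by calling the method directly. Kedro's @hook_impl marker is a pluggy HookimplMarker, which only attaches metadata to the function and returns it unchanged, so the method stays an ordinary method. A minimal sketch, assuming a hypothetical RequiredCredentialsHook that fails fast when a credentials entry is missing; the decorator is left out only so the sketch runs without Kedro installed:

```python
from unittest.mock import MagicMock


class RequiredCredentialsHook:
    """Hypothetical hook: raise if a required credentials entry is missing.

    In a project this method would carry Kedro's @hook_impl marker; it is
    omitted here so the sketch runs without Kedro installed.
    """

    def __init__(self, required_key):
        self.required_key = required_key

    def after_catalog_created(self, catalog, conf_creds):
        if self.required_key not in conf_creds:
            raise KeyError(f"missing credentials entry: {self.required_key}")


def test_hook_accepts_present_credentials():
    hook = RequiredCredentialsHook("db")
    # A MagicMock stands in for the DataCatalog; this hook never touches it.
    hook.after_catalog_created(catalog=MagicMock(), conf_creds={"db": {"con": "sqlite://"}})


def test_hook_rejects_missing_credentials():
    hook = RequiredCredentialsHook("db")
    try:
        hook.after_catalog_created(catalog=MagicMock(), conf_creds={})
    except KeyError:
        pass
    else:
        raise AssertionError("expected a KeyError for missing credentials")
```

This covers the hook's behaviour; whether Kedro actually calls it still needs a session-level or end-to-end test.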

Context

conftest.py

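from pathlib import Path

from kedro.config import OmegaConfigLoader
from kedro.framework.context import KedroContext
from kedro.framework.hooks import _create_hook_manager
from kedro.framework.project import settings
from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project
from pytest import fixture

# Assumption: pytest is run from the project root
PROJECT_PATH = Path.cwd()

@fixture(scope='session')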
def config_loader():
    return OmegaConfigLoader(
        conf_source=str(PROJECT_PATH / settings.CONF_SOURCE),
        env="test",
        base_env="base",
        default_run_env="test"
    )

@fixture(scope='session')
def kedro_context(config_loader):
    return KedroContext(
        package_name="projx",
        project_path=PROJECT_PATH,
        config_loader=config_loader,
        hook_manager=_create_hook_manager(),  # fresh manager: project hooks from settings.HOOKS are not registered on it
        env="test"
    )

@fixture()
def kedro_session():
    bootstrap_project(PROJECT_PATH)
    return KedroSession.create(PROJECT_PATH, env="test")

test_run.py

def test_catalog_hook(kedro_context):
    # Invoke catalog loading
    catalog = kedro_context.catalog.list()

    # By this time we should be running the catalog hook
    print("stop")

hooks.py

from kedro.framework.hooks import hook_impl


class EnvHook:
    @hook_impl
    def after_catalog_created(self, *args, **kwargs) -> None:
        print("Hello")

When I try this, the debugger doesn't stop in the hook code, and I don't see the print output when I run the test either. The hook only runs when I go through a session.
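
One likely explanation: _create_hook_manager() returns a pluggy PluginManager that knows Kedro's hook specifications but has no hook implementations registered, whereas KedroSession registers the project's hooks (those listed in settings.HOOKS, plus installed plugin hooks) on its own manager. Calling a hook on a manager with nothing registered is a silent no-op. A minimal pluggy-only sketch of that difference, assuming pluggy is installed; the "demo" project name, CatalogSpecs, RecordingHook and make_manager are hypothetical stand-ins:

```python
import pluggy

hookspec = pluggy.HookspecMarker("demo")
hookimpl = pluggy.HookimplMarker("demo")


class CatalogSpecs:
    @hookspec
    def after_catalog_created(self, catalog):
        """Called after the catalog is built."""


class RecordingHook:
    """Hypothetical hook that records every catalog it sees."""

    def __init__(self):
        self.calls = []

    @hookimpl
    def after_catalog_created(self, catalog):
        self.calls.append(catalog)


def make_manager():
    # Like _create_hook_manager(): specifications only, no implementations.
    manager = pluggy.PluginManager("demo")
    manager.add_hookspecs(CatalogSpecs)
    return manager


hook = RecordingHook()

# 1) Bare manager: the call succeeds, but no implementation runs.
bare = make_manager()
bare.hook.after_catalog_created(catalog="catalog-1")

# 2) Same setup with the hook registered, as a session does for project hooks.
wired = make_manager()
wired.register(hook)
wired.hook.after_catalog_created(catalog="catalog-2")

print(hook.calls)  # ['catalog-2']
```

For the conftest.py above, the analogous change would be to call register(EnvHook()) on the manager returned by _create_hook_manager() before passing it to KedroContext, assuming EnvHook is importable in the test module; treat this as an untested suggestion rather than the documented approach.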

The following works:

def test_catalog_hook(kedro_session):
    # Invoke catalog loading
    context = kedro_session.load_context() 

    # By this time we should be running the catalog hook
    print("stop")

merelcht commented 1 month ago

Hi @lordsoffallen, thanks for raising this issue. Can I ask why you want to test the hook specifically? Usually you wouldn't add tests for third-party components, but rather for the code that uses them. In this case I'm also wondering whether unit tests are the most suitable, or whether an end-to-end (e2e) setup where you run the pipeline would be better.

lordsoffallen commented 1 month ago

I want to test that the integration between my hook and Kedro works as expected, so that unexpected issues don't show up later.

Context: https://kedro-org.slack.com/archives/C03RKP2LW64/p1726153386288589?thread_ts=1726141471.091099&cid=C03RKP2LW64

I also noticed that the (*args, **kwargs) pattern doesn't work in hooks; I need to specify the parameter names explicitly. Basically, I want to make sure that the wiring between my custom code and Kedro works as expected.