kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0

Validate that hook_impl respects hook_spec #2862

Open noklam opened 11 months ago

noklam commented 11 months ago

Description

Add run-time checks for plugins to lower the learning curve of creating a plugin. Related to #2685.

While developing https://github.com/Galileo-Galilei/kedro-pandera/pull/10, I made a typo that was hard to debug. pip install -e . does not update the entry_points automatically, so I had to re-run pip install -e . after each change. At first I thought I was not registering the hook correctly, but it turned out I had used before_dataset_load instead of before_dataset_loaded. Kedro should raise an error in this case.

Context

Developing a new hook or plugin is hard. One common trap is that an invalid hook fails silently, which makes it hard to debug.

This should check multiple things

Possible Implementation

Leverage pluggy's PluginManager.check_pending: https://pluggy.readthedocs.io/en/stable/api_reference.html#pluggy.PluginManager.check_pending
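The behaviour of check_pending can be shown with pluggy alone. This is a minimal sketch, not Kedro's code: the project name "demo", the DatasetSpecs class and the validate helper are hypothetical, and the spec is a simplified stand-in for Kedro's real before_dataset_loaded spec. register() accepts a misspelled hook name without complaint; check_pending() then raises PluginValidationError for any implementation that has no matching spec.

```python
import pluggy

# Hypothetical project name; Kedro defines its own hook_spec/hook_impl markers.
hookspec = pluggy.HookspecMarker("demo")
hookimpl = pluggy.HookimplMarker("demo")


class DatasetSpecs:
    """Simplified stand-in for a real hook specification."""

    @hookspec
    def before_dataset_loaded(self, dataset_name):
        """Called before a dataset is loaded."""


class GoodHook:
    @hookimpl
    def before_dataset_loaded(self, dataset_name):
        print(f"loading {dataset_name}")


class TypoHook:
    # Typo: "before_dataset_load" has no matching hook spec.
    @hookimpl
    def before_dataset_load(self, dataset_name):
        print(f"loading {dataset_name}")


def validate(plugin) -> str:
    """Register plugin on a fresh manager and report whether it passes."""
    pm = pluggy.PluginManager("demo")
    pm.add_hookspecs(DatasetSpecs)
    pm.register(plugin)  # an unknown hook name is accepted silently here
    try:
        pm.check_pending()  # raises for implementations without a spec
    except pluggy.PluginValidationError as exc:
        return f"invalid: {exc}"
    return "ok"


print(validate(GoodHook()))  # ok
print(validate(TypoHook()))
```

The second call reports the unknown hook 'before_dataset_load', the same kind of message as the PluginValidationError shown for NokHook.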

#2863 is a quick PoC that shows how it can be used.

For example, when an invalid hook is passed:

from kedro.framework.hooks import hook_impl

class NokHook:
    # "random_hook" does not match any Kedro hook specification.
    @hook_impl
    def random_hook(self):
        print(123)

Running the project with kedro run then results in:

PluginValidationError: unknown hook 'random_hook' in plugin <empty.settings.NokHook object at 0x7fec184b6af0>
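Unknown hook names are only one of the things worth checking. pluggy already rejects a second kind of mistake at registration time: an implementation that declares an argument the spec does not provide, while accepting fewer arguments than the spec is allowed. The sketch below uses a hypothetical "demo" project name, a simplified spec and a hypothetical register_or_error helper; it is an illustration of pluggy's behaviour, not Kedro's implementation.

```python
import pluggy

# Hypothetical project name and simplified spec, for illustration only.
hookspec = pluggy.HookspecMarker("demo")
hookimpl = pluggy.HookimplMarker("demo")


class DatasetSpecs:
    @hookspec
    def before_dataset_loaded(self, dataset_name):
        """Called before a dataset is loaded."""


class SubsetArgHook:
    # Accepting fewer arguments than the spec declares is allowed.
    @hookimpl
    def before_dataset_loaded(self):
        print("about to load a dataset")


class WrongArgHook:
    # "data" is not an argument of the before_dataset_loaded spec.
    @hookimpl
    def before_dataset_loaded(self, dataset_name, data):
        print(dataset_name, data)


def register_or_error(plugin) -> str:
    """Register plugin on a fresh manager; return the error text if rejected."""
    pm = pluggy.PluginManager("demo")
    pm.add_hookspecs(DatasetSpecs)
    try:
        pm.register(plugin)  # argument names are checked against the spec here
    except pluggy.PluginValidationError as exc:
        return str(exc)
    return "registered"


print(register_or_error(SubsetArgHook()))  # registered
print(register_or_error(WrongArgHook()))   # error mentioning the argument 'data'
```

So a validation step in Kedro mainly needs to add the check_pending() call; argument mismatches against an existing spec are already caught by register().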

Possible Alternatives

noklam commented 1 month ago

It's also valuable to validate the other way round, since users commonly forget to add @hook_impl (context: https://kedro-org.slack.com/archives/C03RKP2LW64/p1716480352120979).
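check_pending() does not cover this case: a method whose name matches a spec but lacks the marker is simply ignored by register(). A minimal sketch of the reverse check, assuming a hypothetical find_unmarked_hooks helper and a simplified "demo" spec, compares the plugin's method names with the spec names and uses pluggy's PluginManager.parse_hookimpl_opts() to see whether each matching method carries the marker. This is one possible heuristic, not an existing Kedro or pluggy feature.

```python
import pluggy

# Hypothetical project name and simplified spec, for illustration only.
hookspec = pluggy.HookspecMarker("demo")
hookimpl = pluggy.HookimplMarker("demo")


class DatasetSpecs:
    @hookspec
    def before_dataset_loaded(self, dataset_name):
        """Called before a dataset is loaded."""

    @hookspec
    def after_dataset_loaded(self, dataset_name, data):
        """Called after a dataset is loaded."""


class ForgotMarkerHook:
    @hookimpl
    def before_dataset_loaded(self, dataset_name):
        pass

    # Missing @hookimpl: pluggy will never call this method.
    def after_dataset_loaded(self, dataset_name, data):
        pass


def find_unmarked_hooks(pm: pluggy.PluginManager, plugin) -> list[str]:
    """Return method names that match a hook spec but lack the impl marker."""
    spec_names = {
        name
        for name, caller in vars(pm.hook).items()
        if not name.startswith("_") and caller.has_spec()
    }
    return sorted(
        name
        for name in dir(plugin)
        if name in spec_names and pm.parse_hookimpl_opts(plugin, name) is None
    )


pm = pluggy.PluginManager("demo")
pm.add_hookspecs(DatasetSpecs)
plugin = ForgotMarkerHook()
pm.register(plugin)
print(find_unmarked_hooks(pm, plugin))  # ['after_dataset_loaded']
```

Since a helper method on a hook class could legitimately share a spec name, a check like this is probably better reported as a warning than raised as an error.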