kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0
9.53k stars, 877 forks

Support for an on_dataset_load_error hook, like before_dataset_loaded and after_dataset_loaded #2934

Open thedevd opened 11 months ago

thedevd commented 11 months ago

I have a Kedro pipeline that sometimes fails to load a dataset, and I get the error "kedro.io.core.DatasetError: Failed while loading data". Is there any plan to provide an on_dataset_load_error hook, similar to on_node_error or on_pipeline_error, so that I can run my own handling after a particular dataset fails to load?

noklam commented 11 months ago

@thedevd What are you trying to do here? It would be great if you could give an example of why you need this.

thedevd commented 11 months ago

In my pipeline, a few nodes fail before they execute because some of their input datasets cannot be loaded. I want to handle any DataSetError in a hook, so that I can keep track of which dataset failed to load and for which node (think of it as building a report of dataset load failures). However, the Kedro documentation does not list a hook such as on_dataset_load_error for this scenario.

For example, I have this node with two input datasets:

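from kedro.pipeline import node

# check_pms is my own function; both inputs are dataset names from the catalog.
node(
    check_pms,
    ['table1_pm', 'table2_pm'],
    None,
    name='check_pms',
)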

When one of these datasets fails to load, the entire pipeline fails.
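
To make the request concrete, here is a rough sketch of how this kind of tracking might be approximated with the hooks that already exist. It is a workaround under assumptions, not a Kedro feature: it assumes a sequential run, a Kedro version in which before_dataset_loaded receives the node argument, and that on_pipeline_error fires when a load failure aborts the run. DatasetLoadTracker, failed_loads and simulate_failed_run are hypothetical names made up for this example. The idea is that a dataset which saw "before" but never "after" by the time the pipeline errors is the one that failed to load.

```python
from types import SimpleNamespace

try:
    from kedro.framework.hooks import hook_impl
except ImportError:
    # Fallback so the sketch can be exercised without Kedro installed.
    def hook_impl(func):
        return func


class DatasetLoadTracker:
    """Hypothetical hook class: records loads that started but never finished."""

    def __init__(self):
        self._pending = {}      # dataset name -> node name, load in progress
        self.failed_loads = []  # (dataset name, node name) pairs

    @hook_impl
    def before_dataset_loaded(self, dataset_name, node):
        # A load is about to start for this node.
        self._pending[dataset_name] = node.name

    @hook_impl
    def after_dataset_loaded(self, dataset_name):
        # The load finished, so it is no longer a failure candidate.
        self._pending.pop(dataset_name, None)

    @hook_impl
    def on_pipeline_error(self, error):
        # Anything still pending when the run aborts never finished loading.
        self.failed_loads.extend(self._pending.items())
        self._pending.clear()
        for dataset_name, node_name in self.failed_loads:
            print(f"Load failed: dataset={dataset_name!r} node={node_name!r} error={error!r}")


def simulate_failed_run():
    """Replay the hook calls by hand: table1_pm loads, table2_pm does not."""
    tracker = DatasetLoadTracker()
    fake_node = SimpleNamespace(name="check_pms")  # stand-in for a Kedro Node
    tracker.before_dataset_loaded("table1_pm", fake_node)
    tracker.after_dataset_loaded("table1_pm")
    tracker.before_dataset_loaded("table2_pm", fake_node)
    tracker.on_pipeline_error(RuntimeError("Failed while loading data"))
    return tracker
```

In a real project the class would be registered in settings.py with HOOKS = (DatasetLoadTracker(),). A dedicated on_dataset_load_error hook would still be cleaner, since this approach only infers the failure after the whole run has already aborted.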