Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
Description
A nice feature of modular pipelines is that if you make a mistake when mapping inputs/outputs to what the nodes need, Kedro raises a ModularPipelineError with a helpful message listing the catalog items you missed:
Failed to map datasets and/or parameters onto the nodes provided: <what you didn't map>...
Did you mean <those> instead?
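The sketch below is a simplified, hypothetical version of a check that reports every unmatched name in one set, as the message above does. It is not Kedro's source. `validate_mapping_single_set`, its two arguments and the local `ModularPipelineError` class are stand-ins for illustration. The "did you mean" suggestions use the standard library's `difflib.get_close_matches`.

```python
import difflib


class ModularPipelineError(Exception):
    """Local stand-in for Kedro's ModularPipelineError (illustration only)."""


def validate_mapping_single_set(mapped: set[str], existing: set[str]) -> None:
    """Hypothetical check: report all unmatched names together in one set.

    mapped: names given to the pipeline wrapper's mappings (assumption).
    existing: names the wrapped nodes actually use (assumption).
    """
    non_existent = mapped - existing
    if not non_existent:
        return
    # Collect close spellings from the names that do exist, to suggest fixes.
    suggestions = sorted(
        {
            match
            for name in non_existent
            for match in difflib.get_close_matches(name, sorted(existing))
        }
    )
    message = (
        "Failed to map datasets and/or parameters onto the nodes provided: "
        + ", ".join(sorted(non_existent))
    )
    if suggestions:
        message += " - did you mean one of these instead: " + ", ".join(suggestions)
    raise ModularPipelineError(message)


try:
    validate_mapping_single_set({"companies", "shuttels"}, {"companies", "shuttles"})
except ModularPipelineError as err:
    # prints: Failed to map datasets and/or parameters onto the nodes provided: shuttels - did you mean one of these instead: shuttles
    print(err)
```

The message names `shuttels` but says nothing about its direction. It does not say whether the name was supplied without being used, or whether a node still needs something that was never supplied.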
However, I think this error message could be even more useful if it described the mismatch in more detail. For example, it could distinguish between these two cases:
- Am I failing to supply something to the pipeline wrapper that one of the nodes needs?
- Or am I supplying something to the pipeline wrapper that none of the nodes need?
As far as I understand, both scenarios currently lead to the same error message. However, it seems possible to make this distinction based on the inputs passed to the _validate_datasets_exist() function, which is responsible for raising this exception.
Context
I think it might improve the developer experience when building modular pipelines.
Possible Implementation
Instead of inferring a single list of mismatches (non_existent), infer two separate sets that together make up non_existent, called something like redundant_inputs and missing_inputs, and make the error message reflect the difference.
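Here is a minimal sketch of the proposed split. It assumes the validator can see two sets: the names supplied to the pipeline wrapper, and the names the wrapped nodes expect to receive through it. The function name `validate_mapping_split` and the local `ModularPipelineError` stand-in are hypothetical. The message wording is only a suggestion. In this sketch, `redundant_inputs` and `missing_inputs` together cover every name that fails to match in either direction.

```python
class ModularPipelineError(Exception):
    """Local stand-in for Kedro's ModularPipelineError (illustration only)."""


def validate_mapping_split(supplied: set[str], expected: set[str]) -> None:
    """Hypothetical validator that reports each direction of mismatch separately.

    supplied: names passed to the pipeline wrapper (assumption).
    expected: names the wrapped nodes expect from the wrapper (assumption).
    """
    redundant_inputs = supplied - expected  # supplied, but no node needs them
    missing_inputs = expected - supplied  # a node needs them, but they were not supplied
    if not redundant_inputs and not missing_inputs:
        return
    lines = ["Failed to map datasets and/or parameters onto the nodes provided."]
    if redundant_inputs:
        lines.append(
            "Supplied to the pipeline wrapper but not needed by any node: "
            + ", ".join(sorted(redundant_inputs))
        )
    if missing_inputs:
        lines.append(
            "Needed by a node but not supplied to the pipeline wrapper: "
            + ", ".join(sorted(missing_inputs))
        )
    raise ModularPipelineError("\n".join(lines))


try:
    validate_mapping_split({"companies", "shuttels"}, {"companies", "shuttles"})
except ModularPipelineError as err:
    print(err)
```

Suppose `shuttels` is a typo for `shuttles`. Both lines then appear, which suggests a misspelling rather than a name that was simply forgotten or left over. If only one line appears, the mismatch goes in one direction only.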