kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0
9.53k stars, 879 forks

Improve error messages for modular pipelines #3716

Closed: AhdraMeraliQB closed this 4 months ago

AhdraMeraliQB commented 4 months ago

Description

Closes #2633

Our ModularPipelineError messages have not been particularly informative or easy to understand. This PR updates the following messages to improve clarity.

  1. When creating a pipeline using the pipeline() function, if the provided inputs/outputs/parameters do not match those of the nodes provided, the following error message, with relevant suggestions, is now raised:
ModularPipelineError: Failed to map these inputs onto the nodes provided: model_input_table - did you mean one of these instead: model_input_table_NOT_FOUND
  2. The error message ModularPipelineError: Inputs should be free inputs to the pipeline has been updated to the clearer ModularPipelineError: Inputs must not be outputs from another node in the same pipeline
  3. The error message ModularPipelineError: Outputs can't contain free inputs to the pipeline has been updated to the clearer ModularPipelineError: All outputs must be generated by some node within the pipeline
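
For readers who want to see how checks like these could produce the messages above, here is a minimal, self-contained sketch. It is not Kedro's actual implementation. The `ModularPipelineError` class is a local stand-in so the example runs without Kedro installed. The helpers `check_names_exist` and `check_inputs_outputs` are hypothetical names, and using `difflib.get_close_matches` for the suggestions is an assumption about one reasonable way to generate them.

```python
import difflib


class ModularPipelineError(Exception):
    """Local stand-in for Kedro's exception (assumption: lets the sketch run alone)."""


def check_names_exist(requested, known):
    """Hypothetical helper: raise if a requested dataset name is unknown.

    Appends close matches from `known` as suggestions when any are found.
    """
    missing = sorted(set(requested) - set(known))
    if not missing:
        return

    suggestions = []
    for name in missing:
        # Fuzzy-match each unknown name against the names the nodes actually use.
        suggestions += difflib.get_close_matches(name, sorted(known))

    message = (
        "Failed to map these inputs onto the nodes provided: "
        + ", ".join(missing)
    )
    if suggestions:
        message += " - did you mean one of these instead: " + ", ".join(suggestions)
    raise ModularPipelineError(message)


def check_inputs_outputs(inputs, outputs, free_inputs):
    """Hypothetical helper covering the two reworded messages.

    `free_inputs` is the set of dataset names that no node in the pipeline produces.
    """
    # Remapped inputs must all be free inputs, i.e. not produced by another node.
    if not set(inputs) <= set(free_inputs):
        raise ModularPipelineError(
            "Inputs must not be outputs from another node in the same pipeline"
        )
    # Remapped outputs must be produced by some node, so none may be a free input.
    if set(outputs) & set(free_inputs):
        raise ModularPipelineError(
            "All outputs must be generated by some node within the pipeline"
        )
```

In this sketch, calling `check_names_exist(["model_input_tabel"], ["model_input_table", "companies"])` raises an error that names the misspelled dataset and suggests `model_input_table`.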

Development notes

Developer Certificate of Origin

We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a Signed-off-by line in the commit message. See our wiki for guidance.

If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.

Checklist

astrojuanlu commented 4 months ago

Tested this locally ⭐ thanks @AhdraMeraliQB and reviewers!