kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0
9.49k stars 874 forks

Ensure no nodes can depend on themselves even when transcoding is used #3812

Closed idanov closed 2 months ago

idanov commented 2 months ago

Description

We have figured out in https://github.com/kedro-org/kedro/issues/3799 that Kedro allows self-referencing nodes, as long as they use transcoding in their dataset names as follows:

node(
    func=load_shuttles_to_csv,
    inputs="test_data@excel",
    outputs="test_data@csv",
    name="load_shuttles_to_csv_node",
)

This makes the transcoding feature inconsistent with basic Kedro invariants like "no loops allowed in the graph". This PR changes the node validation code so that it raises an error for such nodes, with a message that mentions the transcoded dataset names.
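To make the idea concrete, here is a minimal, standalone sketch of such a check. It is not Kedro's actual implementation: the function names `strip_transcoding`, `find_self_dependencies` and `validate_node` and the error wording are hypothetical, and the only thing taken from the example above is that `@` separates the dataset name from its transcoding suffix.

```python
from typing import Iterable, Set

# Assumption: "@" separates the base dataset name from the transcoding
# suffix, as in "test_data@excel" from the example above.
TRANSCODING_SEPARATOR = "@"


def strip_transcoding(name: str) -> str:
    """Return the base dataset name, e.g. 'test_data@csv' -> 'test_data'."""
    return name.split(TRANSCODING_SEPARATOR, 1)[0]


def find_self_dependencies(inputs: Iterable[str], outputs: Iterable[str]) -> Set[str]:
    """Return base names that appear as both an input and an output."""
    base_inputs = {strip_transcoding(name) for name in inputs}
    base_outputs = {strip_transcoding(name) for name in outputs}
    return base_inputs & base_outputs


def validate_node(inputs: Iterable[str], outputs: Iterable[str]) -> None:
    """Raise ValueError if a node would depend on its own output.

    Hypothetical stand-in for the node validation described in this PR.
    """
    clashes = find_self_dependencies(inputs, outputs)
    if clashes:
        raise ValueError(
            "A node cannot have the same inputs and outputs, even if they "
            f"are transcoded: {sorted(clashes)}"
        )
```

With this sketch, `validate_node(["test_data@excel"], ["test_data@csv"])` raises a `ValueError`, because both names reduce to `test_data` once the transcoding suffix is stripped, while a node with unrelated inputs and outputs passes.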

It also adds a couple of tests and moves some of the transcoding helpers so that they are accessible to all modules in kedro.pipeline.

Working on this also revealed a somewhat odd use of _strip_transcoding in KedroContext, which should probably be addressed when KedroSession is redesigned.

Development notes

See above.

Developer Certificate of Origin

We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a Signed-off-by line in the commit message. See our wiki for guidance.

If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.

Checklist

astrojuanlu commented 2 months ago

This accidentally introduced a small break in backwards compatibility: https://github.com/kedro-org/kedro-viz/issues/1865

ElenaKhaustova commented 2 months ago

> This accidentally introduced a small break in backwards compatibility: kedro-org/kedro-viz#1865

Shall we somehow highlight this in the current release 0.19.4?