kedro-org / kedro-viz

Visualise your Kedro data and machine-learning pipelines and track your experiments.
https://demo.kedro.org
Apache License 2.0

Circular Dependency Error When Assigning Layers to "Leaf Datasets"(orphaned) in Kedro Viz #2156

Open gitgud5000 opened 3 weeks ago

gitgud5000 commented 3 weeks ago

Description

In Kedro Viz, datasets defined in the catalog that are "leaf datasets"—meaning they aren't used as inputs to any other nodes—are forced to be placed in the last layer when no layer is explicitly assigned to them.

In the attached image, datasets in the 08_model_input layer are positioned correctly because they have explicitly assigned layers. However, datasets in the bottom red square are moved to the last layer (11_calibration) by default since they are datasets without explicitly defined layers.

image

When I try to assign a specific layer (say 08_model_input) to these leaf datasets in the catalog, I encounter a circular dependency error. This issue only happens for datasets that are leaf nodes; datasets used as inputs to other nodes behave as expected without triggering this error.

Defining specific layers for these leaf datasets results in a circular dependency error, even though datasets output by the same node (and used in subsequent nodes) do not trigger the same issue.

Context

This issue limits the flexibility to organize datasets logically across layers. Leaf datasets, if not assigned layers, get forced into the last layer by default. When assigning layers to these datasets, a circular dependency error occurs, making proper layer management difficult.

Steps to Reproduce

  1. Define leaf datasets in the catalog (datasets not used as inputs by any other nodes).
  2. Open Kedro Viz and enable "Show Layers".
  3. Observe that the leaf datasets are positioned in the last layer by default.
  4. Assign a layer to these leaf datasets explicitly in the catalog.
  5. Check for circular dependency errors.
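To make the term "leaf dataset" in step 1 concrete, here is a minimal sketch of how such datasets could be identified. It assumes a pipeline can be described as a list of (inputs, outputs) pairs; that representation, the function name `find_leaf_datasets`, and the dataset names are hypothetical and are not part of the Kedro or Kedro Viz API.

```python
def find_leaf_datasets(nodes):
    """Return datasets that some node produces but no node consumes.

    `nodes` is a hypothetical list of (inputs, outputs) pairs, one per node.
    """
    consumed = set()
    produced = set()
    for inputs, outputs in nodes:
        consumed.update(inputs)
        produced.update(outputs)
    # A leaf dataset is an output that never appears as an input.
    return sorted(produced - consumed)


# Illustrative pipeline: "diagnostics" and "report" are never used as inputs.
nodes = [
    (["raw"], ["features"]),
    (["features"], ["predictions", "diagnostics"]),
    (["predictions"], ["report"]),
]

print(find_leaf_datasets(nodes))
```

In this example, "diagnostics" and "report" are the datasets that would be pushed to the last layer when no layer is assigned to them.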

Expected Result

Leaf datasets should respect their assigned layers without triggering circular dependency errors. They shouldn’t default to the last layer (unless intended). I think they should remain in the same layer as their generating node.

Actual Result

WARNING  Layers visualisation is disabled as circular dependency detected among layers.  layers.py:120

gitgud5000 commented 3 weeks ago

image

Here, I assigned the 99_orphans layer to one of the previously orphaned datasets.

Interestingly, this did not trigger the circular dependency error, and neither did assigning the 09_model_output layer. The error appears only when assigning the 08_model_input layer, which should be allowed. The image shows how all these leaf datasets are now grouped into the newly defined 99_orphans layer, confirming that these datasets are consistently moved to the bottom-most layer in the stack.
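One possible explanation for this pattern, sketched below, assumes that layer order is inferred from dataset-to-dataset dependencies and that a leaf dataset is produced from a dataset in a later layer. This is only an illustration built on the standard-library `graphlib` module, not Kedro Viz's actual implementation; the dataset names, the dependency graph, and the helper functions are all hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical dataset-level dependencies: each dataset maps to the
# upstream datasets it is derived from (via some node).
upstream = {
    "features": [],
    "predictions": ["features"],
    "leaf_report": ["predictions"],  # leaf dataset built from model output
}


def layer_dependencies(dataset_layers, upstream):
    """Collapse dataset dependencies into layer -> set of upstream layers."""
    deps = {layer: set() for layer in dataset_layers.values()}
    for dataset, parents in upstream.items():
        for parent in parents:
            child_layer = dataset_layers[dataset]
            parent_layer = dataset_layers[parent]
            if child_layer != parent_layer:  # ignore edges within one layer
                deps[child_layer].add(parent_layer)
    return deps


def sort_layers(dataset_layers, upstream):
    """Return layers in dependency order, or None if they form a cycle."""
    deps = layer_dependencies(dataset_layers, upstream)
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError:
        return None


def layers_with_leaf_in(leaf_layer):
    """Hypothetical layer assignment with the leaf placed in `leaf_layer`."""
    return {
        "features": "08_model_input",
        "predictions": "09_model_output",
        "leaf_report": leaf_layer,
    }


for leaf_layer in ("08_model_input", "09_model_output", "99_orphans"):
    print(leaf_layer, "->", sort_layers(layers_with_leaf_in(leaf_layer), upstream))
```

Under these assumptions, placing the leaf in 08_model_input makes 08_model_input depend on 09_model_output and vice versa, so no valid order exists, while 09_model_output and 99_orphans both sort cleanly. Whether the real project has this shape would need to be confirmed with a reproducible example.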

rashidakanchwala commented 3 weeks ago

Thank you @gitgud5000 for raising this issue. We will look into it.

lrcouto commented 3 weeks ago

Hey @gitgud5000, @rashidakanchwala and I are trying to reproduce the issue you encountered, but we couldn't trigger the circular dependency error you found. Would it be possible for you to share some more information about your project setup?

I tried to reproduce it on the demo project in the kedro-viz repo by assigning some of the previously unassigned nodes to both the reporting and tracking layers, and the warning did not appear.

image

gitgud5000 commented 3 weeks ago

I will try to produce an example in a Kedro project and share it with you soon. @lrcouto