kedro-org / kedro-viz

Visualise your Kedro data and machine-learning pipelines and track your experiments.
https://demo.kedro.org
Apache License 2.0
677 stars 112 forks source link

[KED-2477] Simplify debugging of circular dependencies #383

Closed clstaudt closed 2 years ago

clstaudt commented 3 years ago

Description

kedro viz fails if circular dependencies between layers exist.

Example:

toposort.CircularDependencyError: Circular dependencies exist among these items: {'feature':{'intermediate', 'primary'}, 'intermediate':{'primary'}, 'model':{'intermediate', 'primary', 'feature', 'model_input'}, 'model_input':{'intermediate', 'primary', 'feature'}, 'model_output':{'intermediate', 'primary', 'feature', 'model_input', 'model'}, 'primary':{'intermediate'}, 'reporting':{'intermediate', 'primary', 'feature', 'model_input', 'model_output', 'model'}}
Error: Circular dependencies exist among these items: {'feature':{'intermediate', 'primary'}, 'intermediate':{'primary'}, 'model':{'intermediate', 'primary', 'feature', 'model_input'}, 'model_input':{'intermediate', 'primary', 'feature'}, 'model_output':{'intermediate', 'primary', 'feature', 'model_input', 'model'}, 'primary':{'intermediate'}, 'reporting':{'intermediate', 'primary', 'feature', 'model_input', 'model_output', 'model'}}

At this point a graph visualization of the pipeline would really help to spot and remove the cycle, but... you see the problem, there is a circular dependency here too.

Context

Circular dependencies are easy to introduce by accident in complex pipelines and difficult to find. They do not necessarily lead to pipeline failure with kedro run, so they turn up much later when trying to run kedro viz again.

Possible Implementation

Possible Alternatives

This partially helped me find a circular dependency, but still required additional knowledge to fix it:

dependencies = {'feature':{'intermediate', 'primary'}, 'intermediate':{'primary'}, 'model':{'intermediate', 'primary', 'feature', 'model_input'}, 'model_input':{'intermediate', 'primary', 'feature'}, 'model_output':{'intermediate', 'primary', 'feature', 'model_input', 'model'}, 'primary':{'intermediate'}, 'reporting':{'intermediate', 'primary', 'feature', 'model_input', 'model_output', 'model'}}

import networkx
dependency_graph = networkx.DiGraph(dependencies, )
networkx.algorithms.cycles.find_cycle(dependency_graph)
richardwestenra commented 3 years ago

Thanks @clstaudt! This is a really good point you've raised. Interesting.

  1. As you mentioned, it looks like the best solution would be improved error handling, so that the error message identifies the specific nodes that cause the problem.
  2. I wonder whether, in these cases, we could convert the layers to tags (e.g. tag name = Layer:Layer_Name) so that the user could hover over the different tags to help spot the offending nodes.

The error message solution is probably the best. I've assigned this issue a JIRA ticket so that it can be escalated within the team, and we'll discuss it at the next backlog grooming/technical design session.

idanov commented 3 years ago

@clstaudt Thanks for flagging this. The layering system is meant to be unidirectional and not have circular dependencies in your graph. The layers don't have any meaning in Kedro's code and are only used by Viz, that's why it manifested itself in Viz. One easy way to avoid having circular dependencies is to never have a node connecting a higher layer into a lower layer. The error message is not very friendly and will be fixed with the next release, thanks for spotting that!

datajoely commented 3 years ago

Hi @clstaudt - it will be while before this gets released, but we've recently made a lot more progress in this regard and will be in a good place to surface useful error messages on CLI side and eventually the front end.

tynandebold commented 2 years ago

We've solved this in #584. Check the Development notes section for more detail.

tdhooghe commented 3 weeks ago

Is the actual debugging already in? Otherwise, maybe this issue could be opened?

datajoely commented 3 weeks ago

@tdhooghe could you tell us more about the error you're experiencing?

tdhooghe commented 3 weeks ago

Definitely!

This one: WARNING Layers visualisation is disabled as circular dependency detected among layers.

According to the notes in #584, it was planned to make the debug message here more verbose, and therefore this issue was closed. As this currently is not the case, I was wondering if there are still any plans to include this?

datajoely commented 3 weeks ago

I actually think this is a different issue - it's saying that the layers applied in the catalog create a loop. I.e. Layer 1 -> Layer 2 -> Layer 1.

In truth - the easiest way to debug this is to do kedro viz run --autoreload and spend 10 minutes commenting out layer config and work out the issue.

tdhooghe commented 3 weeks ago

Ah alright, my apologies in that case. I fixed it by logging layer_dependencies and the actual CycleError exception in services/layers.py. I could make a PR for this, if you think it is helpful? At least, it helped me :)

EDIT: and thanks for the debug suggestion btw!

datajoely commented 3 weeks ago

Oh interesting, love that you wen't that deep! We're always keen to get new contributors. Would you mind raising an Issue on this repo so we can understand what's happened in detail.

tdhooghe commented 3 weeks ago

Done! See #2125