kedro-org / kedro-viz

Visualise your Kedro data and machine-learning pipelines and track your experiments.
https://demo.kedro.org
Apache License 2.0
679 stars 113 forks

Spike: Try to reproduce non-deterministic rendering #2057

Closed astrojuanlu closed 2 days ago

astrojuanlu commented 2 months ago

Description

I'm always frustrated when I do kedro viz run twice and each time the rendered DAG is different.

Context

I was reviewing #1966 and I wanted to see if the rendered pipeline is the same, but it's difficult because the sorting of the nodes can differ.


ravi-kumar-pilla commented 2 months ago

This would also enable autoreload of Kedro-Viz in the VSCode extension.

astrojuanlu commented 2 months ago

A bit of history: as far as I understand, this is where the algorithm was introduced https://github.com/kedro-org/kedro-viz/pull/185

astrojuanlu commented 2 months ago

And I don't see any references to randomness in the original paper that defines the Cassowary algorithm https://doi.org/10.1145/504704.504705 and no mention in the kiwi.js repository either https://github.com/IjzerenHein/kiwi.js/ so my guess is that the randomness actually comes from Kedro Viz? 🤔

noklam commented 1 week ago

Is there an issue with reproducing this? I thought this was always the case. I just ran kedro viz on an example project and it resulted in a different layout:

[Two screenshots: the same example project rendered with two different layouts]

rashidakanchwala commented 1 week ago

> And I don't see any references to randomness in the original paper that defines the Cassowary algorithm https://doi.org/10.1145/504704.504705 and no mention in the kiwi.js repository either https://github.com/IjzerenHein/kiwi.js/ so my guess is that the randomness actually comes from Kedro Viz? 🤔

Thanks, @astrojuanlu, for your help with the above. This clarified that the issue was with Kedro-Viz. We pass nodes, edges, and layers to the graph. I noticed that while the order of the nodes and layers remained consistent, the order of the edges varied each time the graph was calculated. I’ve added a sortEdges function to reduce randomness by ensuring we pass the edges in the same order each time. It seems to be working now.
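
For anyone following along, here is a minimal sketch of the idea, not the actual Kedro-Viz code. The Edge shape, its source and target fields, and the comparison order are all assumptions made for illustration; only the name sortEdges comes from the comment above. The point is that sorting by a fixed key makes the edge order independent of the order in which the edges arrive.

```typescript
// Hypothetical edge shape: the field names are assumptions, not Kedro-Viz's real types.
interface Edge {
  source: string;
  target: string;
}

// Plain code-unit string comparison, so the result does not depend on locale settings.
const compareIds = (a: string, b: string): number => (a < b ? -1 : a > b ? 1 : 0);

// Return a new array ordered by source id, then by target id.
// The input array is left untouched.
function sortEdges(edges: readonly Edge[]): Edge[] {
  return [...edges].sort(
    (a, b) => compareIds(a.source, b.source) || compareIds(a.target, b.target)
  );
}

// The same three edges, delivered in a different order on two runs.
const runA: Edge[] = [
  { source: "b", target: "c" },
  { source: "a", target: "c" },
  { source: "a", target: "b" },
];
const runB: Edge[] = [...runA].reverse();

// After sorting, both runs hand the layout an identical edge list.
const sortedA = sortEdges(runA);
const sortedB = sortEdges(runB);
```

Sorting a copy rather than sorting in place keeps the original edge list intact for any other code that reads it.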

I’ll open a PR so others can test it.