kedro-org / kedro-viz

Visualise your Kedro data and machine-learning pipelines and track your experiments.
https://demo.kedro.org
Apache License 2.0

Visualise `Pipeline` objects #1993

Open astrojuanlu opened 2 months ago

astrojuanlu commented 2 months ago

AS A Kedro user, I WANT TO visualise `Pipeline` objects directly in notebooks, SO THAT:

  1. I don't need the full Kedro Framework structure (a requirement for %run_viz)
  2. I can interactively visualise Pipeline objects while I am creating them

Originally #1459, extra context in https://github.com/kedro-org/kedro-viz/discussions/1833#discussioncomment-9949391 reproduced below:

I am showcasing Kedro concepts on a notebook without creating a full-fledged project. Took https://github.com/ibis-project/kedro-ibis-tutorial/blob/main/03%20-%20First%20Steps%20with%20Kedro.ipynb as inspiration, and adapted it to Spark and Databricks (will try to publish that soon).

However, since there is no Kedro Framework project, there is no way I can visualise my pipelines, even though I have a Pipeline object perfectly defined:

[image]

It would be insanely awesome if I could do `KedroViz().visualize(pipe).show()` or something like that, without ever needing to set up a Kedro project.

yury-fedotov commented 2 months ago

@astrojuanlu interesting use case. Have you often seen users define pipelines in notebooks, or import them there?

I thought the vast majority of notebook usage is to do `catalog.load("something")` and then some EDA, while all pipeline definitions live in .py files.

astrojuanlu commented 2 months ago

> Have you often seen users define pipelines in notebooks

I have not, and probably the reason is that traditionally Kedro had taken sort of an anti-notebook stance. We evolved that in 2023, for example by writing https://docs.kedro.org/en/stable/notebooks_and_ipython/notebook-example/add_kedro_to_a_notebook.html

I've personally found it very handy to explain things to data scientists with notebooks when teaching. See for example https://github.com/ibis-project/kedro-ibis-tutorial/blob/main/03%20-%20First%20Steps%20with%20Kedro.ipynb, recording (very well received) or https://github.com/astrojuanlu/kedro-databricks-demo/blob/main/First%20Steps%20with%20Kedro%20on%20Databricks.ipynb (essentially the same thing, but with a ManagedTableDataset connecting to DBX UC). Being able to visualise the pipelines there directly would be awesome I think.

> or import them there?

We launched a feature earlier this year to do something like that: https://docs.kedro.org/en/stable/notebooks_and_ipython/kedro_and_notebooks.html#load-node-line-magic. It's for nodes rather than full pipelines, though.

> I thought the vast majority of notebook usage is to do `catalog.load("something")` and then some EDA.

That's our impression too, yes (and in fact I do that all the time). So this issue would be about taking that one little step further.

astrojuanlu commented 3 weeks ago

A user just asked about this.

astrojuanlu commented 3 weeks ago

(And it had nothing to do with notebooks)

KikiCS commented 3 weeks ago

Hello, I'm adding some context for my use case after sending a message on Slack. Kedro-Viz diagrams are very useful for non-technical people who want a high-level view of the data pipeline. While documenting models in my company's internal Notion, I thought that including a Kedro-Viz diagram would be super useful, as would generating a new one every time a change to the pipeline is released. I got the idea when I saw that Notion renders diagrams written in Mermaid, but I don't know, and haven't checked, whether Kedro-Viz is based on Mermaid under the hood.