kedro-org / kedro-viz

Visualise your Kedro data and machine-learning pipelines and track your experiments.
https://demo.kedro.org
Apache License 2.0
647 stars 106 forks

Add functionality to optionally collapse free outputs of namespace pipelines #1817

Open yury-fedotov opened 3 months ago

yury-fedotov commented 3 months ago

Description

While developing a large Kedro project that relies heavily on modular namespace pipelines, I ran into a limitation of Kedro-Viz: when many namespace pipelines are involved, it becomes harder to see the project structure in the visualisation.

I believe this is something that is:

So I decided to open this feature request. Happy to chat in the comments.

Context

Imagine I have a modular profiling pipeline that takes a dataframe and generates many objects showcasing its properties. For example:

Then imagine I have 5 tables in the project that I need to profile. Using that modular pipeline with namespaces makes this very easy: I can generate instances of Pipeline in a for loop. However, since all those artifacts are free outputs of their namespace, the Kedro-Viz representation becomes incredibly messy, with tens (or potentially hundreds) of objects exposed at the top level of the graph.

If my profiling pipeline creates 5 artifacts and I profile 5 tables with it, 25 objects are exposed at the top level of viz, which hurts the readability of the graph.
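To make the arithmetic concrete, here is a minimal plain-Python sketch of how the free outputs multiply. It does not use the Kedro API; the table names, the artifact names, and the `profiling.<table>.<artifact>` naming scheme are all hypothetical and only mimic the way Kedro prefixes dataset names with their namespace.

```python
# Hypothetical tables and artifacts; none of these names come from a real project.
TABLES = ["customers", "orders", "products", "payments", "shipments"]
ARTIFACTS = ["schema", "summary_stats", "null_counts", "histograms", "correlations"]


def free_outputs(tables, artifacts, parent_namespace="profiling"):
    """List the namespaced dataset names that every profiling instance exposes.

    Assumption: each table gets its own nested namespace under `parent_namespace`,
    and each artifact is a free output of that namespace.
    """
    return [
        f"{parent_namespace}.{table}.{artifact}"
        for table in tables
        for artifact in artifacts
    ]


exposed = free_outputs(TABLES, ARTIFACTS)
print(len(exposed))  # 25
print(exposed[0])    # profiling.customers.schema
```

Each extra table adds another full set of artifacts at the top level, so the clutter grows with the product of the two counts.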

Possible Implementation

I wasn't able to come up with an exact technical proposal, but directionally I think it could be something like this. We could have a YAML file in the Kedro project config called viz_config.yaml with the following additional content:

collapse_namespaces_outputs:
    - profiling
    - another_namespace_outputs_of_which_should_be_collapsed_unless_used_outside_of_this_namespace

This would make viz collapse even the free outputs of the listed namespaces inside the supernode, unless they are used somewhere outside that namespace.
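The intended rule could be sketched roughly as follows. This is only an illustration in plain Python, not Kedro-Viz code: the function names, the `consumers` mapping, and the `collapse_namespaces` argument (standing in for the list read from the proposed config file) are all hypothetical.

```python
def is_inside(candidate, namespace):
    """True if `candidate` is `namespace` itself or one of its nested namespaces."""
    return candidate == namespace or candidate.startswith(namespace + ".")


def split_free_outputs(namespace, outputs, consumers, collapse_namespaces):
    """Partition a namespace's free outputs into (collapsed, exposed).

    outputs: dataset names produced by nodes inside `namespace`.
    consumers: maps a dataset name to the namespaces of the nodes that read it.
    collapse_namespaces: namespaces opted in via the (hypothetical) viz config.
    """
    if namespace not in collapse_namespaces:
        # Not opted in: keep every output at the top level.
        return [], sorted(outputs)

    collapsed, exposed = [], []
    for name in sorted(outputs):
        used_outside = any(
            not is_inside(ns, namespace) for ns in consumers.get(name, ())
        )
        (exposed if used_outside else collapsed).append(name)
    return collapsed, exposed


outputs = {"profiling.customers.schema", "profiling.customers.summary_stats"}
consumers = {"profiling.customers.summary_stats": {"reporting"}}

collapsed, exposed = split_free_outputs("profiling", outputs, consumers, ["profiling"])
print(collapsed)  # ['profiling.customers.schema']
print(exposed)    # ['profiling.customers.summary_stats']
```

Here `summary_stats` stays visible because a node in the `reporting` namespace reads it, while `schema` is hidden inside the `profiling` supernode.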

Checklist