While developing a large Kedro project that relies heavily on modular namespace pipelines, I ran into a "limitation" of viz that makes the project structure hard to see when many namespace pipelines are involved.
I believe this is something that is:
- Technically feasible to address
- Relevant for all big projects and all users of namespace pipelines
So I decided to open this feature request. Happy to chat in the comments.
Context
Imagine I have a modular profiling pipeline that takes a dataframe and generates many objects showcasing its properties. For example:
- A table of missing values by column and by group
- Pairplot: a collection of scatterplots of all columns versus each other
- Correlation heatmap
- ...
Then imagine I have 5 tables in the project that I need to profile. Using that modular pipeline with namespaces makes this very easy: I can generate the Pipeline instances in a for loop. However, since all those artifacts are free outputs of their namespaces, the kedro viz representation becomes very messy, with tens (or potentially hundreds) of objects exposed at the top level of viz.
If my profiling pipeline creates 5 artifacts and I profile 5 tables with it, 25 objects (5 × 5) are exposed at the top level of viz, which makes the graph much harder to read.
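To make the 5 × 5 arithmetic concrete, here is a minimal, Kedro-free sketch of the situation. It deliberately avoids Kedro's API: Node is a hypothetical stand-in for a Kedro node, and the table names and the two extra artifact names (summary_stats, distributions) are made up for illustration. It builds one namespaced copy of a 5-artifact profiling pipeline per table in a for loop and counts the outputs that no node consumes.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a Kedro node (not Kedro's real Node class)."""

    name: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]
    namespace: str | None = None


# Hypothetical names: 5 artifacts produced by the profiling pipeline.
ARTIFACTS = (
    "missing_values",
    "pairplot",
    "correlation_heatmap",
    "summary_stats",
    "distributions",
)
# Hypothetical names: the 5 tables to profile.
TABLES = ("customers", "orders", "products", "payments", "shipments")


def profiling_pipeline(table: str) -> list[Node]:
    """Build one namespaced copy of the profiling pipeline for `table`.

    Roughly mirrors namespacing: node and output names get a `table.` prefix,
    while the input dataframe keeps its own name.
    """
    return [
        Node(
            name=f"{table}.make_{artifact}",
            inputs=(table,),
            outputs=(f"{table}.{artifact}",),
            namespace=table,
        )
        for artifact in ARTIFACTS
    ]


def free_outputs(nodes: list[Node]) -> set[str]:
    """Datasets that some node produces but no node consumes."""
    produced = {out for n in nodes for out in n.outputs}
    consumed = {inp for n in nodes for inp in n.inputs}
    return produced - consumed


# One namespaced instance per table, generated in a for loop.
nodes: list[Node] = []
for table in TABLES:
    nodes.extend(profiling_pipeline(table))

print(len(free_outputs(nodes)))  # 25 free outputs, all exposed at the top level
```

Every one of the 25 outputs is free, so none of them is pulled inside a namespace by a downstream edge, which is exactly what clutters the top level.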
Possible Implementation
I wasn't able to come up with an exact technical proposal, but directionally, I think it could look something like this.
We could have a YAML file called viz_config.yaml in the Kedro project config that may have the following additional content:
What that would do is force viz to collapse even the free outputs of a namespace inside its supernode, as long as they are not used anywhere else.
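A rough sketch of the proposed rule follows. It models viz behaviour only as described in this issue, not Kedro-Viz's actual implementation, and uses a hypothetical Node stand-in. The collapse_free_outputs parameter is a hypothetical switch standing in for whatever setting viz_config.yaml would carry: a namespaced output is drawn outside its supernode if a node outside that namespace consumes it, and free outputs stay inside the supernode when the switch is on.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a Kedro node (not Kedro's real Node class)."""

    name: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]
    namespace: str | None = None


def exposed_outputs(nodes: list[Node], collapse_free_outputs: bool) -> set[str]:
    """Outputs of namespaced nodes that are drawn outside their supernode.

    - Consumed by a node in another namespace (or in none): always exposed,
      because an edge has to leave the supernode.
    - Consumed only inside its own namespace: always hidden.
    - Free (consumed by nobody): exposed unless the hypothetical
      `collapse_free_outputs` switch is on, in which case it stays inside.
    """
    # Map each dataset to the set of namespaces whose nodes read it.
    consumers: dict[str, set[str | None]] = {}
    for n in nodes:
        for inp in n.inputs:
            consumers.setdefault(inp, set()).add(n.namespace)

    exposed: set[str] = set()
    for n in nodes:
        if n.namespace is None:
            continue  # top-level nodes have no supernode to collapse into
        for out in n.outputs:
            users = consumers.get(out, set())
            if users - {n.namespace}:
                exposed.add(out)  # used elsewhere, so it must stay visible
            elif not users and not collapse_free_outputs:
                exposed.add(out)  # free output, exposed under the current behaviour
    return exposed


nodes = [
    Node("customers.make_pairplot", ("customers",), ("customers.pairplot",), "customers"),
    Node("customers.make_heatmap", ("customers",), ("customers.correlation_heatmap",), "customers"),
    Node("orders.make_pairplot", ("orders",), ("orders.pairplot",), "orders"),
    Node("orders.make_heatmap", ("orders",), ("orders.correlation_heatmap",), "orders"),
    # Hypothetical downstream node, outside any namespace, that reuses one artifact.
    Node("build_report", ("orders.correlation_heatmap",), ("report",)),
]

print(sorted(exposed_outputs(nodes, collapse_free_outputs=False)))
# ['customers.correlation_heatmap', 'customers.pairplot', 'orders.correlation_heatmap', 'orders.pairplot']
print(sorted(exposed_outputs(nodes, collapse_free_outputs=True)))
# ['orders.correlation_heatmap']
```

With the switch on, only the artifact that is actually reused downstream stays at the top level; everything else folds into its namespace's supernode.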
Checklist
[x] Include labels so that we can categorise your feature request