kedro-org / kedro-viz

Visualise your Kedro data and machine-learning pipelines and track your experiments.
https://demo.kedro.org
Apache License 2.0

Fix Flowchart bug with datasets within a nested modular pipeline. #1863

Closed: rashidakanchwala closed this 1 month ago

rashidakanchwala commented 2 months ago

Description

Resolves #1814

Development notes

There was a bug in the modular pipeline logic. Datasets that served as inputs and outputs of nested modular pipelines (that is, internal_inputs/internal_outputs of the modular pipeline) were mistakenly treated as external_inputs/external_outputs of the modular pipeline. This happened because we only checked whether a dataset was internal by comparing it against the nested modular pipeline and never verified it against the parent modular pipeline. We now check against both the nested modular pipeline and the parent modular pipeline to determine whether a dataset is an internal input/output of the modular pipeline or an external input/output.
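To illustrate the idea, here is a minimal sketch of the check before and after the change. It is not the Kedro-Viz implementation: the helper names (`belongs_to`, `parent_of`, `is_internal_old`, `is_internal_new`), the dot-separated namespace convention, and the nested pipeline id `main_pipeline.sub_pipeline` are assumptions made for illustration only.

```python
from typing import Optional


def belongs_to(dataset_id: str, pipeline_id: str) -> bool:
    """Hypothetical helper: True if the dataset's namespace is the given
    pipeline or one of its descendants (dot-separated ids are assumed)."""
    namespace = dataset_id.rpartition(".")[0]
    return namespace == pipeline_id or namespace.startswith(pipeline_id + ".")


def parent_of(pipeline_id: str) -> Optional[str]:
    """Hypothetical helper: id of the enclosing modular pipeline, if any."""
    return pipeline_id.rpartition(".")[0] or None


def is_internal_old(dataset_id: str, nested_id: str) -> bool:
    # Behaviour described as buggy: compare against the nested pipeline only.
    return belongs_to(dataset_id, nested_id)


def is_internal_new(dataset_id: str, nested_id: str) -> bool:
    # Behaviour described in this PR: compare against the nested pipeline
    # and its direct parent.
    parent = parent_of(nested_id)
    return belongs_to(dataset_id, nested_id) or (
        parent is not None and belongs_to(dataset_id, parent)
    )


# Assumed ids for illustration; only 'main_pipeline.dataset_1' appears in the PR.
nested = "main_pipeline.sub_pipeline"
dataset = "main_pipeline.dataset_1"

print(is_internal_old(dataset, nested))
print(is_internal_new(dataset, nested))
```

Under these assumptions, the old check reports the dataset as external because its namespace is the parent rather than the nested pipeline, while the new check reports it as internal. The sketch looks only one level up, at the direct parent.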

QA notes

You can verify that the issue is resolved by running the pipeline example shared in issue #1814 and comparing the resulting flowchart with the behaviour reported there.

Additionally, if you examine example number 2 in issue #1651, which includes a nested pipeline, you'll notice that the problem with 'main_pipeline.dataset_1' is fixed. Now, it's hidden inside the 'main_pipeline' node when viewed in collapsed mode.

yury-fedotov commented 2 months ago

Thanks for addressing this issue! Hope the example I provided helped localize it. I think it's a great enabler for kedro viz adoption in large projects.

rashidakanchwala commented 2 months ago

@yuryfedotov-mck - We've started addressing the issue you raised, but unfortunately the solution we implemented doesn't work with deeply nested pipelines. We are investigating further to find a fix. Since modular pipelines are quite complex, this may take some time. Please bear with us as we work through it.