Current issues in constructing the modular pipeline tree:
We determined internal_inputs/outputs and external_inputs/outputs based on namespaces rather than on what Kedro returns. Since datasets do not have a namespace (only Kedro nodes and pipelines do), this made it difficult to determine the actual inputs/outputs of a nested modular pipeline.
Input/output datasets of a nested modular pipeline were inherited by its parent modular pipeline. This caused a few datasets to appear in the root modular pipeline even though they are not free output datasets.
Readability/maintenance issues with nested modular pipelines, as we had not defined rules for adding children, inputs and outputs to a modular pipeline.
On the UI, modular pipeline focus mode did not highlight the associated inputs/outputs in the node menu: because dataset nodes do not have a namespace, their associated modular_pipelines were always empty.
How does this PR resolve the issues:
Determines the inputs/outputs of a modular pipeline based on what Kedro returns.
Removes the concept of internal/external input/output datasets for modular pipelines; a modular pipeline now only has inputs and outputs. (Thanks to @idanov)
Creates helper functions, with defined rules, for adding inputs/outputs and children to a modular pipeline.
Populates the modular pipeline tree before creating task/data nodes, which removes the need to calculate modular pipelines from namespaces while creating the nodes.
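A minimal sketch of deriving a modular pipeline's inputs/outputs from what the pipeline itself reports, written in plain Python so it runs without Kedro installed. `ToyNode`, `ToyPipeline` and `modular_pipeline_io` are hypothetical stand-ins: the method names mirror Kedro's `Pipeline.inputs()`, `outputs()`, `all_outputs()` and `only_nodes_with_namespace()`, but the outputs rule used here (free outputs plus anything the rest of the pipeline consumes) is an assumption about how they could be combined, not the exact Kedro-Viz code.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Set, Tuple


@dataclass(frozen=True)
class ToyNode:
    """Hypothetical stand-in for a Kedro node: name, dataset names, optional namespace."""
    name: str
    inputs: Tuple[str, ...]
    outputs: Tuple[str, ...]
    namespace: Optional[str] = None


@dataclass(frozen=True)
class ToyPipeline:
    """Hypothetical stand-in mirroring a few kedro.pipeline.Pipeline methods."""
    nodes: FrozenSet[ToyNode]

    def all_inputs(self) -> Set[str]:
        return {ds for n in self.nodes for ds in n.inputs}

    def all_outputs(self) -> Set[str]:
        # Every dataset produced by some node in this pipeline.
        return {ds for n in self.nodes for ds in n.outputs}

    def inputs(self) -> Set[str]:
        # Free inputs: consumed here but not produced by any node here.
        return self.all_inputs() - self.all_outputs()

    def outputs(self) -> Set[str]:
        # Free outputs: produced here but not consumed by any node here.
        return self.all_outputs() - self.all_inputs()

    def only_nodes_with_namespace(self, namespace: str) -> "ToyPipeline":
        # Nodes in `namespace` itself or in any namespace nested under it.
        return ToyPipeline(frozenset(
            n for n in self.nodes
            if n.namespace
            and (n.namespace == namespace or n.namespace.startswith(namespace + "."))
        ))

    def __sub__(self, other: "ToyPipeline") -> "ToyPipeline":
        return ToyPipeline(self.nodes - other.nodes)


def modular_pipeline_io(pipeline: ToyPipeline, namespace: str) -> Tuple[Set[str], Set[str]]:
    """Assumed rule: inputs are the sub-pipeline's free inputs; outputs are its free
    outputs plus any dataset it produces that the rest of the pipeline consumes."""
    sub = pipeline.only_nodes_with_namespace(namespace)
    rest = pipeline - sub
    outputs = sub.outputs() | (sub.all_outputs() & rest.inputs())
    return sub.inputs(), outputs


pipeline = ToyPipeline(frozenset({
    ToyNode("f1", ("dataset_1",), ("dataset_2",), "mp"),
    ToyNode("f2", ("dataset_2",), ("dataset_3",), "mp"),
    ToyNode("f3", ("dataset_3",), ("dataset_4",), "mp"),
    ToyNode("report", ("dataset_2",), ("dataset_5",)),  # no namespace
}))
ins, outs = modular_pipeline_io(pipeline, "mp")
print(sorted(ins), sorted(outs))  # ['dataset_1'] ['dataset_2', 'dataset_4']
```

Here dataset_3 stays inside `mp` because only `mp` consumes it, while dataset_2 becomes an output because a node outside `mp` reads it; no internal/external distinction is needed.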
Core parts that changed:
Added the helper methods populate_tree, add_children, _add_datasets_as_children and _add_children_to_parent_pipeline to ModularPipelinesRepository. (Thanks to @rashidakanchwala)
While adding each KedroPipeline to the Kedro-Viz data repositories, DataAccessManager calls populate_tree to construct the modular_pipelines_tree for the registered pipeline.
Inputs/outputs for a modular pipeline are calculated using the public APIs available in Kedro (inputs(), outputs(), all_outputs(), only_nodes_with_namespace()).
Calculating children now follows a set of rules, defined in the docstrings of add_children and the other helper functions.
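The shape of the tree can be sketched in the same hedged way. This is not the ModularPipelinesRepository code; `explode_namespace`, `parent_of` and `build_tree` are hypothetical helpers illustrating one plausible set of rules: every ancestor of a nested namespace becomes a modular pipeline, each modular pipeline is a child of its parent (or of a root node), and a task node is a child of its deepest namespace. Dataset placement is left out of this sketch.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple

ROOT = "__root__"  # hypothetical id for the top-level (non-namespaced) pipeline


def explode_namespace(namespace: str) -> List[str]:
    """'a.b.c' -> ['a', 'a.b', 'a.b.c']: one id per nested modular pipeline."""
    parts = namespace.split(".")
    return [".".join(parts[: i + 1]) for i in range(len(parts))]


def parent_of(modular_pipeline_id: str) -> str:
    """Parent modular pipeline id, or ROOT for a top-level namespace."""
    head, sep, _ = modular_pipeline_id.rpartition(".")
    return head if sep else ROOT


def build_tree(task_nodes: Iterable[Tuple[str, Optional[str]]]) -> Dict[str, Set[str]]:
    """Map each modular pipeline id (and ROOT) to the ids of its direct children.

    task_nodes: (task node name, namespace or None) pairs.
    """
    children: Dict[str, Set[str]] = defaultdict(set)
    for name, namespace in task_nodes:
        if not namespace:
            children[ROOT].add(name)  # non-namespaced task nodes sit at the root
            continue
        for mp_id in explode_namespace(namespace):
            children[parent_of(mp_id)].add(mp_id)  # nested pipeline under its parent
        children[namespace].add(name)  # task node under its deepest namespace
    return dict(children)


tree = build_tree([
    ("split", "data_science.model_inputs"),
    ("train", "data_science"),
    ("report", None),
])
print(sorted(tree[ROOT]))  # ['data_science', 'report']
```

Because the whole tree is built from node namespaces up front, later steps can look up a node's modular pipeline instead of recomputing it while creating task/data nodes.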
Description
Resolves #1899, #1814
Development notes
To ease the review process for https://github.com/kedro-org/kedro-viz/pull/1897, the PRs below were created:
QA notes
Example modular pipeline tree:
Incorrect rendering of nodes:
Issues raised by users:
Code Flow doc:
Please find further information in Refactor_Modular_Pipelines.docx.
Modular Pipelines UI Rendering:
UseCase 1: When a modular pipeline output (dataset_3) is used as an input to another function of the same modular pipeline.
Before:
After:
UseCase 2: When a nested modular pipeline output (dataset_3) is used as an input to the outer modular pipeline
Before:
After:
UseCase 3: When a nested modular pipeline output (dataset_3) is used as an input to the outer modular pipeline and also used as an input to another external modular pipeline
Before:
After:
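The difference between UseCase 2 and UseCase 3 can be reproduced with plain sets. This is a hypothetical sketch: nodes are `(namespace, inputs, outputs)` tuples, and `outputs_of` assumes a modular pipeline's outputs are its free outputs plus any dataset it produces that a node outside it consumes. Under that assumption, dataset_3 stays inside `outer` in UseCase 2 but becomes an output of `outer` once `external` also reads it.

```python
from typing import List, Set, Tuple

Node = Tuple[str, Set[str], Set[str]]  # (namespace, input datasets, output datasets)


def in_namespace(node_ns: str, namespace: str) -> bool:
    # True for the namespace itself and for anything nested below it.
    return node_ns == namespace or node_ns.startswith(namespace + ".")


def outputs_of(nodes: List[Node], namespace: str) -> Set[str]:
    inside = [n for n in nodes if in_namespace(n[0], namespace)]
    outside = [n for n in nodes if not in_namespace(n[0], namespace)]
    produced = set().union(*(outs for _, _, outs in inside))
    consumed_inside = set().union(*(ins for _, ins, _ in inside))
    consumed_outside = set().union(*(ins for _, ins, _ in outside))
    free_outputs = produced - consumed_inside
    return free_outputs | (produced & consumed_outside)


use_case_2: List[Node] = [
    ("outer.inner", {"dataset_1"}, {"dataset_2"}),
    ("outer.inner", {"dataset_2"}, {"dataset_3"}),
    ("outer", {"dataset_3"}, {"dataset_4"}),
]
use_case_3 = use_case_2 + [("external", {"dataset_3"}, {"dataset_5"})]

print(sorted(outputs_of(use_case_2, "outer.inner")))  # ['dataset_3']
print(sorted(outputs_of(use_case_2, "outer")))        # ['dataset_4']
print(sorted(outputs_of(use_case_3, "outer")))        # ['dataset_3', 'dataset_4']
```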
UseCase 4: When outputs (dataset_7, dataset_9) of functions namespaced via node namespaces are inputs to another function in the same namespace
Before:
After:
UseCase 5: When an output of a nested modular pipeline (model_inputs) is an input to another nested modular pipeline
Before:
After:
UseCase 6: Nested namespace pipelines with single input (input_to_processing) and single output (output_from_processing)
Before:
After:
Modular Pipelines expand and collapse in action:
Before:
UseCase 1-4:
UseCase 5-6:
After:
UseCase 1-4:
UseCase 5-6:
Checklist
RELEASE.md file