Closed: ravi-kumar-pilla closed this PR 14 hours ago
Can you explain why it is now necessary to pass the `modular_pipelines_repo_obj` of type `ModularPipelinesRepository` in some of the methods here? With the refactor it seems like the `ModularPipelinesRepository` has a different function than the other repository classes/objects.
Hi @merelcht, every node in the flowchart model of Kedro-Viz has a two-way mapping: the modular pipeline repository contains a map, `node_mod_pipeline_map`, and the node contains a field, `modular_pipelines`. Earlier we used the logic below to gather the modular pipelines. With the refactor, we construct the modular pipelines ahead of creating the nodes and build the dict `node_mod_pipeline_map`. This dict already contains the details of the modular pipelines a node belongs to, so we pass the `modular_pipelines_repo_obj` while creating the nodes.
```python
from pydantic import field_validator, ValidationInfo

# previous approach: expand the namespace at node-creation time
@field_validator("modular_pipelines")
@classmethod
def set_modular_pipelines(cls, _, info: ValidationInfo):
    return cls._expand_namespaces(info.data["kedro_obj"].namespace)
```
Thanks @ravi-kumar-pilla, this is clearer now. My main concern was whether the flow of data still adheres to the architecture (https://miro.com/app/board/uXjVOktl9TY=/ and https://user-images.githubusercontent.com/2032984/118201082-98dcfb80-b44e-11eb-8e63-d7cc7254d1d1.png), and as far as I can tell it does.
From what I understand: before the refactoring, the modular pipelines were populated as part of a call to the REST API. This would in turn call the `expand_tree()` method in `services/modular_pipelines.py`.
Now instead the modular pipelines are populated in the same way as the "regular" pipelines, the catalog, and the dataset stats: from inside `populate_data()`, which runs on the server side when the server is started. In fact this happens at the same time as the regular pipelines are populated. The reason the `ModularPipelinesRepository` object is then used as an argument to subsequent methods in the call stack is that it's used to determine the nodes + inputs + outputs.
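The server-side startup flow described above could be sketched roughly as follows. This is a hypothetical sketch: the manager class, method names, and call order are illustrative assumptions, not the actual Kedro-Viz implementation; the point is only that all repositories, including the modular pipelines, are filled once at startup.

```python
class DataAccessManager:
    """Illustrative stand-in that records which repositories get populated."""

    def __init__(self) -> None:
        self.populated: list[str] = []

    def add_catalog(self, catalog) -> None:
        self.populated.append("catalog")

    def add_pipelines(self, pipelines) -> None:
        # with the refactor, the modular pipeline tree is built here too,
        # at the same time as the regular pipelines
        self.populated.append("pipelines")

    def add_dataset_stats(self, stats) -> None:
        self.populated.append("stats")


def populate_data(manager: DataAccessManager, catalog, pipelines, stats) -> None:
    # runs once on the server side when the server starts, rather than
    # lazily on the first REST API call
    manager.add_catalog(catalog)
    manager.add_pipelines(pipelines)
    manager.add_dataset_stats(stats)
```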
Yes @merelcht, the data flow has not changed.
> From what I understand: before the refactoring, the modular pipelines were populated as part of a call to the REST API. This would in turn call the `expand_tree()` method in `services/modular_pipelines.py`.

Yes.
> Now instead the modular pipelines are populated in the same way as the "regular" pipelines, the catalog, and the dataset stats: from inside `populate_data()`, which runs on the server side when the server is started. In fact this happens at the same time as the regular pipelines are populated. The reason the `ModularPipelinesRepository` object is then used as an argument to subsequent methods in the call stack is that it's used to determine the nodes + inputs + outputs.

Yes, the `ModularPipelinesRepository` object is now passed to the `create_node` methods to assign the `modular_pipelines` property on the `GraphNode`.
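A rough sketch of what that signature change looks like (the function name, the fake repository, and the `GraphNode` shape here are illustrative assumptions, not the real Kedro-Viz classes): the node factory now receives the repository object and reads the pre-built mapping, instead of expanding namespaces itself.

```python
from dataclasses import dataclass, field


@dataclass
class GraphNode:
    """Minimal stand-in for the flowchart node."""
    id: str
    modular_pipelines: set = field(default_factory=set)


class FakeModularPipelinesRepository:
    """Minimal stand-in holding a pre-populated node_mod_pipeline_map."""

    def __init__(self, node_mod_pipeline_map: dict[str, set[str]]) -> None:
        self.node_mod_pipeline_map = node_mod_pipeline_map

    def get_node_modular_pipelines(self, node_id: str) -> set[str]:
        return self.node_mod_pipeline_map.get(node_id, set())


def create_task_node(node_id: str, modular_pipelines_repo_obj) -> GraphNode:
    # look up the pre-built mapping rather than expanding the namespace
    # in a pydantic validator at node-creation time
    return GraphNode(
        id=node_id,
        modular_pipelines=modular_pipelines_repo_obj.get_node_modular_pipelines(node_id),
    )
```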
I think this is nearly there. My main suggestion is to make the asserts in the tests more explicit, so instead of something like:

```python
assert list(pipelines) == create_list(pipelines)
```

you do:

```python
assert ["car_pipeline", "plane_pipeline"] == create_list(pipelines)
```

This explicitness makes the tests easier to read.
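A toy example of the suggestion (the `create_list` helper and pipeline names here are illustrative, not from the Kedro-Viz test suite): asserting against a hardcoded literal means a reader sees exactly what the test guarantees, and the test cannot silently pass because both sides were computed by similar code.

```python
def create_list(pipelines):
    """Illustrative helper: return pipeline names in sorted order."""
    return sorted(pipelines)


def test_pipeline_names_explicit():
    pipelines = {"plane_pipeline", "car_pipeline"}
    # explicit, hardcoded expectation rather than a derived one
    assert ["car_pipeline", "plane_pipeline"] == create_list(pipelines)
```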
Hi @merelcht,
I understand the edge-case tests are not straightforward to read in the assert statements, as you pointed out here. I tried to add additional comments to improve readability in the new commit.
Since these tests draw the expected JSON from `expected_tree_for_edge_cases`, it would mean a lot of hardcoded strings in the test file if I extracted the details from the JSON.
I find this easier to manage in case there is a change in the modular pipeline tree. Please have a look at the changes in my new commit and let me know if you feel there could be a better way to handle this.
Thank you
I think this is a step in the right direction. Personally, I think hardcoding to enhance test readability is better and forces you to check that the code is still correct when you make changes, because you'll have to manually update your tests. However, since I'm not working on this codebase full time, I leave the final decision on how much further to improve the tests up to you and the rest of the Viz team.
Thank you @merelcht for taking the time to review the PR. Rashida and I had a discussion and she suggested having one integration test for the edge cases that checks the entire API response instead of checking individual properties. I have made the changes in this commit.
@rashidakanchwala, I removed the individual tests here as we already test for `example_pipelines` in `test_add_pipelines`. Please have a look. Thank you
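The shape of such an integration test might look like the sketch below. Everything here is a hypothetical illustration (the response-producing function, the expected fixture, and the field names are assumptions): the idea is simply to compare the whole response payload against one expected document rather than asserting individual properties.

```python
def get_edge_case_api_response():
    """Stand-in for calling the real API endpoint with the edge-case project."""
    return {
        "nodes": [{"id": "n1", "type": "task"}],
        "modular_pipelines": {"__root__": {"id": "__root__", "children": ["n1"]}},
    }


def test_edge_case_tree_full_response():
    # one expected document for the whole payload, instead of many
    # per-property asserts scattered across individual tests
    expected = {
        "nodes": [{"id": "n1", "type": "task"}],
        "modular_pipelines": {"__root__": {"id": "__root__", "children": ["n1"]}},
    }
    assert get_edge_case_api_response() == expected
```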
Description
Partially resolves #1899
Development notes

- Updated the `add_pipeline` method to use `populate_tree` in the modular_pipelines repository to create a modular pipeline tree using the KedroPipeline
- Updated `add_node_input` and `add_node_output` to take the `modular_pipelines_repo_obj`, which contains the pre-populated modular pipelines tree (earlier we constructed the tree at a later stage using the `expand_tree` method of `package/kedro_viz/services/modular_pipelines.py`)
- Updated `add_node` to take `modular_pipelines_repo_obj` as input
- Updated `add_dataset` to take `modular_pipelines_repo_obj` as input
- Removed the `expand_tree` service call from the `create_modular_pipelines_tree_for_registered_pipeline` method as the tree is already constructed

QA notes
Checklist

- Updated the `RELEASE.md` file