kedro-org / kedro-viz

Visualise your Kedro data and machine-learning pipelines and track your experiments.
https://demo.kedro.org
Apache License 2.0

added "-p" cli option to match `kedro` pipeline options #1961

Open VladCozma opened 4 days ago

VladCozma commented 4 days ago

Signed-off-by: Vlad Cozma vlad@cozma.online

Description

The CLI pipeline option for kedro viz differs from the one for kedro run: kedro run accepts -p as a short form of --pipeline, while kedro viz does not.

Development notes

Added the -p option for kedro viz alongside the existing --pipeline option.
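As a rough illustration of the change described above, the sketch below registers a short alias next to a long option name. It uses the standard-library argparse module purely as a stand-in so the snippet runs without extra dependencies; it is not the kedro-viz code, and the program name viz-sketch and the function build_parser are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a toy parser that mimics the shape of a pipeline option."""
    # "viz-sketch" is a made-up program name, not the real kedro viz command.
    parser = argparse.ArgumentParser(prog="viz-sketch")
    # Both spellings are declared on the same argument, so either one
    # stores its value in `args.pipeline`.
    parser.add_argument(
        "-p",
        "--pipeline",
        default=None,
        help="Name of the registered pipeline to visualise.",
    )
    return parser


parser = build_parser()
long_form = parser.parse_args(["--pipeline", "Data ingestion"])
short_form = parser.parse_args(["-p", "Data ingestion"])

# The two invocations should be indistinguishable after parsing.
print(long_form.pipeline == short_form.pipeline)
```

Declaring the alias on the same argument, rather than adding a second option, keeps a single destination for the value, so downstream code does not need to know which spelling was used.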

QA notes

Checklist

ravi-kumar-pilla commented 4 days ago

Hi @VladCozma,

Thank you for the PR. Could you please add a test parameter with the new option at https://github.com/kedro-org/kedro-viz/blob/main/package/tests/test_launchers/test_cli.py#L183 ?

Thank you
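The kind of test parameter requested here can be sketched as a table of argument lists that should all parse to the same value. This is only an assumption about the shape of such a test: it does not reproduce the contents of test_cli.py, it uses argparse and unittest from the standard library instead of the project's own test setup, and parse_pipeline, PipelineOptionTest and the pipeline name data_science are hypothetical.

```python
import argparse
import unittest


def parse_pipeline(argv):
    """Hypothetical helper: return the pipeline name parsed from argv."""
    parser = argparse.ArgumentParser(prog="viz-sketch")
    parser.add_argument("-p", "--pipeline", default=None)
    return parser.parse_args(argv).pipeline


class PipelineOptionTest(unittest.TestCase):
    def test_long_and_short_spellings(self):
        # Each case pairs a command line with the value it should produce.
        # The second row is the "new parameter" covering the short alias.
        cases = [
            (["--pipeline", "data_science"], "data_science"),
            (["-p", "data_science"], "data_science"),
        ]
        for argv, expected in cases:
            with self.subTest(argv=argv):
                self.assertEqual(parse_pipeline(argv), expected)
```

Adding the short form as one more row in an existing table keeps the assertion logic shared between both spellings.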

VladCozma commented 4 days ago

Done!

astrojuanlu commented 4 days ago

Hi @VladCozma, thanks for your PR! What would kedro viz -p "Data ingestion" do? Also, my understanding is that this wouldn't be 100% consistent, because in Kedro the command would be kedro run -p data_ingestion (no "beautiful name").

VladCozma commented 2 days ago

Hi @astrojuanlu, it is consistent with how the demo-project defines its pipelines. The pipelines returned by register_pipelines in pipeline_registry.py are these:

    return {
        "__default__": (
            ingestion_pipeline
            + feature_pipeline
            + modelling_pipeline
            + reporting_pipeline
        ),
        "Data ingestion": ingestion_pipeline,
        "Modelling stage": modelling_pipeline,
        "Feature engineering": feature_pipeline,
        "Reporting stage": reporting_pipeline,
        "Pre-modelling": ingestion_pipeline + feature_pipeline,
    }

Here is the execution of kedro run and kedro viz:

~kedro run -p "Data ingestion"
[06/30/24 12:05:16] INFO     Kedro project demo-project                                                                     session.py:324
[06/30/24 12:05:17] INFO     Using synchronous mode for loading and saving data. Use the --async flag for          sequential_runner.py:64
                             potential performance gains.                                                                                 
                             https://docs.kedro.org/en/stable/nodes_and_pipelines/run_a_pipeline.html#load-and-sav                        
                             e-asynchronously                                                                                             
                    INFO     Loading data from companies (CSVDataset)...                                               data_catalog.py:508
                    INFO     Running node: apply_types_to_companies: apply_types_to_companies([companies]) ->                  node.py:361
                             [ingestion.int_typed_companies]                                                                              
                    INFO     Saving data to ingestion.int_typed_companies (ParquetDataset)...                          data_catalog.py:550
                    INFO     Completed 1 out of 6 tasks                                                            sequential_runner.py:90
                    INFO     Loading data from reviews (CSVDataset)...                                                 data_catalog.py:508
[06/30/24 12:05:18] INFO     Loading data from params:ingestion.typing.reviews.columns_as_floats (MemoryDataset)...    data_catalog.py:508
                    INFO     Running node: apply_types_to_reviews:                                                             node.py:361
                             apply_types_to_reviews([reviews;params:ingestion.typing.reviews.columns_as_floats]) ->                       
                             [ingestion.int_typed_reviews]                                                                                
                    INFO     Saving data to ingestion.int_typed_reviews (ParquetDataset)...                            data_catalog.py:550
                    INFO     Completed 2 out of 6 tasks                                                            sequential_runner.py:90
                    INFO     Loading data from shuttles (ExcelDataset)...                                              data_catalog.py:508
[06/30/24 12:05:23] INFO     Running node: apply_types_to_shuttles: apply_types_to_shuttles([shuttles]) ->                     node.py:361
                             [ingestion.int_typed_shuttles@pandas1]                                                                       
                    INFO     Saving data to ingestion.int_typed_shuttles@pandas1 (ParquetDataset)...                   data_catalog.py:550
                    INFO     Completed 3 out of 6 tasks                                                            sequential_runner.py:90
                    INFO     Loading data from ingestion.int_typed_companies (ParquetDataset)...                       data_catalog.py:508
                    INFO     Running node: company_agg: aggregate_company_data([ingestion.int_typed_companies]) ->             node.py:361
                             [ingestion.prm_agg_companies]                                                                                
[06/30/24 12:05:24] INFO     Saving data to ingestion.prm_agg_companies (MemoryDataset)...                             data_catalog.py:550
                    INFO     Completed 4 out of 6 tasks                                                            sequential_runner.py:90
                    INFO     Loading data from ingestion.int_typed_shuttles@pandas2 (ParquetDataset)...                data_catalog.py:508
                    INFO     Loading data from ingestion.prm_agg_companies (MemoryDataset)...                          data_catalog.py:508
                    INFO     Loading data from ingestion.int_typed_reviews (ParquetDataset)...                         data_catalog.py:508
                    INFO     Running node: combine_step:                                                                       node.py:361
                             combine_shuttle_level_information([ingestion.int_typed_shuttles@pandas2;ingestion.prm_agg_compani            
                             es;ingestion.int_typed_reviews]) -> [prm_shuttle_company_reviews;prm_spine_table]                            
                    INFO     Saving data to prm_shuttle_company_reviews (ParquetDataset)...                            data_catalog.py:550
                    INFO     Saving data to prm_spine_table (ParquetDataset)...                                        data_catalog.py:550
                    INFO     Completed 5 out of 6 tasks                                                            sequential_runner.py:90
                    INFO     Loading data from prm_spine_table (ParquetDataset)...                                     data_catalog.py:508
                    INFO     Running node: <lambda>([prm_spine_table]) -> [ingestion.prm_spine_table_clone]                    node.py:361
                    INFO     Saving data to ingestion.prm_spine_table_clone (MemoryDataset)...                         data_catalog.py:550
                    INFO     Completed 6 out of 6 tasks                                                            sequential_runner.py:90
                    INFO     Pipeline execution completed successfully.                                                      runner.py:119
                    INFO     Loading data from ingestion.prm_spine_table_clone (MemoryDataset)...                      data_catalog.py:508
~kedro viz -p "Data ingestion"      
Starting Kedro Viz ...
Kedro Viz started successfully. 

✨ Kedro Viz is running at 
 http://127.0.0.1:4141/

kedro run -p data_ingestion would throw an error, as does any name that is not a registered key, for example:

~kedro run -p ingestion_pipeline
[06/30/24 12:09:24] INFO     Kedro project demo-project                                                                     
[...]
    raise ValueError(
ValueError: Failed to find the pipeline named 'ingestion_pipeline'. It needs to be generated and returned by the 'register_pipelines' function.
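The behaviour shown above follows from the registry being a plain mapping from names to pipelines: only the exact keys resolve. The sketch below imitates that lookup under simplifying assumptions. Lists of node names stand in for real pipeline objects, find_pipeline is a hypothetical helper rather than a Kedro function, feature_node is an invented node name, and the error text is shortened from the one in the log.

```python
from typing import Dict, List

# Placeholder "pipelines": lists of node names stand in for Pipeline objects.
ingestion_pipeline: List[str] = ["apply_types_to_companies", "company_agg"]
feature_pipeline: List[str] = ["feature_node"]  # invented node name


def register_pipelines() -> Dict[str, List[str]]:
    """Return a registry keyed by display-style names, as in the demo project."""
    return {
        "__default__": ingestion_pipeline + feature_pipeline,
        "Data ingestion": ingestion_pipeline,
        "Pre-modelling": ingestion_pipeline + feature_pipeline,
    }


def find_pipeline(name: str) -> List[str]:
    """Hypothetical lookup: resolve a name against the registry keys only."""
    registry = register_pipelines()
    try:
        return registry[name]
    except KeyError:
        # Variable names such as `ingestion_pipeline` are not keys,
        # so they fail exactly like any other unknown name.
        raise ValueError(f"Failed to find the pipeline named '{name}'.") from None


print(find_pipeline("Data ingestion"))
```

Under this model, "Data ingestion" resolves because it is a key, while data_ingestion and ingestion_pipeline both raise ValueError, which matches the argument that the -p value must be whatever name the project registered.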

Cheers

rashidakanchwala commented 12 hours ago

FYI - While testing your PR, I accidentally included some of your changes to the cli.py file in my own PR, which has now been merged. As a result, your PR no longer shows those changes in cli.py when compared to the main branch.