kedro-org / kedro-viz

Visualise your Kedro data and machine-learning pipelines and track your experiments.
https://demo.kedro.org
Apache License 2.0
647 stars 106 forks source link

Testing Path Fix #1853

Closed ravi-kumar-pilla closed 2 months ago

ravi-kumar-pilla commented 3 months ago

Description

Testing windows build failure due to StatsHook on main branch.

   WARNING  Unable to get file size for the       hooks.py:142
                             dataset                                           
                             CSVDataset(filepath=C:/Users/circleci             
                             /AppData/Local/Temp/pytest-of-circlec             
                             i/pytest-0/test_get_file_size_dataset             
                             0_0/model_inputs.csv, load_args={},               
                             protocol=file, save_args={'index':                
                             False}): [WinError 3] The system                  
                             cannot find the path specified:                   
                             'C:Users/circleci/AppData/Local/Temp/             
                             pytest-of-circleci/pytest-0/test_get_             
                             file_size_dataset0_0/model_inputs.csv             
                             '

NOTE: While I tested on windows machine locally, it was working fine. However on CircleCI, filepath has C:/, but the system is trying to find C: (missing the forward slash). After looking into plugins kedro-datasets further, some dataset._filepath returns PurePosixPath while some return string. The fix for https://github.com/kedro-org/kedro-viz/issues/1797 should be made on the datasets side to be consistent.

Development notes

The pandas CSV, Excel, Feather, Generic, HDF, json, Parquet, xml has a code block -

 super().__init__(
            filepath=PurePosixPath(path),
            version=version,
            exists_function=self._fs.exists,
            glob_function=self._fs.glob,
        )

while others like deltatable, gbq, sql keep filepath as str. I am not completely aware of the reason for this difference. @SajidAlamQB please add if you have any information on this.

If this is not intentional, we should be fixing the issue on kedro-datasets.

file_path = get_filepath_str(PurePosixPath(dataset._filepath), dataset._protocol)

NOTE: continue_config.yml changes are for testing windows build and they will be discarded before merging to main

QA notes

Checklist

rashidakanchwala commented 2 months ago

also looping in @merelcht , does it make sense to fix this issue on kedro-datasets side? are we strict on return type validation for file_path? pls let us know.

ravi-kumar-pilla commented 2 months ago

also looping in @merelcht , does it make sense to fix this issue on kedro-datasets side? are we strict on return type validation for file_path? pls let us know.

Had a discussion on slack and Nok mentioned that filepath is not a well defined API, means there may be entries of datasets which do not have filepath. Though I still think if there is filepath, the type should be consistent. From Viz perspective, we can have the conversion to be foolproof.

Thank you