kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0
9.47k stars 875 forks

Symlinks are not found when loading configuration files #3972

Open ElenaKhaustova opened 3 days ago

ElenaKhaustova commented 3 days ago

Description

When parameters are stored in conf/base/sub_folder/parameters_A.yml and sub_folder is a symlink, the pipeline fails with the error below. The same parameters load correctly when the file sits directly at conf/base/parameters_A.yml.

ValueError: Pipeline input(s) {'params:length', 'params:width'} not found in the DataCatalog

Context

The issue is that the glob() method of the fsspec filesystem, which we use to find configuration paths recursively, doesn't find files located behind symlinks.

https://github.com/kedro-org/kedro/blob/adfc593bcd2f1b74676e7ab7c1a3b9c168b7257f/kedro/config/omegaconf_config.py#L295
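To make the behaviour concrete, here is a minimal standard-library sketch of a recursive search that does descend into symlinked directories. It is not Kedro's implementation and does not call fsspec. The helper name find_config_files, its parameters, and the default pattern are hypothetical, chosen only for illustration.

```python
import fnmatch
import os
from pathlib import Path


def find_config_files(conf_dir, pattern="parameters*.yml", follow_symlinks=True):
    """Return sorted paths under conf_dir whose file name matches pattern.

    Hypothetical helper for illustration only; not part of Kedro or fsspec.
    """
    matches = []
    # os.walk does not descend into symlinked directories
    # unless followlinks=True is passed.
    for root, _dirs, files in os.walk(conf_dir, followlinks=follow_symlinks):
        for name in files:
            if fnmatch.fnmatch(name, pattern):
                matches.append(Path(root) / name)
    return sorted(matches)
```

With follow_symlinks=False the walk skips a symlinked sub_folder, which mirrors the symptom reported here. Note that following links can recurse forever if a link points back to one of its own parent directories, so a real fix would likely need to guard against cycles.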

Steps to Reproduce

For a default spaceflights-pandas project, create a symlinked folder and place parameters_data_science.yml in the linked folder.

Run the pipeline.
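The setup step can be scripted roughly as follows. This is a sketch under assumptions: the helper link_parameters_folder, the target directory, and the link name sub_folder are illustrative, and the script assumes the file starts out at conf/base/parameters_data_science.yml as in the default starter.

```python
import shutil
from pathlib import Path


def link_parameters_folder(project_root, target_dir, link_name="sub_folder"):
    """Move parameters_data_science.yml into target_dir and symlink it into conf/base.

    Hypothetical helper for reproducing the issue; names are illustrative.
    """
    base = Path(project_root) / "conf" / "base"
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)

    # Move the parameters file out of conf/base into the real folder.
    params = base / "parameters_data_science.yml"
    if params.exists():
        shutil.move(str(params), str(target / params.name))

    # Create conf/base/<link_name> as a symlink to the real folder.
    link = base / link_name
    link.symlink_to(target.resolve(), target_is_directory=True)
    return link
```

After calling the helper on a project checkout, running `kedro run` from the project root should trigger the error shown in the description.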

Expected Result

Symlinks are found when loading configuration files.

Your Environment