kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0
9.79k stars 897 forks

Better Credentials Support in Catalog #3634

Open lordsoffallen opened 6 months ago

lordsoffallen commented 6 months ago

Description

I would like to refer to this block in my catalog:

dataset#hf:
  type: hffinder.extras.hf.HFDataset
  filepath: data/01_raw/datasets/
  dataset_name: test/huggingface-datasets
  credentials: huggingface

This works when huggingface is defined in credentials.yml. I would like a way to make the credentials optional, so that they resolve to None when the entry doesn't exist. Right now the code fails if huggingface is not defined. I tried using oc.select, but it only refers to existing keys, and I think it runs before the credentials are injected.
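To make the request concrete, here is a minimal sketch of the behaviour I have in mind. It is not Kedro code: the helper name resolve_optional_credentials and the dummy token value are assumptions made up for illustration.

```python
from typing import Any, Dict


def resolve_optional_credentials(
    dataset_config: Dict[str, Any], credentials: Dict[str, Any]
) -> Dict[str, Any]:
    """Return a copy of dataset_config with its credentials name resolved.

    Hypothetical helper (not part of Kedro): a name that is missing from
    the credentials mapping resolves to None instead of raising.
    """
    resolved = dict(dataset_config)
    name = resolved.get("credentials")
    if isinstance(name, str):
        # dict.get returns None when the name is absent from the mapping.
        resolved["credentials"] = credentials.get(name)
    return resolved


config = {
    "type": "hffinder.extras.hf.HFDataset",
    "credentials": "huggingface",
}

# "token": "dummy" is an illustrative value, not a real credential layout.
with_entry = resolve_optional_credentials(config, {"huggingface": {"token": "dummy"}})
without_entry = resolve_optional_credentials(config, {})
```

With the entry present, the dataset would receive the credentials mapping. With it absent, the dataset would receive None and could decide for itself whether to proceed anonymously.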

Context

Possible Implementation

Possible Alternatives

lordsoffallen commented 6 months ago

My current workaround is to put the following into the settings.py file:

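from unittest.mock import Mock, patch

# Replace Kedro's module-level credentials lookup so that a missing
# entry resolves to None instead of failing.
patcher = patch(
    "kedro.io.data_catalog._get_credentials",
    Mock(side_effect=lambda name, credentials: credentials.get(name)),
)
# The patch is started when settings.py is imported and is never stopped,
# so it stays active for the whole session.
patcher.start()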

I don't like mocking things, but this function isn't part of the catalog class, so extending the catalog class and overriding it either wasn't possible or would have required too much code.
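A possible variation, sketched under assumptions: patch accepts any callable as the replacement, so a plain function could stand in for the Mock. The example below runs against a stand-in namespace rather than Kedro itself. The names fake_data_catalog, strict_lookup and lenient_lookup are hypothetical and exist only for this illustration.

```python
from types import SimpleNamespace
from unittest.mock import patch


def strict_lookup(name, credentials):
    # Stand-in for a lookup that fails on a missing entry.
    return credentials[name]


def lenient_lookup(name, credentials):
    # Returns None instead of raising when the entry is missing.
    return credentials.get(name)


# Hypothetical stand-in for the module that owns the lookup function.
fake_data_catalog = SimpleNamespace(_get_credentials=strict_lookup)

# Inside the block the lenient function is used; on exit the original is restored.
with patch.object(fake_data_catalog, "_get_credentials", lenient_lookup):
    inside = fake_data_catalog._get_credentials("huggingface", {})

restored = fake_data_catalog._get_credentials is strict_lookup
```

Using the context-manager form limits the patch to a known scope. Whether that is practical inside settings.py depends on when the catalog is built, so a long-lived patcher.start() may still be needed there.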