kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0

Move credentials resolver to the config component #4259

Open ElenaKhaustova opened 1 month ago

ElenaKhaustova commented 1 month ago

Description

Currently, we pass credentials to the catalog separately from the rest of the configuration: https://github.com/kedro-org/kedro/blob/a5d9bb40380c598bf7d03cb16623026892844ed4/kedro/framework/context/context.py#L236

Then we resolve credentials at the CatalogConfigResolver level: https://github.com/kedro-org/kedro/blob/a5d9bb40380c598bf7d03cb16623026892844ed4/kedro/io/catalog_config_resolver.py#L33

We suggest moving this resolution to the config component. Then:

  1. We simplify CatalogConfigResolver so that it only resolves patterns, and there is no confusion around the term "resolve";
  2. All configuration resolution is handled in one place, by the config module.
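To make the proposal concrete, here is a minimal sketch of what a credentials step in the config layer could look like. The function name resolve_credentials and where it would live are hypothetical, not part of Kedro's API; it only relies on the existing catalog convention that a dataset's credentials value is a string naming a top-level key in credentials.yml.

```python
from typing import Any, Mapping


def resolve_credentials(
    catalog_config: Mapping[str, Any], credentials: Mapping[str, Any]
) -> dict[str, Any]:
    """Hypothetical config-layer step: replace every string-valued
    `credentials` key in the catalog config with the matching entry from
    credentials.yml. Builds new dicts; the inputs are not modified."""

    def _resolve(value: Any) -> Any:
        if not isinstance(value, Mapping):
            return value
        resolved = {}
        for key, item in value.items():
            if key == "credentials" and isinstance(item, str):
                if item not in credentials:
                    raise KeyError(f"Unable to find credentials '{item}'")
                resolved[key] = credentials[item]
            else:
                # Recurse so nested dataset configs are resolved too
                resolved[key] = _resolve(item)
        return resolved

    return {name: _resolve(cfg) for name, cfg in catalog_config.items()}
```

With a step like this in the config component, the catalog would receive entries with credentials already in place, leaving CatalogConfigResolver to deal with dataset factory patterns only.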

Context

This should be done as a preceding step to https://github.com/kedro-org/kedro/issues/3811

datajoely commented 1 month ago

Such a welcome move

datajoely commented 1 week ago

Having just hit this in my current project, I can say that a more fully featured version of the following would be a really welcome quality-of-life improvement:


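```python
from omegaconf import OmegaConf


def _get_local_creds(top_level_key: str, key: str) -> str:
    # Read conf/local/credentials.yml (relative to the working directory) and
    # return one value, e.g. ${creds:postgres, user} -> data["postgres"]["user"]
    cred_data = OmegaConf.load("conf/local/credentials.yml")
    return cred_data[top_level_key][key]


# In settings.py: register the resolver with the config loader
CONFIG_LOADER_ARGS = {
    "base_env": "base",
    "default_run_env": "local",
    "custom_resolvers": {"creds": _get_local_creds},
}
```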

In turn, the YAML looks like this:

```yaml
synthea_patients:
  type: ibis.TableDataset
  table_name: patients
  connection:
    backend: postgres
    database: synthea
    user: "${creds:postgres, user}"
    password: "${creds:postgres, password}"
```

All in all, it feels like the oc.env workflow, but against our local/credentials.yml, which is a core part of the project template. The fundamental difference is that it takes the onus of integrating with Kedro's archaic credentials mechanism (from v0.0.1) away from the dataset implementation and the developer.
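One direction a "more fully featured" resolver could take is sketched below, under stated assumptions: make_creds_resolver is a hypothetical helper, not a Kedro or OmegaConf API. It reads the credentials file at most once instead of on every interpolation, and reports which entry is missing. It assumes the loaded mapping raises KeyError for absent keys, as a plain dict does.

```python
from functools import lru_cache
from typing import Any, Callable, Mapping


def make_creds_resolver(
    load: Callable[[], Mapping[str, Any]],
) -> Callable[[str, str], Any]:
    """Hypothetical factory for a `creds` resolver.

    `load` is called lazily and at most once, so the file is read a single
    time rather than once per ${creds:...} interpolation.
    """
    load_once = lru_cache(maxsize=None)(load)

    def _creds(top_level_key: str, key: str) -> Any:
        data = load_once()
        try:
            return data[top_level_key][key]
        except KeyError:
            # Name the missing entry instead of surfacing a bare KeyError
            raise KeyError(
                f"Credential '{top_level_key}.{key}' not found in credentials.yml"
            ) from None

    return _creds


# Possible wiring in settings.py (assumption, mirroring the snippet above):
# from omegaconf import OmegaConf
# CONFIG_LOADER_ARGS = {
#     "base_env": "base",
#     "default_run_env": "local",
#     "custom_resolvers": {
#         "creds": make_creds_resolver(
#             lambda: OmegaConf.load("conf/local/credentials.yml")
#         ),
#     },
# }
```

Passing the loader in as a callable also keeps the resolver easy to test with an in-memory dict.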

datajoely commented 1 week ago

Little hack to make my MVP resolver mask credentials in the __repr__
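The hack itself was shared as a screenshot and is not reproduced here. As an illustration only (not the original code), one way to get a similar effect is to have the resolver return a str subclass whose __repr__ hides the value while it behaves like a normal string everywhere else. MaskedStr and masked are hypothetical names, and this assumes OmegaConf passes the resolver's return value through unchanged, which is worth verifying before relying on it.

```python
from functools import wraps
from typing import Callable


class MaskedStr(str):
    """Hypothetical str subclass: compares, hashes and formats like the
    wrapped value, but hides it in repr() (and so in dict/list reprs)."""

    def __repr__(self) -> str:
        return "'******'"


def masked(resolver: Callable[..., str]) -> Callable[..., MaskedStr]:
    """Hypothetical decorator: wrap a resolver's string result in MaskedStr."""

    @wraps(resolver)
    def _wrapper(*args: str) -> MaskedStr:
        return MaskedStr(resolver(*args))

    return _wrapper


@masked
def _get_creds(top_level_key: str, key: str) -> str:
    # Stand-in for _get_local_creds; a real version would read credentials.yml
    demo = {"postgres": {"user": "kedro", "password": "hunter2"}}
    return demo[top_level_key][key]


connection = {
    "user": _get_creds("postgres", "user"),
    "password": _get_creds("postgres", "password"),
}
print(connection)  # {'user': '******', 'password': '******'}
```

Only __repr__ is overridden, so str(), f-strings and equality still see the real value and drivers receive the actual password.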
