kedro-org / kedro-devrel

Kedro developer relations team use this for content creation ideation and execution
Apache License 2.0

Create a blog post about HydraConfigLoader #102

Open stichbury opened 11 months ago

stichbury commented 11 months ago

We can share some of the code developed internally and explain the steps needed to create a plugin, with the intent that other teams find it useful, create their own plugin, and maybe even open-source it for others.

See https://github.com/kedro-org/kedro/issues/1303 for background. There is also a Slack thread (bookmarked but not shared here, as it's a QB/Labs team conversation).

felipemonroy commented 10 months ago

Hi @stichbury. I'm trying to create a config loader capable of handling nested configuration, and this would be extremely helpful.

noklam commented 10 months ago

@felipemonroy have you tried OmegaConfigLoader already? What would the nested configuration look like?

felipemonroy commented 10 months ago

Hi @noklam. By nested configuration I mean having configurations in a cascade, where each level overwrites the previous one if it is available. For instance, I can have a global configuration, which is overwritten by the continent-level one if available, then by the country level, the city level, and so on. My first approach was to use separate envs, but I ended up writing the same configuration several times, and I would need one env for each combination. The second approach was to use Hydra, because it allows duplicated configuration in separate folders.
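To make the cascade concrete, here is a minimal sketch using only plain dictionaries, with no Kedro, Hydra, or OmegaConf involved. It assumes that "overwrite" means a key-by-key deep merge, which is only one possible reading. The helper names (`deep_merge`, `resolve_cascade`), the `continent/country` level keys, and all the values are hypothetical, chosen for illustration.

```python
from copy import deepcopy
from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `base` with `override` applied key by key."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them recursively.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Otherwise the more specific level wins outright.
            merged[key] = deepcopy(value)
    return merged


def resolve_cascade(levels: dict[str, dict[str, Any]], path: list[str]) -> dict[str, Any]:
    """Apply global, then continent, then country, ... skipping absent levels."""
    config: dict[str, Any] = {}
    for depth in range(len(path) + 1):
        # depth 0 is the global level; deeper levels are keyed "a/b/c".
        level_key = "/".join(path[:depth]) or "global"
        config = deep_merge(config, levels.get(level_key, {}))
    return config


# Hypothetical configuration levels (illustrative values only).
levels = {
    "global": {"model": {"type": "xgboost", "max_depth": 6}, "seed": 42},
    "america": {"model": {"max_depth": 8}},
    "america/chile": {"seed": 7},
    "europe/france": {"model": {"type": "linear"}},
}

santiago = resolve_cascade(levels, ["america", "chile", "santiago"])
print(santiago)
```

Because the levels are keyed by their full path, the same country name under two continents would not collide. Levels that do not exist (here `america/chile/santiago`) are simply skipped.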

I was exploring OmegaConf and custom resolvers, and I may have ended up with a partial solution:

from typing import Any

from omegaconf import OmegaConf, Container
from omegaconf._impl import select_node, _get_value

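def coalesce(key: str, *defaults, _parent_: Container) -> Any:
    # Try the primary key first, then each fallback key in order.
    for candidate in (key, *defaults):
        node = select_node(
            cfg=_parent_,
            key=candidate,
            throw_on_resolution_failure=True,
            throw_on_missing=False,
            absolute_key=False,
        )
        # Skip keys that are absent or marked as missing ("???").
        if node is None or node._is_missing():
            continue
        return _get_value(node)

    # None of the candidate keys resolved to a value.
    return None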

OmegaConf.register_new_resolver("oc.coalesce", coalesce)

cfg = OmegaConf.create(
    {
        "training_params": "${oc.coalesce:training_params_${continent}_${country}, training_params_${continent}, training_params_global}",
        "training_params_global": {"type": "global"},
        "training_params_europe": {"type": "europe"},
        "training_params_europe_france": {"type": "europe_france"},
        "training_params_america": {"type": "america"},
        "country": "chile",
        "continent": "america",
    }
)

print(cfg.training_params)

So in this example I defined a global configuration, then a continent-level one, and finally a country-level one. (Here I don't have the same country in two continents, but in my use case I do, which is why I have to concatenate continent and country.) If the continent-country configuration is available, it will be used; if not, the resolver falls back to the continent level and finally to the global one.

Could you please tell me if your HydraConfigLoader works better in this use case?

noklam commented 10 months ago

@felipemonroy Thanks for sharing this. HydraConfigLoader is an implementation from the community and unfortunately I don't know how it works exactly.

I was referring to OmegaConfigLoader (https://docs.kedro.org/en/stable/kedro.config.OmegaConfigLoader.html) rather than a new implementation built on OmegaConf. It can probably achieve similar things, but not with the directory hierarchy (which can be good or bad, depending on your use case).

Otherwise a combination of