kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0
9.53k stars 877 forks source link

Pass kwargs to _register_new_resolvers from settings.py #2925

Open Galileo-Galilei opened 11 months ago

Galileo-Galilei commented 11 months ago

Description

Since https://github.com/kedro-org/kedro/pull/2896, it is possible to register custom resolvers for OmegaConfigLoader. I'd like to pass kwargs to the _register_custom_resolvers function to have more control on resolvers behaviour.

Context

I wanted to add a resolver which will behave differently when called several times, e.g.:

#settings.py
import time

CONFIG_LOADER_ARGS={"custom_resolvers" : {"my_resolver": lambda x : time.time()}}

If this resolver is called 2 times in the catalog, it will likely return different results because of a couple of milliseconds delay between each call.

OmegaConf offers fine grained control on this behaviour through the use_cache argument of the register_new_resolver

Possible Implementation

Add an extra key in CONFIG_LOADER_ARGS?

Possible Alternatives

Document that it is possible to register resolver with more control in after_context_created hook, e.g. :

EDIT: you cannot pass arg with the KedroContext._register_new_resolvers method

    def after_context_created(
        self,
        context: KedroContext,
    ) -> None:

        context.config_loader._register_new_resolvers(
            {
                "pa.my_resolver": resolve_yaml_schema,
            }
        )

but you can directly with Omegaconf.register_new_resolver:

    def after_context_created(
        self,
        context: KedroContext,
    ) -> None:

    if not OmegaConf.has_resolver("pa.my_resolver"):
        OmegaConf.register_new_resolver(
            "pa.my_resolver", resolve_yaml_schema, use_cache=True
         )
noklam commented 11 months ago

We had a little discussion about having replace or kwargs for OmegaConf in #2869 and glad you raise this. :) Cc @ankatiyar

@merelcht We also discussed this point. Adding a new replace argument isn't hard, but it clutters the already long argument list.

Two reasons why I think we shouldn't do it now.

  1. If we want to add replace argument, arguably we may need to add a few more since omegaconf allows few extra options. If so should we provide a omegaconf_kwargs instead of just one argument? It's not clear for now.
  2. We can always add a new one, and it wouldn't be a breaking change. However if we decide to add replace as an arguments, but later we want to change it to kwargs, it will be a breaking change and we cannot do it easily.

So for now I think we have a good workaround for advance use case which won't be blocking. If there are more evidence coming in later we can then discuss more about 1. and have a hollistic view of what other arguments we may need to support.

noklam commented 11 months ago

I am in favor of the kwargs over using the internal method _register_new_resolvers. How would you like to use the kwargs to customise settings for each resolver individually?

Galileo-Galilei commented 11 months ago

I did not see the right PR, sorry for that ;)

I did not really think carefully about it and I don't have a strong opinion. I vaguely imagine one of the two following syntax:

Declare kwargs directly in CONFIG_LOADER_ARGS

# settings.py

CONFIG_LOADER_ARGS = {
    "custom_resolvers": {
        {"name": "add",
         "func": lambda *my_list: sum(my_list),
         "kwargs": {}
        },
        {"name": "today",
         "func": lambda: date_today(),
         "kwargs": {"use_cache": True}
        },
    }
}

but it seems less readable and more error prone that the current:

CONFIG_LOADER_ARGS = {
    "custom_resolvers": {
        "add": lambda *my_list: sum(my_list),
        "polars": lambda x: getattr(pl, x),
        "today": lambda: date_today(),
    }
}

Add an extra constant in CONFIG_LOADER_ARGS

CONFIG_LOADER_ARGS = {
    "custom_resolvers": {
        "add": lambda *my_list: sum(my_list),
        "polars": lambda x: getattr(pl, x),
        "today": lambda: date_today(),
    },
    "custom_resolvers_kwargs": {
        "today": {"use_cache": true},
    }

}
astrojuanlu commented 11 months ago

Alternative: when given a single value the function gets passed, when passed a 2-tuple it's (function, kwargs)

CONFIG_LOADER_ARGS = {
    "custom_resolvers": {
        "add": lambda *my_list: sum(my_list),
        "polars": lambda x: getattr(pl, x),
        "today": (lambda: date_today(), dict(use_cache=True)),
    }
}