takikadiri / kedro-boot

A kedro plugin that streamlines the integration between Kedro projects and third-party applications, making it easier for you to develop end-to-end production-ready data science applications.
Apache License 2.0

Aligning template params with kedro configurations management #7

Closed takikadiri closed 9 months ago

takikadiri commented 10 months ago

kedro-boot introduces a new kind of params, which we have been calling template params. They are resolved at iteration time (on each kedro boot session run), which lets the kedro boot app inject params into the kedro project. Currently, template params use Jinja expressions with the pattern [[expression]].

Kedro stopped using Jinja and adopted OmegaConf for all configurations. To align with kedro and reduce the cognitive effort for users, we should also try to handle our template params with OmegaConf. Here is a possible way of integrating it:

User interface: kedro recently named the runtime parameters provided through the CLI runtime_params; they are used to override the values of certain keys in the configuration (catalog, parameters, ...). To align with this naming convention, we can name our iteration params itertime_params instead of template_params. Here is an example of a dataset that has an itertime_params in one of its keys:

shuttles:
  type: pandas.SQLQueryDataset
  query: "select shuttle, shuttle_id from spaceflights.shuttles where shuttle_id = ${itertime_params:shuttle_id,1234}"

The signature of the resolver could be: ${itertime_params:<param_name>,<default_value>}

Backend: in the backend, it's an OmegaConf resolver that will be added to the ConfigLoader:

"custom_resolvers": {
    "itertime_params": lambda variable, default_value=None: f"${{oc.select:{variable},{default_value}}}",
}
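
To see what this resolver would leave behind after boot, the lambda can be called directly as a plain Python function, outside of OmegaConf. This is only a sketch: it assumes the string returned by the resolver is kept as-is at load time rather than being resolved again.

```python
# The resolver from the snippet above, called as an ordinary function.
itertime_params = lambda variable, default_value=None: (
    f"${{oc.select:{variable},{default_value}}}"
)

with_default = itertime_params("shuttle_id", 1234)
without_default = itertime_params("shuttle_id")

print(with_default)     # ${oc.select:shuttle_id,1234}
print(without_default)  # ${oc.select:shuttle_id,None}
```

Note that when no default is given, the expression carries the literal text None, so the resolver may need to special-case a missing default.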

During the kedro booting process, the configuration files would be materialized as Python objects and all the params would be resolved. At the end of the booting process, we will end up with some datasets that have ${oc.select:{variable},{default_value}} in some of their attribute values. Those values will be lazily resolved at each iteration using the iteration params injected by the kedro boot apps.
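
The per-iteration step could look roughly like the sketch below. It emulates the late resolution with the standard library only, so it is not how kedro-boot or OmegaConf would actually do it; `resolve_itertime` and `booted_query` are hypothetical names used for illustration.

```python
import re

# Matches the deferred expressions left behind after boot,
# e.g. "${oc.select:shuttle_id,1234}".
_DEFERRED = re.compile(r"\$\{oc\.select:(?P<name>[^,}]+),(?P<default>[^}]*)\}")


def resolve_itertime(value: str, iteration_params: dict) -> str:
    """Hypothetical helper: fill deferred expressions for one iteration."""

    def _substitute(match: re.Match) -> str:
        name = match.group("name").strip()
        # Fall back to the default captured in the expression itself.
        return str(iteration_params.get(name, match.group("default").strip()))

    return _DEFERRED.sub(_substitute, value)


# Query attribute as it would look once booting has finished.
booted_query = (
    "select shuttle, shuttle_id from spaceflights.shuttles "
    "where shuttle_id = ${oc.select:shuttle_id,1234}"
)

# Each session run injects its own params; the booted string is reused.
first_run = resolve_itertime(booted_query, {"shuttle_id": 42})
second_run = resolve_itertime(booted_query, {})

print(first_run)   # ends with: shuttle_id = 42
print(second_run)  # ends with: shuttle_id = 1234
```

The point of the design is that the booted object is built once and only the deferred expressions are re-evaluated per run, so iterations stay cheap.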

Related: #6