kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0
9.9k stars 899 forks source link

Omegaconf breaks with a custom resolver providing a datetime #3620

Open ankatiyar opened 7 months ago

ankatiyar commented 7 months ago

Description

This case came up in a user question (@PetitLepton) - https://linen-slack.kedro.org/t/16417334/hi-folks-i-encountered-an-unexpected-behavior-regarding-the-#8d5f69d7-25ce-409f-b009-0a6a50cd064c

Context

The user has a custom resolver in their config which converts a string to a pandas.Timestamp type -

Third-party

import pandas

def parse_timezone_aware_datetime_string(datetime_string: str) -> pandas.Timestamp: timestamp = pandas.Timestamp(datetime_string).astimezone(tz=timezone.utc) return timestamp

Keyword arguments to pass to the CONFIG_LOADER_CLASS constructor.

CONFIG_LOADER_ARGS = { "custom_resolvers": { "parse_timezone_aware_datetime_string": parse_timezone_aware_datetime_string, }, "base_env": "base", "default_run_env": "local", } }

* `parameters.yml`
```yaml
start: ${parse_timezone_aware_datetime_string:2023-12-13T23:00:00+01:00}
test: "${runtime_params:the_test,2}"

This works fine when you do a kedro run without any run time parameters but fails with the following error when you do a kedro run --params="the_test=1":

omegaconf.errors.UnsupportedValueType: Value 'datetime' is not a supported primitive type
    full_key: start
    object_type=dict

This is because when there's run time params, there is an additional merge step - https://github.com/kedro-org/kedro/blob/80ad182a220d691c77efc05cc4e2b36dcecb6ac7/kedro/framework/context/context.py#L203-L205 This happens after the parameter with the custom resolver has already been resolved into the pandas.Timestamp type and omegaconf apparently does not support non primitive types in the config.

Possible Implementation

Possible Alternatives

noklam commented 5 months ago

Should this get moved to discussion or is it ready to be discussed? @ankatiyar

astrojuanlu commented 5 months ago

This hasn't been discussed yet, it's in the Inbox

noklam commented 4 months ago

https://github.com/kedro-org/kedro/commit/84b3e5c960ca4308cf95980c809b66db140045f7

This was the original PR that creates this regression. The PR was about removing custom syntax, there is a minor refactoring that use OmegaConf to remove the custom nest dict update function. This is the root cause of the bug.

Things to do IMO:

Another way to confirm this is to reproduce the error and run this in older version (maybe 0.18.14)

PetitLepton commented 1 month ago

I am coming back to this as I currently cannot bump our code from 0.18.14 to 0.19.8. A full minimal example is at https://github.com/PetitLepton/kedro-runtime-params.

noklam commented 1 month ago

@PetitLepton thanks for bumping this, I think we have a clear solution here already, which is do not use Omegaconf in KedroContext. I have commented that we should not do this , the simple solution here is to revert to a recursively dict update.