omry / omegaconf

Flexible Python configuration system. The last one you will ever need.
BSD 3-Clause "New" or "Revised" License

Simplify creation of merged and structured configs #1096

Closed klamann closed 1 year ago

klamann commented 1 year ago

Merging configs and converting untyped configs to structured configs currently takes several function calls. It is also not obvious which functions to use (e.g. merge vs. unsafe_merge), and you can shoot yourself in the foot if you apply structured config conversion and merging in the wrong order.

Let's say I have two configs: a base config and another config that extends the base config for a different staging environment:

yml_base = """
base_dir: /tmp
csv_file: ${base_dir}/users.csv
"""
yml_prod = """
base_dir: /opt
"""

Let's turn this into a DictConfig:

from omegaconf import OmegaConf

conf_base = OmegaConf.create(yml_base)
conf_prod_only = OmegaConf.create(yml_prod)
conf_prod = OmegaConf.merge(conf_base, conf_prod_only)

Three function calls, pretty straightforward.

Now let's convert this to a structured config:

from dataclasses import dataclass

@dataclass
class MyConfig:
    base_dir: str
    csv_file: str

Here we first have to do the DictConfig conversion from before, then pass MyConfig to the merge function, and finally call to_object:

conf_base = OmegaConf.create(yml_base)
conf_prod_only = OmegaConf.create(yml_prod)
typed_conf = OmegaConf.merge(conf_base, conf_prod_only, MyConfig)
class_conf: MyConfig = OmegaConf.to_object(typed_conf)

Now we have four function calls in a row, and to_object does not tell the type checker that the result is a MyConfig, so we have to add the type hint on class_conf ourselves. Also, if we had resolved each DictConfig first and merged them afterwards, the value of csv_file would have been /tmp/users.csv instead of /opt/users.csv. And we could have used unsafe_merge here for a slight performance boost.
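The ordering pitfall does not depend on OmegaConf itself, so it can be illustrated without it. The following is a minimal sketch that uses plain dicts and a hypothetical one-level `resolve` helper as a stand-in for OmegaConf's interpolation. It is not how OmegaConf resolves values internally, and it does not handle nested references.

```python
import re
from typing import Dict

# Matches ${key} references; a simplified stand-in for OmegaConf interpolation.
_REF = re.compile(r"\$\{(\w+)\}")


def resolve(conf: Dict[str, str]) -> Dict[str, str]:
    """Replace each ${key} reference with the value of that key in the same dict."""
    return {k: _REF.sub(lambda m: conf[m.group(1)], v) for k, v in conf.items()}


base = {"base_dir": "/tmp", "csv_file": "${base_dir}/users.csv"}
prod = {"base_dir": "/opt"}

# Resolve first, then merge: csv_file is frozen to the base value before the override.
resolved_first = {**resolve(base), **resolve(prod)}

# Merge first, then resolve: the reference sees the overridden base_dir.
merged_first = resolve({**base, **prod})

print(resolved_first["csv_file"])  # /tmp/users.csv
print(merged_first["csv_file"])    # /opt/users.csv
```

In both cases base_dir ends up as /opt; only the value derived from it differs, which is why the mistake is easy to miss.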

This is not trivial to remember, so whenever I use OmegaConf, I carry a bunch of utility functions around. But I think we can slightly extend some OmegaConf functions to make this easier for everyone. This is how I would like to use the OmegaConf.create function:

def create(
    *obj: Any,
    parent: Optional[BaseContainer] = None,
    flags: Optional[Dict[str, bool]] = None,
) -> DictConfig:
    if len(obj) == 0:
        return OmegaConf.create(parent=parent, flags=flags)
    elif len(obj) == 1:
        return OmegaConf.create(obj[0], parent=parent, flags=flags)
    else:
        configs = [OmegaConf.create(o, parent=parent, flags=flags) for o in obj]
        return OmegaConf.unsafe_merge(*configs)

Then we could just call OmegaConf.create(yml_base, yml_prod) to get the merged config.

To directly create a structured config:

T = TypeVar('T')

def create_structured(
    type_class: Type[T],
    *obj: Any,
    parent: Optional[BaseContainer] = None,
    flags: Optional[Dict[str, bool]] = None,
) -> T:
    # here we call the proposed new create function that merges all provided configs
    dict_conf = OmegaConf.create(*obj, parent=parent, flags=flags)
    typed_conf = OmegaConf.unsafe_merge(dict_conf, type_class)
    return OmegaConf.to_object(typed_conf)

Now we can call OmegaConf.create_structured(MyConfig, yml_base, yml_prod) to get the merged config as an instance of MyConfig, and we even get the correct type information.
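The type information comes from the Type[T] -> T signature rather than from anything OmegaConf-specific. Here is a small sketch of that pattern, using plain dicts instead of DictConfig objects. The name `build_typed` is hypothetical and only mirrors the shape of the proposed create_structured; it performs none of OmegaConf's validation or interpolation.

```python
from dataclasses import dataclass
from typing import Any, Dict, Type, TypeVar

T = TypeVar("T")


@dataclass
class MyConfig:
    base_dir: str
    csv_file: str


def build_typed(type_class: Type[T], *layers: Dict[str, Any]) -> T:
    """Merge plain dicts left to right and instantiate type_class from the result."""
    merged: Dict[str, Any] = {}
    for layer in layers:
        merged.update(layer)  # later layers override earlier ones
    return type_class(**merged)


conf = build_typed(
    MyConfig,
    {"base_dir": "/tmp", "csv_file": "/tmp/users.csv"},
    {"base_dir": "/opt"},
)

# A type checker infers MyConfig for conf here, with no explicit annotation.
print(conf.base_dir)  # /opt
```

Because the return type is bound to the class passed in, callers get attribute completion and static checks on the result without annotating it themselves.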

We could extend OmegaConf.load in the same way and add OmegaConf.load_structured so we can load structured configs directly from files.

What do you think?

Jasha10 commented 1 year ago

Hello @klamann, thank you for sharing your thoughts, and I'm glad you're finding OmegaConf useful!

A few notes:

Your custom wrappers around OmegaConf.create look reasonable.

Note that the OmegaConf API facilitates working directly with config objects (DictConfig and ListConfig) and manipulating the data stored inside of them, but it does not address use cases such as higher-level orchestration of config composition.

Have you seen hydra? Hydra uses OmegaConf as a backend and is designed to handle some of the complexity in orchestrating config composition.