omry / omegaconf

Flexible Python configuration system. The last one you will ever need.
BSD 3-Clause "New" or "Revised" License

Interpolations in dictionary keys #1024

Open colobas opened 2 years ago

colobas commented 2 years ago

Is your feature request related to a problem? Please describe. When several related dicts share the same keys, it would be nice to define those keys once somewhere and interpolate them into all the dictionary keys, to avoid errors and repetition. It would also allow one to override the corresponding keys in all the dicts at once.

Describe the solution you'd like Being able to do something like:

model:
  x_label: image
  y_label: class
  id_column: index
  network:
    _target_: my_pkg.my_network.Network

data:
  ${model.id_column}:
    _target_: my_pkg.load_col_from_dataframe    # some custom function
    _partial_: true
    col: ${model.id_column}
  ${model.y_label}:
    _target_: my_pkg.load_col_from_dataframe
    _partial_: true
    col: ${model.y_label}
  ${model.x_label}:
    _target_: my_pkg.load_image_from_dataframe
    _partial_: true
    path_col: ${model.x_label} 

(This is a simplified version of what I do in serotiny.)

Describe alternatives you've considered No really good alternatives other than manually making sure the values match.

Additional context I would use this feature heavily in the context of a project I'm working on: https://github.com/allencell/serotiny

Jasha10 commented 2 years ago

Thanks for the feature request, @colobas.

This feature would likely require big changes to OmegaConf's interpolation machinery, so I don't think we can make this a high priority for the short term.

Jasha10 commented 2 years ago

An alternative pattern that might fit your use-case is to use a list instead of a dict:

model:
  x_label: image
  y_label: class
  id_column: index
  network:
    _target_: my_pkg.my_network.Network

data:
  - key: ${model.id_column}
    value:
      _target_: my_pkg.load_col_from_dataframe    # some custom function
      _partial_: true
      col: ${model.id_column}
  - key: ${model.y_label}
    value:
      _target_: my_pkg.load_col_from_dataframe
      _partial_: true
      col: ${model.y_label}
  - key: ${model.x_label}
    value:
      _target_: my_pkg.load_image_from_dataframe
      _partial_: true
      path_col: ${model.x_label} 

You could then use Python code (or a custom resolver) to convert the data list into a dict with the given keys and values.
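As a rough sketch of the plain-Python route, the conversion could look like the following. The function name `pairs_to_dict` and the sample data are made up for illustration, and the duplicate-key check is an extra safeguard that was not part of the suggestion above. The sample uses ordinary Python lists and dicts in place of a loaded config, with the keys written out as they would look after interpolation.

```python
from typing import Any, Dict, Iterable, Mapping


def pairs_to_dict(items: Iterable[Mapping[str, Any]]) -> Dict[str, Any]:
    """Turn [{'key': k, 'value': v}, ...] into {k: v, ...} (hypothetical helper)."""
    result: Dict[str, Any] = {}
    for item in items:
        key = item["key"]
        # Assumption: two entries resolving to the same key is a config mistake.
        if key in result:
            raise ValueError(f"duplicate key: {key!r}")
        result[key] = item["value"]
    return result


# Stand-in for the `data` list once the key interpolations have been resolved.
data_list = [
    {"key": "index", "value": {"col": "index"}},
    {"key": "class", "value": {"col": "class"}},
    {"key": "image", "value": {"path_col": "image"}},
]

data = pairs_to_dict(data_list)
print(list(data))  # ['index', 'class', 'image']
```

The resulting dict keeps the order of the list, so it can be passed on wherever the original dict-shaped `data` was expected.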

colobas commented 2 years ago

Yeah I considered something like you're proposing, but it doesn't apply to a scenario where the data dictionary is used to instantiate a class (i.e. its keys are kwargs).

I didn't expect this to be an easy change and it isn't blocking, so no worries.

Keep up the great work!

Jasha10 commented 2 years ago

Just to follow up on the idea of using a custom resolver, here's an implementation that converts the ListConfig into a DictConfig:

# kv_to_dict.py
from omegaconf import DictConfig, ListConfig, OmegaConf

yaml_data = """
model:
  x_label: image
  y_label: class
  id_column: index
  network:
    _target_: my_pkg.my_network.Network

_data_keys_and_values:
  - key: ${model.id_column}
    value:
      _target_: my_pkg.load_col_from_dataframe    # some custom function
      _partial_: true
      col: ${model.id_column}
  - key: ${model.y_label}
    value:
      _target_: my_pkg.load_col_from_dataframe
      _partial_: true
      col: ${model.y_label}
  - key: ${model.x_label}
    value:
      _target_: my_pkg.load_image_from_dataframe
      _partial_: true
      path_col: ${model.x_label} 

data: ${kv_to_dict:${_data_keys_and_values}}
"""

def kv_to_dict(kv: ListConfig) -> DictConfig:
    assert isinstance(kv, ListConfig)
    ret = {}
    for item in kv:
        key = item["key"]
        value = item["value"]
        ret[key] = value
    return OmegaConf.create(ret)

OmegaConf.register_new_resolver("kv_to_dict", kv_to_dict)

cfg = OmegaConf.create(yaml_data)

print(OmegaConf.to_yaml(cfg.data))

$ python kv_to_dict.py
index:
  _target_: my_pkg.load_col_from_dataframe
  _partial_: true
  col: ${model.id_column}
class:
  _target_: my_pkg.load_col_from_dataframe
  _partial_: true
  col: ${model.y_label}
image:
  _target_: my_pkg.load_image_from_dataframe
  _partial_: true
  path_col: ${model.x_label}

And here's an alternative implementation using a list-of-lists:

# kv_to_dict2.py
from omegaconf import DictConfig, ListConfig, OmegaConf

yaml_data = """
model:
  x_label: image
  y_label: class
  id_column: index
  network:
    _target_: my_pkg.my_network.Network

_data_keys_and_values:
  - - ${model.id_column}
    - _target_: my_pkg.load_col_from_dataframe    # some custom function
      _partial_: true
      col: ${model.id_column}
  - - ${model.y_label}
    - _target_: my_pkg.load_col_from_dataframe
      _partial_: true
      col: ${model.y_label}
  - - ${model.x_label}
    - _target_: my_pkg.load_image_from_dataframe
      _partial_: true
      path_col: ${model.x_label} 

data: ${kv_to_dict:${_data_keys_and_values}}
"""

def kv_to_dict(kv: ListConfig) -> DictConfig:
    ret = dict(kv)  # convert list-of-lists into dict
    return OmegaConf.create(ret)

OmegaConf.register_new_resolver("kv_to_dict", kv_to_dict)

cfg = OmegaConf.create(yaml_data)

print(OmegaConf.to_yaml(cfg.data))
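The list-of-lists version relies on the built-in `dict()` constructor accepting any iterable of two-element sequences. A minimal illustration with plain Python lists (the sample values are placeholders, not taken from a real config) might look like this:

```python
# Each inner list is a [key, value] pair, mirroring the YAML layout above.
pairs = [
    ["index", {"col": "index"}],
    ["class", {"col": "class"}],
    ["image", {"path_col": "image"}],
]

as_dict = dict(pairs)
print(list(as_dict))  # ['index', 'class', 'image']

# An inner list that is not exactly two elements long is rejected by dict().
try:
    dict([["index", {"col": "index"}, "extra"]])
except ValueError as exc:
    error_message = str(exc)
    print("rejected:", error_message)
```

One trade-off of this layout is that a malformed entry only surfaces when the conversion runs, whereas the key/value form names each field explicitly in the YAML.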

colobas commented 2 years ago

Oh I totally skimmed over that part. That's a great idea, thanks so much!