omry / omegaconf

Flexible Python configuration system. The last one you will ever need.
BSD 3-Clause "New" or "Revised" License
1.99k stars 115 forks source link

Improve Tuple support #392

Open omry opened 4 years ago

omry commented 4 years ago

Currently tuple is treated as a list.

  1. We should preserve the immutability semantics of Tuple.
  2. When converting to a primitive container, we should get a tuple back, currently we get a list:
    In [2]: OmegaConf.to_container(OmegaConf.create((1,2,3)))
    Out[2]: [1, 2, 3]

In addition, it's converted to a mutable sequence:

In [6]: isinstance(OmegaConf.create((1,2,3)), MutableSequence)
Out[6]: True

In [7]: isinstance((1,2,3), MutableSequence)
Out[7]: False

Another thing this might enable is type safe tuple support:

from typing import Tuple, List
from omegaconf import OmegaConf
from dataclasses import dataclass

@dataclass
class Foo:
    t1: Tuple[float, int] = (1.0)
    t2: Tuple[int, ...] = (1, 2, 3)

cfg = OmegaConf.structured(Foo)

A cheap alternative to this is to treat tuple as a List and cut some corners with heterogeneous tuples (List[Any]?).

Finally, OmegaConf.to_container() should return a real tuple for input tuples and not a list.

Fixing it probably means the introduction of a TupleConfig class and is potentially a breaking change in some rare cases. This is probably a very large change and I am not convinced it's worth the investment. Please like if you feel this is important.

omry commented 3 years ago

Probably too big of a change for 2.1 at this point.

gphillips-ema commented 3 years ago

Not sure if this is related, but seems likely.

Given a config, containing a custom resolver as_tuple:

vec:
  _target_: sklearn.feature_extraction.text.HashingVectorizer
  ngram_range: ${as_tuple:2,3}
pipe:
  _target_: sklearn.pipeline.Pipeline
  steps:
    - - vec
       - ${vec}
    - - clf
      - _target_: sklearn.dummy.DummyClassifier

and an associated script:

import hydra
import hydra.utils as hu

from omegaconf import OmegaConf

def resolve_tuple(*args):
    return tuple(args)

OmegaConf.register_new_resolver("as_tuple", resolve_tuple)

@hydra.main(config_name="conf", config_path="conf"
def main(cfg):
    obj = hu.instantiate(cfg.pipe, _convert_='partial')
    print(obj)

Running this script with the associated config raises the following:

$ python test.py
Error executing job with overrides: []
Traceback (most recent call last):
  File "test.py", line 21, in <module>
    main()
  File "/Users/gphillips/anaconda3/envs/ds-core/lib/python3.7/site-packages/hydra/main.py", line 52, in decorated_main
    config_name=config_name,
  File "/Users/gphillips/anaconda3/envs/ds-core/lib/python3.7/site-packages/hydra/_internal/utils.py", line 378, in _run_hydra

    lambda: hydra.run(
  File "/Users/gphillips/anaconda3/envs/ds-core/lib/python3.7/site-packages/hydra/_internal/utils.py", line 214, in run_and_report
    raise ex
  File "/Users/gphillips/anaconda3/envs/ds-core/lib/python3.7/site-packages/hydra/_internal/utils.py", line 211, in run_and_report
    return func()
  File "/Users/gphillips/anaconda3/envs/ds-core/lib/python3.7/site-packages/hydra/_internal/utils.py", line 381, in <lambda>
    overrides=args.overrides,
  File "/Users/gphillips/anaconda3/envs/ds-core/lib/python3.7/site-packages/hydra/_internal/hydra.py", line 111, in run
    _ = ret.return_value
  File "/Users/gphillips/anaconda3/envs/ds-core/lib/python3.7/site-packages/hydra/core/utils.py", line 233, in return_value
    raise self._return_value
  File "/Users/gphillips/anaconda3/envs/ds-core/lib/python3.7/site-packages/hydra/core/utils.py", line 160, in run_job
    ret.return_value = task_function(task_cfg)
  File "test.py", line 16, in main
    obj = hu.instantiate(cfg.pipe, _convert_='partial')
  File "/Users/gphillips/anaconda3/envs/ds-core/lib/python3.7/site-packages/hydra/_internal/instantiate/_instantiate2.py", line 175, in instantiate
    OmegaConf.resolve(config)
  File "/Users/gphillips/anaconda3/envs/ds-core/lib/python3.7/site-packages/omegaconf/omegaconf.py", line 792, in resolve
    omegaconf._impl._resolve(cfg)
  File "/Users/gphillips/anaconda3/envs/ds-core/lib/python3.7/site-packages/omegaconf/_impl.py", line 40, in _resolve
    _resolve_container_value(cfg, k)
  File "/Users/gphillips/anaconda3/envs/ds-core/lib/python3.7/site-packages/omegaconf/_impl.py", line 25, in _resolve_container_value
    _resolve(node)
  File "/Users/gphillips/anaconda3/envs/ds-core/lib/python3.7/site-packages/omegaconf/_impl.py", line 44, in _resolve
    _resolve_container_value(cfg, i)
  File "/Users/gphillips/anaconda3/envs/ds-core/lib/python3.7/site-packages/omegaconf/_impl.py", line 25, in _resolve_container_value
    _resolve(node)
  File "/Users/gphillips/anaconda3/envs/ds-core/lib/python3.7/site-packages/omegaconf/_impl.py", line 44, in _resolve
    _resolve_container_value(cfg, i)
  File "/Users/gphillips/anaconda3/envs/ds-core/lib/python3.7/site-packages/omegaconf/_impl.py", line 19, in _resolve_container_value
    _resolve(resolved)
  File "/Users/gphillips/anaconda3/envs/ds-core/lib/python3.7/site-packages/omegaconf/_impl.py", line 40, in _resolve
    _resolve_container_value(cfg, k)
  File "/Users/gphillips/anaconda3/envs/ds-core/lib/python3.7/site-packages/omegaconf/_impl.py", line 23, in _resolve_container_value
    node._set_value(resolved._value())
  File "/Users/gphillips/anaconda3/envs/ds-core/lib/python3.7/site-packages/omegaconf/nodes.py", line 44, in _set_value
    self._val = self.validate_and_convert(value)
  File "/Users/gphillips/anaconda3/envs/ds-core/lib/python3.7/site-packages/omegaconf/nodes.py", line 57, in validate_and_convert
    return self._validate_and_convert_impl(value)
  File "/Users/gphillips/anaconda3/envs/ds-core/lib/python3.7/site-packages/omegaconf/nodes.py", line 134, in _validate_and_convert_impl
    f"Value '{t.__name__}' is not a supported primitive type"
omegaconf.errors.UnsupportedValueType: Value 'tuple' is not a supported primitive type

However, if you modify the python file to only instantiate the vectorizer.

...
def main(cfg):
    hu.instantiate(cfg.vec, _convert_='partial')
    print(obj)
...

The target instantiates successfully.

$ python test.py
HashingVectorizer(ngram_range=(2, 3))

Is this the intended behavior? It seems the allow_objects flag for the config is somehow None when instantiating the vec object in the nested instantiation causing the error, but I am not sure how or what to do to mitigate that occuring.

Jasha10 commented 3 years ago

Hello again @gphillips-ema :)

Thanks for sharing this example. I believe you're right that there's a bug here. The issue you are having is actually not tuple-specific; the same problem occurs with any other type that is not supported by OmegaConf (i.e. not a dict/list/primitive).

For your use-case, here is a workaround:

vec:
  _target_: sklearn.feature_extraction.text.HashingVectorizer
  ngram_range:
    _target_: builtins.tuple
    _args_:
      - [2, 3]
pipe:
  _target_: sklearn.pipeline.Pipeline
  steps:
    - - vec
       - ${vec}
    - - clf
      - _target_: sklearn.dummy.DummyClassifier

I've opened an issue against the Hydra repo here: https://github.com/facebookresearch/hydra/issues/1865.

gphillips-ema commented 3 years ago

@Jasha10 thanks for the workaround. I had been doing the following:

vec:
  _target_: sklearn.feature_extraction.text.HashingVectorizer
  ngram_range:
    - 2
    - 3

This at least instantiates, but I am touch uncomfortable as its a list and not a tuple.

My real motivation for this is to simplify overrides for pipelines that can become complex.

Take this example:

pipe:
  _target_: sklearn.pipeline.Pipeline
  steps:
    - - transformer
       - _target_: sklearn.compose.ColumnTransformer
         transformers:
           - - text_pipeline
             - _target_: sklearn.pipeline.Pipeline
               steps:
                 - - vectorizer
                   - _target_: sklearn.feature_extraction.text.HashingVectorizer
                     n_features: ${pow:2,18}
                     ngram_range:
                       - 1
                       - 4
             - Description
    - - classifier
      - _target_: sklearn.dummy.DummyClassifier

In this case a user can only override ngram_range min or max individually like so: pipe.steps.0.1.transformers.0.1.steps.0.1.ngram_range.1=5

This could get rather frustrating. But if you can interpolate the vectorizer as follows:

vectorizer:
  - _target_: sklearn.feature_extraction.text.HashingVectorizer
    n_features: ${pow:2,18}
    ngram_range:
      - 1
      - 4
pipe:
  _target_: sklearn.pipeline.Pipeline
  steps:
    - - transformer
       - _target_: sklearn.compose.ColumnTransformer
         transformers:
           - - text_pipeline
             - _target_: sklearn.pipeline.Pipeline
               steps:
                 - - vectorizer
                   - ${vectorizer}
             - Description
    - - classifier
      - _target_: sklearn.dummy.DummyClassifier

The override is now much simpler as the user just needs to add vectorizer.ngram_range.1=5.

Being able to also use a tuple instead of single min or max values is just the icing on the cake here.

This pattern of needing an object to instantiate within a target does seem to pop up a fair amount in our pipelines.

vonHartz commented 1 year ago

Any update or workaround on this? I prefer to use tuples instead of lists in my configs for their immutability.

odelalleau commented 1 year ago

Any update or workaround on this? I prefer to use tuples instead of lists in my configs for their immutability.

Unlikely to be added any time soon unless someone feels like writing a big PR for it.

Potential workarounds:

vonHartz commented 1 year ago

Thanks for the quick reply @odelalleau .

To clarify, I use structured configs (dataclasses) to pass the configs around for type checking. I temporarily convert to a DictConfig via dict_config = OmegaConf.structured(config) to merge with cli overwrites (via OmegaConf merge) and then convert back to dataclass via OmegaConf.to_container(dict_config, resolve=True, structured_config_mode=SCMode.INSTANTIATE).

With that workflow, the resolver seems not to do anything. Instead, the conversion to a dict_config = OmegaConf.structured(config) converts the tuples to lists.

odelalleau commented 1 year ago

With that workflow, the resolver seems not to do anything. Instead, the conversion to a dict_config = OmegaConf.structured(config) converts the tuples to lists.

I see -- I'm afraid I'm not entirely sure if there's a way to preserve the tuples in your setup (and no time to look into it closely now).

qsh-zh commented 10 months ago

Looking forward to the feature.

ajugeorge97 commented 3 days ago

Looking forward to this feature.