omry / omegaconf

Flexible Python configuration system. The last one you will ever need.
BSD 3-Clause "New" or "Revised" License
1.91k stars 104 forks source link

Add `is_unresolved` to OmegaConf #1081

Closed kunaltyagi closed 1 year ago

kunaltyagi commented 1 year ago

Is your feature request related to a problem? Please describe. I'd like to know when the value would be an indirect call and when it's a direct input

Describe the solution you'd like There are other helper functions like is_missing and is_unresolved would be able to allow custom logic around detecting the source of the value

Describe alternatives you've considered None

omry commented 1 year ago

What is the difference between your new proposed function and is_interpolation ?

kunaltyagi commented 1 year ago

is_interpolation is all kinds of confusing and it works on key level, not Config level. Eg: image

The issue with is_interpolated is that it (and omegaconf) don't seem to remember that the config has been resolved once, and it supposed to be free of interpolation

With OmegaConf.is_unresolved(p) or is_resolved, we can run resolve once and then use p._get_node("d")._value() (or a much easier to use wrapper)

omry commented 1 year ago

I am starting to see what you are asking for. OmegaConf does not remember that OmegaConf.resolve have been called. This process is simply replacing interpolations with the resolved value, but after its called - new interpolations can be assigned again:

cfg = OmegaConf.create({"a": "${b}", "b" : 10})
OmegaConf.resolve(cfg)
# resolved?
cfg.a = "${b}"
# not resolved again?

Feel free to add some more color to what is the use case. At this point I don't think this something worth adding

@Jasha10, @odelalleau, @shagunsodhani: any comments?

kunaltyagi commented 1 year ago

Like I said, if OmegaConf can return the is_resolved status, then I can choose to use a wrapper function to access values instead of using the dot notation or accessor functions.

get_val = lambda conf, var: conf._get_node(var)._value()

if not OmegaConf.is_resolved(config): OmegaConf.resolve(config)

# now we use only get_val and nothing else
print(config.debug)  # Don't allow during code reviews
print(get_val(config, "debug"))  # Safe for leaf nodes

It's not perfect. A slightly better way would be to add an option to the getter so we don't have to rely on a get_val (config.get("debug", interpolate=False) ), but even then we need is_resolved to resolve the config once

In our usecase, we don't control the loading of the config, that is handled by either a test stub or a library. And depending on what the test is testing, we want to resolve or not resolve the config (in our test stub) to keep a consistent behavior

odelalleau commented 1 year ago

I think the main issue here is that if an interpolation returns a string that looks like an interpolation, then the behavior of the original config and the resolved config aren't the same anymore. This is a problem because in general we expect to be able to use cfg after calling OmegaConf.resolve(cfg) just the same way as we would have used cfg in the first place.

Two potential solutions:

  1. (Basically what this feature request is asking, if I understood correctly) Add some kind of "resolved" config-level flag (similar to "struct") that would disable interpolations in the config. Once "resolved", a config does not support interpolations anymore, and we can do cfg.option = "${node_that_does_not_exist}", with cfg.option just returning that string.

  2. When resolving the config, properly escape interpolation-like strings returned by interpolations (e.g., if an interpolation returns the string ${b}, then the node is assigned the string \${b}).

I think option (2) should be simpler to implement, and would provide a more straightforward user experience in most cases as well as better backward compatibility (not having to worry about a new config flag). The only thing that I can see as being potentially confusing is the fact that after resolution, we may still have keys that are considered as interpolations, because escaped interpolations are a special kind of interpolation. So someone using OmegaConf.is_interpolation() on a resolved config may be surprised to see some being True. But in terms of practical usage I feel like this would still be better than (1) (and note that having interpolations in the config after resolution is something that is already happening, as shown in the example above, so this isn't making things worse in this regard).

Jasha10 commented 1 year ago

This is a problem because in general we expect to be able to use cfg after calling OmegaConf.resolve(cfg)

I think it makes sense to try and enforce this invariant.

Some users might complain that they want to break this invariant, e.g. using environment variables to do tricky things (as @kunaltyagi hinted in the screenshot above):

>>> os.environ["FOO"] = "${b}"
>>> cfg = OmegaConf.create("{a: '${oc.env:FOO}', b: 123}")
>>> cfg.a
'${b}'
>>> OmegaConf.resolve(cfg)
>>> cfg.a
123

Brainstorming - here's an idea that may or may not make sense: A third option (in addition to the two proposed by @odelalleau above) is to make interpolation resolution recursive: dereferencing node cfg.a would check if the string returned by an interpolation is again an interpolation. If it is, recursively dereference again.

This option should enforce the invariant on OmegaConf.resolve, as every interpolation would be maximally dereferenced after the first access. Hackers would still be able to do tricky things by setting environment variables. Also, clients wouldn't need to worry about "unescaping" strings returned by __getitem__ (as would be the case if we implement option 2?).

kunaltyagi commented 1 year ago

Can both of these be done?

Maybe we can have an API something like the following

OmegaConf.resolve(cfg, depth:int|None)
# depth=0...INT_MAX: No future resolutions: 0: infinite recursion, 1...INT_MAX for recursion depth
# depth = None, same as calling depth=1 and then unlock_interpolation

OmegaConf.lock_interpolation(cfg)  # all ${x} now treated as strings
OmegaConf.unlock_interpolation(cfg)  # back to default behavior
odelalleau commented 1 year ago

Brainstorming - here's an idea that may or may not make sense: A third option (in addition to the two proposed by @odelalleau above) is to make interpolation resolution recursive: dereferencing node cfg.a would check if the string returned by an interpolation is again an interpolation. If it is, recursively dereference again.

The issue with making interpolation resolution recursive is that it wouldn't be possible to have ${ in a string option, which some users may need.

Also, clients wouldn't need to worry about "unescaping" strings returned by __getitem__ (as would be the case if we implement option 2?).

To be clear, there would be no need to un-escape strings in Option 2. You could have cfg["x"] == cfg.x == "${y}". The un-escaping would be done under the hood on access (like it currently happens e.g. with OmegaConf.create({"x": r"\${y}"}).x).

Jasha10 commented 1 year ago

it wouldn't be possible to have ${ in a string option, which some users may need.

Good point.

To be clear, there would be no need to un-escape strings in Option 2. You could have cfg["x"] == cfg.x == "${y}". The un-escaping would be done under the hood on access (like it currently happens e.g. with OmegaConf.create({"x": r"\${y}"}).x).

This would require the node to own extra metadata for keeping track of whether the string needs to be unescaped when __getitem__/__getattr__ is called, right?

odelalleau commented 1 year ago

To be clear, there would be no need to un-escape strings in Option 2. You could have cfg["x"] == cfg.x == "${y}". The un-escaping would be done under the hood on access (like it currently happens e.g. with OmegaConf.create({"x": r"\${y}"}).x).

This would require the node to own extra metadata for keeping track of whether the string needs to be unescaped when __getitem__/__getattr__ is called, right?

No, this mechanism already exists and is part of the resolution of interpolations.

omry commented 1 year ago

Like I said, if OmegaConf can return the is_resolved status, then I can choose to use a wrapper function to access values instead of using the dot notation or accessor functions.

  1. Why do you need to use private API to access the config fields in the first place?
  2. Why can't you just resolve the config when you load it if you want to always avoid the interpolation resolution later?
kunaltyagi commented 1 year ago
  1. Why can't you just resolve the config when you load it if you want to always avoid the interpolation resolution later?
  1. We don't want issues like this. (code snippet by @Jasha10 )
    >>> os.environ["FOO"] = "${b}"
    >>> cfg = OmegaConf.create("{a: '${oc.env:FOO}', b: 123}")
    >>> cfg.a
    '${b}'
    >>> OmegaConf.resolve(cfg)
    >>> cfg.a
    123
  2. Because we don't load the config ourselves. We are using two middle-ware library, one of which already calls resolve. In order to stay correct, we need to know if the config is already resolved in our mock (else we'd need to monkey patch or detect the middleware and do weird stuff)

Re: point 1, that's a consequence of the fact that the public api always calls resolve while accessing

omry commented 1 year ago
  1. Why can't you just resolve the config when you load it if you want to always avoid the interpolation resolution later?
  1. We don't want issues like this. (code snippet by @Jasha10 )
>>> os.environ["FOO"] = "${b}"
>>> cfg = OmegaConf.create("{a: '${oc.env:FOO}', b: 123}")
>>> cfg.a
'${b}'
>>> OmegaConf.resolve(cfg)
>>> cfg.a
123
  1. Because we don't load the config ourselves. We are using two middle-ware library, one of which already calls resolve. In order to stay correct, we need to know if the config is already resolved in our mock (else we'd need to monkey patch or detect the middleware and do weird stuff)

Re: point 1, that's a consequence of the fact that the public api always calls resolve while accessing

I have never seen anyone using interpolations in environment variables before. The fact that resolve is giving you what you want is accidental. A better solution would be to introduce an alternative to oc.env that would do what you want. It does use some private APIs but I think it's reasonable:

import os
from typing import Any, Optional

from omegaconf import OmegaConf, Node, AnyNode

from omegaconf._utils import _DEFAULT_MARKER_

def myenv(key: str, _node_: Node, default: Any = _DEFAULT_MARKER_) -> Optional[str]:
    try:
        env = os.environ[key]
        # tmp node with the same parent
        tmp = AnyNode(env, parent=_node_._parent)
        return tmp._dereference_node()._value()
    except KeyError:
        if default is not _DEFAULT_MARKER_:
            return str(default) if default is not None else None
        else:
            raise KeyError(f"Environment variable '{key}' not found")

OmegaConf.register_new_resolver("myenv", myenv)
os.environ["FOO"] = "${b}"

cfg = OmegaConf.create({
    "a": '${myenv:FOO}',
    "b": 123
})

print(cfg.a)
omry commented 1 year ago

I feel like my solution is addressing your problem. Feel free to followup here further.