omry / omegaconf

Flexible Python configuration system. The last one you will ever need.
BSD 3-Clause "New" or "Revised" License
1.98k stars 113 forks source link

Consider supporting basic operators in interpolations of numbers #91

Open omry opened 4 years ago

omry commented 4 years ago
a: 1
b: 2.0
d: ${a} - ${b}
e: ${a} + ${b}
d: ${a} * ${b}
e: ${a} / ${b}

desired behavior:

conf = OmegaConf.load("file.yaml")
assert conf.d == -1.0
assert conf.d == 3.0
assert conf.d == 2.0
assert conf.d == 0.5

This should work only if both operands are numbers (int or float). result type should be int if both are ints or float otherwise.

EDIT (2022-12-20):

✨✨✨ See the OmegaConf docs on How to Perform Arithmetic Using eval as a Resolver ✨✨✨

YannDubs commented 4 years ago

+1. Out of curiosity, why not use the default behavior of pythons operands for any types? I.e. don't check the type (let python return its errors if need be) and just call + , - , / , * . For example do string concatenation by default with +.

omry commented 4 years ago

String concatenation is already supported directly:

url: http://${host}:${port}:${path}

It might still be worth supporting it, but some other things does not make immediate sense, for example:

a:
  bb: 10
  cb: 20

b:
  xx: 30
  yy: 40

c: foo

d1: ${a} + ${b}
d2: ${a} + ${c}

While it's plausible to come with interpretations for both d1 and maybe even d2, I suspect the surface area from supporting those interpretation is very large. I would like to keep operator support well defined and limited to the most impactful use cases.

YannDubs commented 4 years ago

Ok, I would personally just use python way of doing by returning

def interpolate_binary(a,b,op):
    return op(OmegaConf.to_container(a, resolve=True),
              OmegaConf.to_container(b, resolve=True))

In which case both d1 and d2 would raise a TypeError. That seems like the most intuitive and easiest way of maintaining things. But maybe it's naive, I didn't actually look through OmegaConf codebase much ...

omry commented 4 years ago

Supporting + for arbitrary values may be incompatible with existing strings:

foo: FOO
bar: BAR
key: ${foo}+${bar}

current interpretation is FOO+BAR, new interpretation will be FOOBAR. limiting to numerical types reduces the incompatibility impact.

There should be a way to prevent operators from executing for cases like this.

In any case, when I try to actually implement it I will think about it more seriously.

omry commented 3 years ago

@odelalleau, I was thinking about introducing an eval resolver later to do arithmetic operations. It could also be using a different syntax, like #{expression}.

two : #{1 * 2}   
five : #{1 + ${two} * 2} 

# or using a regular resolver:
eight : ${eval:4 + ${two} * 2} 

I kinda like #{} though (although I just realized it's a yaml comment so it will have unfortunate side effects. some other symbol? % or ^ ?).

odelalleau commented 3 years ago

I was thinking about introducing an eval resolver later to do arithmetic operations.

Personally I would find it more intuitive to do it the way you initially wrote at the top of this issue, rather than introduce a new resolver or syntax. Did you have any major concern with your initial proposition? (I think it should remain limited to floats & integers, which should cover most common use cases, and people can still write custom resolvers for more advanced computations).

omry commented 3 years ago

The main concern is backward compatibility. Interpreting all configs as expressions means that you can't have a config value that is an expression:

x: 10 * 2

Quoting it will not help because at YAML level it's already a string. An interpolation/syntax allow you to differentiate the two cases.

# "10 * 2"
x: 10 * 2

# 20
y: ${eval:10 * 2}

It is possible to only process expressions if they contain interpolations, but feels like a hack. This is useful expression that will not be handled:

# 10MB
y: 10 * 1024 * 1024

I guess we could go with the new decode resolver to support the string form:

y: ${oc.decode:"10 * 1024 * 1024"}  # should be 10 * 1024 * 1024
odelalleau commented 3 years ago

Follow-up thoughts on this (tl;dr: I think it can be implemented at the level of resolver arguments, but this would make it a bit more cumbersome to use arithmetic operators in strings especially on Hydra's CLI, so maybe a unique syntax like #{..} is better)

omry commented 3 years ago

Makes sense. I don't think we need to deprecate anything at this time (since we are not sure about the final design yet). We can deprecate things in between 2.1 and 2.1 without a major version (e.g. 2.1.3). Also - this is also absolutely not getting into 2.1, so no worries there.

addisonklinke commented 2 years ago

A few thoughts here

  1. What about supporting inequality operators in addition to arithmetic?
  2. I like the ${eval:} or #{expression} syntax mentioned a lot. That seems to be very flexible, and by using the former I think we get nested interpolations for free, right?
  3. Could ${eval:} make use of something like ast.literal_eval() to support more complex expressions? Likely this should be modified to not allow any arbitrary expression since that'd be a security risk. See below for desired example
from dataclasses import dataclass
import hydra
from hydra.core.config_store import ConfigStore
from omegaconf import OmegaConf

@dataclass
class Conf:
    foo: str = 'bar'
    baz: int = "${eval: 1 if ${foo} == 'bar' else 0}"  # Magic, yet-to-be-implemented resolver

cs = ConfigStore.instance()
cs.store('config', Conf)

@hydra.main(config_name='config', config_path=None)
def main(cfg):
    resolved_cfg = OmegaConf.to_container(cfg, resolve=True, enum_to_str=True)
    print(OmegaConf.to_yaml(resolved_cfg))

if __name__ == '__main__':

    main()

It'd be cool to have this output

python app.py
foo: bar
baz: 1

python app.py foo=other
foo: other
baz: 0

Currently to get the equivalent behavior, you need to introduce some auxiliary resolvers

def resolve_if(condition, true_value, false_value):
    return true_value if condition else false_value

def resolve_eq(first, second):
    return first == second

@dataclass
class Conf:
    foo: str = 'bar'
    baz: int = "${if: ${eq: ${foo}, 'bar'}, 1, 0}"

OmegaConf.register_new_resolver('if', resolve_if)
OmegaConf.register_new_resolver('eq', resolve_eq)

Personally I find the eval interpolation preferable since the desired logic is immediately clear to anyone who understands Python. In comparison, the workaround would require the user to check the source code registered to each custom resolver in order to clarify what the expected result would be. Lastly, the eval option doesn't require nested curly-braces

${if: ${eq: ${foo}, 'bar'}, 1, 0}
${eval: 1 if ${foo} == 'bar' else 0}
Jasha10 commented 2 years ago

and by using the former I think we get nested interpolations for free, right?

Hopefully xD

Jasha10 commented 2 years ago

I'd like to remark that using python's builtins.eval already gets much of the job done:

from typing import Any
from omegaconf import OmegaConf

OmegaConf.register_new_resolver("eval", eval)

cfg = OmegaConf.create('a: "Computed ${eval: 2 * 3 * 5}"')
print(cfg.a)  # Prints 'Computed 30'

Using nested interpolations is a little bit tricky, but careful quoting makes it possible:

cfg = OmegaConf.create(
"""
foo: bar
baz: ${eval:'1 if "${foo}" == "bar" else 0'}
"""
)
print(cfg.baz)  # Prints '1'

The interpolation "${foo}" needs to be quoted so that eval will interpret its value as a string "bar" and not as a variable bar (which would cause a NameError).

camall3n commented 2 years ago

@Jasha10 That's really cool! Ideally I would want there to be a syntax that works in all cases, but the spaces and quotes really make things quite tricky.

After importing/registering, I ran the following tests:

# Test A: Quote the entire RHS, with a space after `eval:`
def test_1a_eval_outer_with_space():
    cfg = OmegaConf.create("""
    a: '${eval: 2 * 3 * 5}'
    """)
    assert cfg.a == 30 # => pass!

def test_2a_nested_outer_with_space():
    cfg = OmegaConf.create("""
    foo: bar
    baz: '${eval: 1 if "${foo}" == "bar" else 0}'
    """) #        => GrammarParseError: token recognition error at: '='
    assert cfg.baz == 1

# --------------------------------------------------------------------
# Test B: Quote the insides only, with a space after `eval:`
def test_1b_eval_inner_with_space():
    cfg = OmegaConf.create("""
    a: ${eval: '2 * 3 * 5'}
    """) #   ^    => ScannerError: mapping values are not allowed here
    assert cfg.a == 30

def test_2b_nested_inner_with_space():
    cfg = OmegaConf.create("""
    foo: bar
    baz: ${eval: '1 if "${foo}" == "bar" else 0'}
    """) #     ^  => ScannerError: mapping values are not allowed here
    assert cfg.baz == 1

# --------------------------------------------------------------------
# Test C: Quote entire RHS, with no space after `eval:`
def test_1c_eval_outer_no_space():
    cfg = OmegaConf.create("""
    a: '${eval:2 * 3 * 5}'
    """)
    assert cfg.a == 30 # => pass!

def test_2c_nested_outer_no_space():
    cfg = OmegaConf.create("""
    foo: bar
    baz: '${eval:1 if "${foo}" == "bar" else 0}'
    """) #        => GrammarParseError: mismatched input '"' expecting BRACE_CLOSE
    assert cfg.baz == 1

# --------------------------------------------------------------------
# Test D: Quote the insides only, with no space after `eval:`
def test_1d_eval_inner_no_space():
    cfg = OmegaConf.create("""
    a: ${eval:'2 * 3 * 5'}
    """)
    assert cfg.a == 30 # => pass!

def test_2d_nested_inner_no_space():
    cfg = OmegaConf.create("""
    foo: bar
    baz: ${eval:'1 if "${foo}" == "bar" else 0'}
    """)
    assert cfg.baz == 1 # => pass!
# --------------------------------------------------------------------

The best option seems to be method D, immediately following eval: with a quoted string, without putting a space in between.

But I don't have a clue as to why...

odelalleau commented 2 years ago

The best option seems to be method D, immediately following eval: with a quoted string, without putting a space in between.

But I don't have a clue as to why...

If you don't quote the string then you are restricted to only characters allowed in unquoted strings, so things will break as soon as you use another character in your expression. For instance, quotes are not allowed, which explains some of your errors.

Putting a space after eval actually works at the OmegaConf level, but it's failing for you because you're using YAML syntax and YAML doesn't like it.

Here are two ways to fix it (besides removing the space):

# Dropping YAML
def test_2b_nested_inner_with_space():
    cfg = OmegaConf.create(
        {
            "foo": "bar",
            "baz": """${eval: '1 if "${foo}" == "bar" else 0'}"""
        }
    )
    assert cfg.baz == 1
# Quoting the YAML string (which requires escaping the inner quotes for the YAML parser)
# (note that in a .yaml file you would use a single \, here it is doubled because it is in a Python string)
def test_2b_nested_inner_with_space():
    cfg = OmegaConf.create("""
    foo: bar
    baz: "${eval: '1 if \\"${foo}\\" == \\"bar\\" else 0'}"
    """)
    assert cfg.baz == 1
camall3n commented 2 years ago

Thanks, that's helpful. That also explains why the ScannerErrors are YAML errors, but the GrammarParseErrors are OmegaConf errors.

ysig commented 1 year ago

But I have not understood how in this syntax how you can write things like:

a:
  aa: 64

b:
  bb: ${eval: '2 * ${a.aa}'}

The above should work but it doesn't in my case :(

odelalleau commented 1 year ago
a:
  aa: 64

b:
  bb: ${eval: '2 * ${a.aa}'}

The above should work but it doesn't in my case :(

It's probably just a YAML parsing error, you can fix it by removing the space after eval: or enclosing bb's definition within double quotes.

ysig commented 1 year ago

I did this and now I get this: omegaconf.errors.UnsupportedInterpolationType: Unsupported interpolation type eval

Jasha10 commented 1 year ago

Hi @ysig, see the OmegaConf docs on How to Perform Arithmetic Using eval as a Resolver.

ysig commented 1 year ago

I have seen them but I don't see where my error is. After all I just use multiplication.

On Tue, Dec 20, 2022, 14:15 Jasha Sommer-Simpson @.***> wrote:

Hi @ysig https://github.com/ysig, see the OmegaConf docs on How to Perform Arithmetic Using eval as a Resolver https://omegaconf.readthedocs.io/en/2.3_branch/how_to_guides.html#id1

— Reply to this email directly, view it on GitHub https://github.com/omry/omegaconf/issues/91#issuecomment-1359274045, or unsubscribe https://github.com/notifications/unsubscribe-auth/AGY7H2OEOBIXVHNUU2UP25LWOGPPJANCNFSM4JTY26UA . You are receiving this because you were mentioned.Message ID: @.***>

odelalleau commented 1 year ago

I have seen them but I don't see where my error is. After all I just use multiplication.

You are most likely forgetting to register the eval resolver:

OmegaConf.register_new_resolver("eval", eval)
teocasse commented 1 year ago

Am I the only one concerned about the fact that registering eval as a resolver opens a can of worms from a security perspective?

It opens the door to arbitrary code execution via configuration, so I find it hard to believe that it is the suggested official workaround! Moreover, such a suggestion is IMHO also inconsistent with other Hydra design choices. Take for instance the hydra.utils.instantiate API: its purpose is evidently to allow to instantiate classes and call functions from configuration without allowing arbitrary code execution. If an eval resolver was indeed considered a "valid choice" by Hydra documentation, then why should the hydra.utils.instantiate API even exist, given that its use case is already covered by this resolver? The same applies to other more specific resolvers as well: why would they be needed if an eval resolver exists? These questions are just to illustrate that suggesting an eval resolver in the official docs is inconsistent with the rest of the API (which takes safe design choices).

Please don't get me wrong: I am a happy user of Hydra and I am thankful to all the contributors that work on this framework, I use the hydra.utils.instantiate API and several specific resolvers, and I will welcome the feature described in this ticket, but I will definitely NOT include an eval resolver in my application: it's a security risk and there are several better workarounds. Please consider removing the eval suggestion from the docs.

odelalleau commented 1 year ago

Please consider removing the eval suggestion from the docs.

I agree it'd be good to add a warning, but IMO it's a useful tip that's worth keeping in the docs (there are many situations where people don't care about such a security risk -- I know I'm a happy user of the eval resolver and I'm well aware that it allows arbitrary code execution through the config).

Regarding your comment on eval making instantiate useless, note that (i) eval is not available by default (and shouldn't be, for the exact reason you mention here), and (ii) it's not as convenient to use as the instantiate API to create complex objects.

teocasse commented 1 year ago

Opinions can differ of course, but IMO just adding a warning because it can be a useful tip is not enough given that we are talking about the official docs. For one person like you that is using it who is perfectly conscious of the risks, I bet there is another (and probably more) who will use it without thinking a moment about the risks because "it's in the docs". If I didn't know better, I would happily include it because it's so-damn-practical!

I don't consider eval an absolute bad practice: there are some scenarios where its use can be acceptable. But this is exactly THE scenario where using eval is a no-go unless you really, really know what you are doing and you are in complete control.

We are talking about a one-liner that makes the entire application unsecure, just its presence means that any parameter that is configurable by the end-user can be abused for code execution. Just imagine a multi-dev scenario where a developer added this resolver, and another one unaware of it exposes an innocuous parameter on the front-end...

Since Hydra is a configuration framework, I really think this is where the docs should draw a line in the sand and absolutely discourage this. IMO, putting it in the docs means there is some blessing from the framework on using this pattern, even with a warning... Unless you make the warning really huge, but then why did you even include it in the docs?

IMO such a suggestion does not belong to official docs, rather to a stackoverflow thread so that it can get up/downvoted and commented, with other safer alternatives discussed.

The choice is down to Hydra developers of course, but personally, I think that a better solution would be either

Regarding your comment on eval making instantiate useless

My point was rather that this suggestion in the docs is inconsistent with the presence of other safer alternatives, not that instantiate becomes absolutely useless. Anyway, the main point is the security aspect.

odelalleau commented 1 year ago

@teocasse what if there existed a oc.eval_num resolver that would call eval() under the hood, but would refuse to do so if there is any aphabetic character within the string? ([a-zA-Z_]) Off the top of my head I don't think you can abuse eval() without such a character (it would unfortunately prevent numbers like 10_000 unless we make the parsing smarter, but that's a small price to pay for safety).

teocasse commented 1 year ago

I cannot comment whether the specific implementation you suggest would be secure, but on a more general level I would not be against a "safer" eval implementation. With eval the common recommendation is to make sure that your input is sanitized before passing it, and your suggestion goes exactly in that direction. What was concerning me wasn't the usage of eval per-se, rather its "naïve" usage in the suggested resolver.

This post suggests a few alternatives to "plain eval" in order to parse mathematical expressions in a secure way: https://stackoverflow.com/questions/2371436/evaluating-a-mathematical-expression-in-a-string

JulesGM commented 7 months ago

eval is atrociously unsafe. This would be recreating the very famous historically gigantic yaml security vulnerabilities of arbitrary code execution from the configs.

JulesGM commented 7 months ago

People don't expect to have to monitor configs for arbitrary code execution injection, and don't do it.