omry / omegaconf

Flexible Python configuration system. The last one you will ever need.
BSD 3-Clause "New" or "Revised" License
1.91k stars 104 forks source link

Unexpected behavior for multiline-strings #1112

Open MatteoVoges opened 1 year ago

MatteoVoges commented 1 year ago

Describe the bug Hey, I'm trying to have a config, where some values are in an escaped interpolation, but in a multiline-string. I think this is a really specific use case, but I think that this applies to all escaped characters.

To Reproduce

from omegaconf import OmegaConf as oc, Node

correct = {
    "config": {
        "subconfig": "MULTILINE \n    ${tag:ENV}",
    },
    "values": "${config}",
}

wrong = {
    "values": "${config}",
    "config": {
        "subconfig": "MULTILINE \n    ${tag:ENV}",
    },
}

def escape_tag(input: str, _node_: Node):
    """resolver function, that returns an escaped tag with the input"""
    if "\n" in str(_node_):  # multiline-strings
        escape = "\\\\\\"  # \\\
    else:
        escape = "\\"  # \

    return escape + "${" + input + "}"

oc.register_new_resolver("tag", escape_tag)

cfg_correct = oc.create(correct)
cfg_wrong = oc.create(wrong)

oc.resolve(cfg_correct)
oc.resolve(cfg_wrong)

print(cfg_correct["values"])
# >>> {'values': {'subconfig': 'MULTILINE \n    \\${ENV}'}}
print(cfg_wrong["values"])
# >>> {'values': {'subconfig': 'MULTILINE \n    \\\\\\${ENV}'}

### Another bug ?
print(cfg_correct["values"]["subconfig"])
# >>> [...] ${ENV}
print(cfg_wrong["values"]["subconfig"])
# >>> [...] \${ENV}

Explanation Here I have two configs correct and wrong and the only difference between them is the key order in the python file. Then I have a custom resolver that produces me an escaped interpolation. In multiline-strings \ has to be escaped and because of the double escapes (https://github.com/omry/omegaconf/pull/695#discussion_r621335546) we need 3 \. Now the diff is that in the wrong config, each \ is escaped again, and it produces 6 \.

Another bug (?) is, that when I print the actual strings, that the string gets dumped as multiline and not as quoted, but this could be intended.

Expected behavior I expect, that the order in the config doesn't matter and both configs produce \${ENV}.

If there is any kind of workaround like sorting the keys in the config, I would be happy already.

Additional context

odelalleau commented 1 year ago

Interesting -- I believe this is the same root issue as in https://github.com/omry/omegaconf/issues/1081#issuecomment-1570443922

I'll have a quick look to check if option (2) mentioned in that post would be easy to implement.

MatteoVoges commented 1 year ago

There are definitely some similarities. Is there a general problem with producing (escaped) interpolations, because I use them very often. But this time I couldn't believe that the key order matters...

odelalleau commented 1 year ago

Is there a general problem with producing (escaped) interpolations

Yes (IMO).

I have a tentative fix in #1113. I can't guarantee if/when a new version will be released with this fix, so you might need to use a custom omegaconf version for now.

Note that this may break some of your existing code if you are relying on the current buggy behavior (for instance I think you'll need to change your tag resolver to achieve what you want with this PR).

MatteoVoges commented 1 year ago

Thanks for the quick fix. I will try it in a minute. This is just a fix for the escaping, right? Because I still wonder what causes the different behavior regarding the order of the keys in my example. Will it fix that too?

odelalleau commented 1 year ago

I still wonder what causes the different behavior regarding the order of the keys in my example. Will it fix that too?

Yes I expect so. (just be careful that my first PR was bugged, I just pushed a fixed version)

MatteoVoges commented 1 year ago

~~I don't know if I'm doing something wrong, but if I run your code, my example above is still incorrect. Even if I need to change the tag resolver, then the output for both configs should be equal. I will try again later today, if I have more time~~

Works now, I edited the wrong file in my env

MatteoVoges commented 1 year ago

Hey, I was able to work with the fix a bit and I noticed some breaking changes and some use cases that don't work anymore. Maybe we could achieve, that this is not the case and we're just fixing the bug in another way.

The bug / breaking change happens when creating an interpolation with a resolver:

from omegaconf import OmegaConf as oc

def produce_interpolation(content: str):
    return "${" + content + "}"

test = {
    "value": "abc",
    "interpolation": "${produce_interpolation:value}",
}

oc.register_new_resolver("produce_interpolation", produce_interpolation)

config = oc.create(test)
print(config)
# >>> {'value': 'abc', 'interpolation': '${produce_interpolation:value}'}

oc.resolve(config)
print(config)
# >>> {'value': 'abc', 'interpolation': '\\${value}'}

oc.resolve(config)
print(config)
# >>> {'value': 'abc', 'interpolation': '\\${value}'}

Before the fix, the resolver just produces ${value} (unescaped) and in the second resolve step it resolves to abc. This would also be my favorite behavior for this.

odelalleau commented 1 year ago

Hey, I was able to work with the fix a bit and I noticed some breaking changes and some use cases that don't work anymore.

Yes, this is what I meant in #1113 when I wrote "Note that this is a breaking change".

The current behavior isn't actually intended (there are no tests covering this use case), and can cause issues as we've seen (it's also potentially confusing that a config doesn't behave the same way after it's been resolved).

Can you explain your use case? There may be another way to achieve what you want, for instance maybe you can directly resolve the interpolation you are creating (this is an example that requires direct access to the internal API so it's not great, but it might be possible to expose some of it if needed):

from omegaconf.omegaconf import _node_wrap
from omegaconf._impl import _resolve

def produce_interpolation(content: str, _parent_):
    interpolation = "${" + content + "}"
    return resolve_interpolation(interpolation, parent=_parent_)

def resolve_interpolation(interpolation, parent):
    node = _node_wrap(
        value=interpolation,
        is_optional=False,
        key="__tmp__",
        parent=parent,
    )
    _resolve(node)
    return node._value()

We may also consider adding a flag to resolve() to indicate whether we want the new or old behavior (but fixing the issue where the resolution order matters). This could actually be useful for to_yaml() and to_container(), which also require looking at.

MatteoVoges commented 1 year ago

I will test your resolver in a minute, thanks for this!

My usecase is large templating of several configs with default values. This is a simplified example of it: We have a default "class" with a default structure:

default:
  valueA: aaa
  valueB: bbb
  api:
    deep:
      nested:
        valueA: ${relative_path:default.valueA}
        valueB: ${relative_path:default.valueB}

Then we have special objects that "inherit" from this base class

special: ${merge:${default},${special-config}}

special-config:
  valueA: zzz

Here we have two resolvers relative_path, that converts an absolute interpolation path to a relative one and important: delays one resolving stage, and merge, that takes several configs and merges them using OmegaConf.merge()

If we resolve this in a two step-system the interpolations get resolved after merging and we get the final config:

default: [object is resolved, but unimportant]
special-config: [unimportant]

# important
special:
  valueA: zzz
  valueB: bbb
  api:
    deep:
      nested:
        valueA: zzz # overridden value
        valueB: bbb # default value

So TLDR: We are able to produce reusable objects templated with default values and can have many copies of them where some special values are set. This hierarchy of inheritence can of course go further like a specialA and specialB object, that get merged with the special object.

We at kapitan need this, because many of the values are redundantly defined and we need a simple way to template these efficiently. Then we can search for the key api and can copy these config values to helm for example.

I know that this may seem way too complicated and there are certainly 100 methods to do it a lot easier...

MatteoVoges commented 1 year ago

Wouldn't be a solution, that the return value of a resolver is always directly the value and there is no escaping at all?

Some solution that goes in the same direction is the modification of the _unescape function in the grammar_visitor. This would allow us, to have the behavior with resolvers and interpolations as it was before and we treat escaped interpolations differently. My idea would be to keep escaped interpolations escaped, until we want to have the real value of the config, so accessing, to_object() and to_yaml(). WDYT?

UPDATE: I tried with modifying and came up with removing the +1 in grammar_visitor.py line 359:

- text = s.text[-(len(s.text) // 2 + 1) :]
+ text = s.text[-(len(s.text) // 2 ) :]

and this seems working. I will try to run the tests with it soon...

MatteoVoges commented 1 year ago

Your resolver works fine for some cases like the simplified repro from https://github.com/omry/omegaconf/issues/1112#issuecomment-1678573449 but for my most recent use case it isn't suitable because I need the actual resolving happen after the merge.

We may also consider adding a flag to resolve() to indicate whether we want the new or old behavior (but fixing the issue where the resolution order matters). This could actually be useful for to_yaml() and to_container(), which also require looking at.

I think this would be the best solution, because I might not be the only one, that uses this in their resolvers.

odelalleau commented 1 year ago

I'll need to spend a bit more time understanding your use case and be able to tell whether there's a better way to achieve what you want, but in the meantime I updated #1113 so that resolve() now takes a flag to control whether or not to escape interpolation strings. I reverted the default behavior to the old one for now, and hopefully the problem you originally mentioned in this issue should be fixed.

odelalleau commented 1 year ago

Wouldn't be a solution, that the return value of a resolver is always directly the value and there is no escaping at all?

That was the previous behavior. The main issue with it is that cfg does not behave the same before / after calling resolve(cfg), which I'd argue is bad in most use cases.

Some solution that goes in the same direction is the modification of the _unescape function in the grammar_visitor. This would allow us, to have the behavior with resolvers and interpolations as it was before and we treat escaped interpolations differently. My idea would be to keep escaped interpolations escaped, until we want to have the real value of the config, so accessing, to_object() and to_yaml(). WDYT?

UPDATE: I tried with modifying and came up with removing the +1 in grammar_visitor.py line 359:

- text = s.text[-(len(s.text) // 2 + 1) :]
+ text = s.text[-(len(s.text) // 2 ) :]

and this seems working. I will try to run the tests with it soon...

I'm not sure I entirely follow what you're trying to do here, but this breaks a bunch of tests.

MatteoVoges commented 1 year ago

I'm not sure I entirely follow what you're trying to do here, but this breaks a bunch of tests.

You tried already, or an assumption?

odelalleau commented 1 year ago

I'm not sure I entirely follow what you're trying to do here, but this breaks a bunch of tests.

You tried already, or an assumption?

I tried it.

MatteoVoges commented 1 year ago

but in the meantime I updated https://github.com/omry/omegaconf/pull/1113 so that resolve() now takes a flag to control whether or not to escape interpolation strings.

Sounds great. Thank you really much!

I tried it.

Alright. Then I just got lucky with my testscript...