omry / omegaconf

Flexible Python configuration system. The last one you will ever need.
BSD 3-Clause "New" or "Revised" License
1.94k stars 105 forks source link

Raise error if there are conflicting keys after merge #1053

Open RaulWCosta opened 1 year ago

RaulWCosta commented 1 year ago

What?

Change the default behavior of merge to raise an error when there is a conflict in the key names after merge. Right now, if there is a key conflict after merge, the last key specified silently overwrites the other. As shown in the test case below.

https://github.com/omry/omegaconf/blob/master/tests/test_create.py#L416

def test_yaml_merge() -> None:
    cfg = OmegaConf.create(
        dedent(
            """\
            a: &A
                x: 1
            b: &B
                y: 2
            c:
                <<: *A
                <<: *B
                x: 3
                z: 1
            """
        )
    )
    assert cfg == {"a": {"x": 1}, "b": {"y": 2}, "c": {"x": 3, "y": 2, "z": 1}}

Why?

I think there are two main points to justify this change:

  1. Duplicated keys are syntactically invalid for YAML files, as shown in the quote below (from https://yaml.org/spec/1.2-old/spec.html#id2764044)

    The content of a mapping node is an unordered set of key: value node pairs, with the restriction that each of the keys is unique.

  2. This behavior can cause unexpected problems in running time for OmegaConf's users.


I executed the tests in a Windows 10 machine, using Python 3.10.8 running in a Conda env.

I hope this PR is useful :)

omry commented 1 year ago

Can you reproduce the problem without using yaml anchors?

RaulWCosta commented 1 year ago

Can you reproduce the problem without using yaml anchors?

Without anchors, i.e. by loading a file with direct conflict in the keys (not 100% that was your question), it raises an error as expected. The behavior in this PR also raises an error for that case.

Do you think would be good to add a test case for that scenario?

omry commented 1 year ago

Generally speaking, anchors sucks, and you should use interpolations instead. We are considering disabling anchors support by default to address #794.

RaulWCosta commented 1 year ago

Generally speaking, anchors sucks, and you should use interpolations instead. We are considering disabling anchors support by default to address #794.

Got it! I haven't tested that scenario with interpolation, but I can do it today :)

RaulWCosta commented 1 year ago

Generally speaking, anchors sucks, and you should use interpolations instead. We are considering disabling anchors support by default to address #794.

I didn't manage to make the example you added in #794 run with OmegaConf.load (maybe there is some aditional step that I'm not aware of?)

# test.yml
lol1: "lol"
lol2: ["${lol1}","${lol1}","${lol1}","${lol1}","${lol1}","${lol1}","${lol1}","${lol1}","${lol1}"]

Output I got:

{'lol1': 'lol', 'lol2': ['${lol1}', '${lol1}', '${lol1}', '${lol1}', '${lol1}', '${lol1}', '${lol1}', '${lol1}', '${lol1}']}

Besides that, it looks like this topic is an ongoing discussion in the PR. So maybe, while this is not defined, would be useful to fix this behavior in merge.

Jasha10 commented 1 year ago

I didn't manage to make the example you added in #794 run with OmegaConf.load (maybe there is some aditional step that I'm not aware of?)

@RaulWCosta see the docs on OmegaConf interpolations.

from omegaconf import OmegaConf

yaml_data = """
# test.yml
lol1: "lol"
lol2: ["${lol1}","${lol1}","${lol1}","${lol1}","${lol1}","${lol1}","${lol1}","${lol1}","${lol1}"]
"""

config = OmegaConf.create(yaml_data)

assert config.lol2[0] == "lol"  # lazily access interpolated value ${lol1} -> "lol"

# Try printing `config` before and after calling OmegaConf.resolve:

print(config)
# prints {'lol1': 'lol', 'lol2': ['${lol1}', '${lol1}', '${lol1}', '${lol1}', '${lol1}', '${lol1}', '${lol1}', '${lol1}', '${lol1}']}

# Calling resolve would likely fill up your ram if you have lol1,lol2,...lol10 defined
OmegaConf.resolve(config)

print(config)
# prints {'lol1': 'lol', 'lol2': ['lol', 'lol', 'lol', 'lol', 'lol', 'lol', 'lol', 'lol', 'lol']}
omry commented 1 year ago

@Jasha10, I am actually not sure interpolation can be used here: OmegaConf does not provide a parallel to YAML merge (e.g. <<: *A).

RaulWCosta commented 1 year ago

I didn't manage to make the example you added in #794 run with OmegaConf.load (maybe there is some aditional step that I'm not aware of?)

@RaulWCosta see the docs on OmegaConf interpolations.

from omegaconf import OmegaConf

yaml_data = """
# test.yml
lol1: "lol"
lol2: ["${lol1}","${lol1}","${lol1}","${lol1}","${lol1}","${lol1}","${lol1}","${lol1}","${lol1}"]
"""

config = OmegaConf.create(yaml_data)

assert config.lol2[0] == "lol"  # lazily access interpolated value ${lol1} -> "lol"

# Try printing `config` before and after calling OmegaConf.resolve:

print(config)
# prints {'lol1': 'lol', 'lol2': ['${lol1}', '${lol1}', '${lol1}', '${lol1}', '${lol1}', '${lol1}', '${lol1}', '${lol1}', '${lol1}']}

# Calling resolve would likely fill up your ram if you have lol1,lol2,...lol10 defined
OmegaConf.resolve(config)

print(config)
# prints {'lol1': 'lol', 'lol2': ['lol', 'lol', 'lol', 'lol', 'lol', 'lol', 'lol', 'lol', 'lol']}

Appreciate!

RaulWCosta commented 1 year ago

Hey @omry really appreciate your time! Do you think it would be best if I close this PR?

omry commented 1 year ago

I would still like to get this reviewed. I am not currently working on OmegaConf but there might be some new maintainers coming soon.