omry / omegaconf

Flexible Python configuration system. The last one you will ever need.
BSD 3-Clause "New" or "Revised" License
1.91k stars 104 forks source link

feat: add deep merging for lists #1082

Closed MatteoVoges closed 1 year ago

MatteoVoges commented 1 year ago

Fixes #1080

Proposed changes

Examples

If you have a config:

list:
   - a

and

list:
  - b

you will get with the enabled flag extend_lists this result:

list:
  - a
  - b

The functionality of allow_duplicates is self explanatory, i guess.

Tests and Documentation


Contributed by

MatteoVoges commented 1 year ago

The tests show the following error in the lint job:

tests/test_basic_ops_dict.py:276: error: Keywords must be strings  [misc]
Installing missing stub packages:
/home/circleci/project/.nox/lint-3-10/bin/python -m pip install types-Pygments types-colorama types-setuptools

Found 1 error in 1 file (checked 81 source files)

I didn't edit this file, so I guess that this error is due some wrong configuration or setup in CircleCI. Is there anything I have to look into?

omry commented 1 year ago

Thanks @MatteoVoges.

@Jasha10, @odelalleau, @shagunsodhani: What do you think about this functionality? We can discuss here or on #1080.

At a glance it looks pretty nice, but I didn't review properly.

  1. Add a news fragment (similar to what is described here https://hydra.cc/docs/development/documentation/).
  2. Update the docs showing usage of OmegaConf merge. Also update the notebook with an example.
  3. Try to fix the lint error even if you didn't create it (If it's involved it can be in a separate PR).
odelalleau commented 1 year ago

@Jasha10, @odelalleau, @shagunsodhani: What do you think about this functionality? We can discuss here or on #1080.

I'm good with the proposed new feature. I can't commit to reviewing the code now though.

MatteoVoges commented 1 year ago

@omry I updated the docs and added some examples. The linting error is fixed in #1084 . Is there anything left to do?

Jasha10 commented 1 year ago

The remove_duplicates flag only makes sense if extend_lists is True.

>>> OmegaConf.merge([1,1,2], [2,2,3], remove_duplicates=True)
[2, 2, 3]

Should we issue an error or warning in the case where remove_duplicates is passed and extend_lists is False? Or should we adopt a similar Enum-based approach as is used for the structured_config_mode argument to OmegaConf.to_container?

Something like this:

class OmegaConf:
    ...
    def merge(
        ...,
        list_merge_mode: ListMergeMode = ListMergeMode.OVERWRITE,
    )

...

class ListMergeMode(Enum):
    """How `OmegaConf.merge` handles lists"""
    OVERWRITE = 1  # overwrite left-hand list with right-hand list
    EXTEND = 2  # extend left-hand list with right-hand list
    EXTEND_NO_DUPLICATES = 3  # ...
Jasha10 commented 1 year ago

Also, I think the naming of remove_duplicates=True is a little bit confusing. The duplicates from the right-hand merge argument are removed but the duplicates from the left-hand merge argument are not removed.

>>> list1 = [1, 2, 3, 3]
>>> list2 = [3, 4, 5]
>>> OmegaConf.merge(list1, list2, extend_lists=True, remove_duplicates=True)  # duplicates from list1 not removed
[1, 2, 3, 3, 4, 5]
MatteoVoges commented 1 year ago

I thought about this before as well and my first implementation was indeed using a set. But I decided to go for the currently implemented approach, because the set was scrambling the order of the lists and it was hard to test.

The duplicates from the right-hand merge argument are removed but the duplicates from the left-hand merge argument are not removed.

I actually like the current behavior a lot, because I think if you merge something the original object should not be affected, but only extended. And if you have some duplicate values in your list1, then it is for a reason, and they should appear in the merged object.

Should we issue an error or warning in the case where remove_duplicates is passed and extend_lists is False?

I think a warning would be good here.

Or should we adopt a similar Enum-based approach?

I don't think that this is neccessary, because we don't have many merge options yet and I think two merge modes do not require an enum.

odelalleau commented 1 year ago

FWIW I do agree an Enum would be cleaner, for the reasons @Jasha10 mentioned. And I think that if you rename remove_duplicates into ignore_duplicates it might be less confusing: "remove" suggests post-processing the result (implying duplicates from both sides are deleted), while "ignore" suggests pre-processing (skip over items already in the list).

omry commented 1 year ago

Also, can you fix the lint issues? (If you already did, please rebase on top of your fix).

MatteoVoges commented 1 year ago

Overall looks good. Please change to Enum.

We then have the problem, that you have to import the enum, if you want to use this feature. So maybe there is a little hint in the merge-function-docstring. Or we enable both options, so Enum and String like @Jasha10 proposed?

Where would be a good place for the enum? _utils module?

omry commented 1 year ago

We then have the problem, that you have to import the enum, if you want to use this feature. So maybe there is a little hint in the merge-function-docstring. Or we enable both options, so Enum and String like @Jasha10 proposed?

I think using a string an an alternative is a bit unusual and I am not a fan of this duality.

Where would be a good place for the enum? _utils module?

Given that the enum is becomeing a part of the public API, it should live in omegaconf/OmegaConf.py

Usage can be something like:

from omegaconf import OmegaConf, ListMergeMode

...
OmegaConf.merge(cfg1, cfg2, list_merge_mode = ListMergeMode.EXTEND_IGNORE_DUPLICATES)

A docstring update to OmegaConf.merge is fine. Don't forget to update docs to reflect this.

Jasha10 commented 1 year ago

Given that the enum is becomeing a part of the public API, it should live in omegaconf/OmegaConf.py

@omry note that the SCMode enum currently lives in omegaconf/base.py (and is exported by omegaconf/__init__.py).

odelalleau commented 1 year ago

I can’t check right now but I believe there’s a way to make an enum that can be compared to a string so that just passing the string would work too.

Jasha10 commented 1 year ago

make an enum that can be compared to a string

I think you're referring to enum.StrEnum:

from enum import StrEnum
class ListMergeModeString(StrEnum):
    OVERWRITE = "OVERWRITE"
    EXTEND = "EXTEND"
    EXTEND_NO_DUPLICATES = "EXTEND_NO_DUPLICATES"

assert ListMergeModeString.OVERWRITE == "OVERWRITE"

One issue with StrEnum is that it's only available in python 3.11+.

omry commented 1 year ago

Given that the enum is becomeing a part of the public API, it should live in omegaconf/OmegaConf.py

@omry note that the SCMode enum currently lives in omegaconf/base.py (and is exported by omegaconf/__init__.py).

I think this works too. Maybe it's better this way to prevent circular imports.

MatteoVoges commented 1 year ago

@omry note that the SCMode enum currently lives in omegaconf/base.py (and is exported by omegaconf/init.py).

I even followed the SCMode and the ListMergeMode is implemented in the same way.

One issue with StrEnum is that it's only available in python 3.11+.

Unfortunately, this is then unusable for us.

odelalleau commented 1 year ago

One issue with StrEnum is that it's only available in python 3.11+.

Yeah but I’m pretty sure I already implemented something similar in older versions… unfortunately I don’t have access to a computer until next week so I can’t really be more specific.

MatteoVoges commented 1 year ago

The linting error is fixed in #1090 (and here as well)

MatteoVoges commented 1 year ago

Thanks, are there any plans for a (pre-)release in the near future?

omry commented 1 year ago

Maybe @shagunsodhani or @Jasha10 can do that. In the mean time, you can depend on github repo revision from your build.

Jasha10 commented 1 year ago

@MatteoVoges fyi I've just uploaded v2.4.0.dev0 to pypi.org: https://pypi.org/project/omegaconf/2.4.0.dev0/

MatteoVoges commented 1 year ago

Thanks for that! ❤️