omry / omegaconf

Flexible Python configuration system. The last one you will ever need.
BSD 3-Clause "New" or "Revised" License

Consider support yaml aliases #93

Closed: Yablon closed this issue 4 years ago.

Yablon commented 4 years ago

Some YAML files are organized as follows. Could you please consider supporting that? For example, the following is from the official documentation: https://pyyaml.org/wiki/PyYAMLDocumentation

Aliases

Note that PyYAML does not yet support recursive objects.

Using YAML you may represent objects of arbitrary graph-like structures. If you want to refer to the same object from different parts of a document, you need to use anchors and aliases.

Anchors are denoted by the & indicator, while aliases are denoted by *. For instance, the document

left hand: &A
  name: The Bastard Sword of Eowyn
  weight: 30
right hand: *A

expresses the idea of a hero holding a heavy sword in both hands.

PyYAML now fully supports recursive objects. For instance, the document

&A [ *A ]

will produce a list object containing a reference to itself.
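A minimal sketch of the behaviour described above, assuming PyYAML is installed and using its `yaml.safe_load`: the alias `*A` does not make a copy, so both keys end up pointing at the same Python dict, and the self-referencing document builds a list that contains itself.

```python
import yaml  # PyYAML

doc = """
left hand: &A
  name: The Bastard Sword of Eowyn
  weight: 30
right hand: *A
"""

data = yaml.safe_load(doc)
# The alias refers to the anchored node, so both keys share one dict object.
print(data["left hand"] is data["right hand"])  # True
print(data["right hand"]["name"])  # The Bastard Sword of Eowyn

# A recursive document: the list's only element is the list itself.
loop = yaml.safe_load("&A [ *A ]")
print(loop[0] is loop)  # True
```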

omry commented 4 years ago

Hi Yablon, can you show a more concrete example of a yaml file that does not work right now with OmegaConf?

I am not sure what you are trying to represent.

Yablon commented 4 years ago

@omry Yes, thank you for your reply! I have two YAML files:

test_1.yaml

sample_rate: &SR !!int "4000"
sample_rate_test: *SR

test_2.yaml

sample_rate: &SR !!int "8000"

But when I merge the two files using OmegaConf:

from omegaconf import OmegaConf

a = OmegaConf.load('test_1.yaml')
b = OmegaConf.load('test_2.yaml')
c = OmegaConf.merge(a, b)
print(c)

I get this:

{'sample_rate': 8000, 'sample_rate_test': 4000}

What I expected was that every value sharing the same alias would change, i.e. {'sample_rate': 8000, 'sample_rate_test': 8000}.

Since OmegaConf supports merging, I think it would be reasonable to support this usage as well.

Thank you!
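The reported result can be reproduced with PyYAML alone, which shows where the alias is lost. This sketch assumes PyYAML's `yaml.safe_load` and uses a plain dict update as a stand-in for `OmegaConf.merge` (it is not OmegaConf's merge logic): by the time the files are merged, `*SR` has already been replaced by the number 4000, so nothing links `sample_rate_test` back to `sample_rate`.

```python
import yaml  # PyYAML

test_1 = 'sample_rate: &SR !!int "4000"\nsample_rate_test: *SR\n'
test_2 = 'sample_rate: &SR !!int "8000"\n'

a = yaml.safe_load(test_1)
b = yaml.safe_load(test_2)

# After loading, the alias is gone: both keys simply hold the int 4000.
print(a)  # {'sample_rate': 4000, 'sample_rate_test': 4000}

# Shallow dict update as a stand-in for a merge (not OmegaConf.merge itself).
merged = {**a, **b}
print(merged)  # {'sample_rate': 8000, 'sample_rate_test': 4000}
```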

omry commented 4 years ago

The problem is that PyYAML resolves the aliases when you load each individual file. Luckily, OmegaConf supports this functionality through interpolations:

file1.yaml:

a: 10
b:
  a: 20

file2.yaml:

a1: ${a}
b1: ${b.a}

from omegaconf import OmegaConf

c1 = OmegaConf.load('file1.yaml')
c2 = OmegaConf.load('file2.yaml')
c = OmegaConf.merge(c1, c2)
# prints the interpolations as-is
print(c.pretty())

# prints with the interpolations resolved
print(c.pretty(resolve=True))
Yablon commented 4 years ago

@omry That is great! However, I think I may not have expressed myself well.

I currently use a YAML config file instead of tf.contrib.training.HParams.

My YAML config file is fairly complicated and contains many items.

For simplicity, I usually write two YAML files when training. One is the big YAML config file; the other is a small file that may contain many nodes. When training, I first read the big config file and then replace its variables with the variables from the small file.

That way, I can save my training history without much effort.

The problem is that some aliased values in the big file can't be changed all at once by a variable from the small file. I had been thinking about how to solve this for a long time until I found your awesome OmegaConf. That's why I gave the examples above.

In short, I want to replace all the variables that share an alias with another value. Is that possible with OmegaConf?

By the way, I found that lists loaded with OmegaConf don't support some list operations, like the following:

from omegaconf import OmegaConf

a = OmegaConf.load('config.yaml')
b = a.some_list + [1]  # that's ok
b = [1] + a.some_list  # throws an error

The last line throws an error. Can that be fixed?

Thank you!

omry commented 4 years ago

Hi Yablon, thanks for sharing more context. I am happy to hear you are using OmegaConf for machine learning. It was actually created with a machine learning use case in mind.

First, since you already know about OmegaConf, spend some time going through its documentation to see what it can do. Then I strongly suggest that you look at Hydra, which builds on OmegaConf to make it even more powerful and easier to use for complex use cases like ML.

Hydra makes it easy to compose configurations with OmegaConf in a way that is most likely powerful enough to do what you need. It also offers other useful features (parameter sweeps, tab completion and more).

I don't yet understand the specific problem you are facing from your explanation. It's best to show the problem with a small example.

About your other problem with lists: it is a known issue. An OmegaConf list is not really a Python list, and a primitive list does not support adding one to itself when the primitive list is on the left. You can work around it by doing:

from omegaconf import OmegaConf

a = OmegaConf.load('config.yaml')
b = OmegaConf.create([1]) + a.some_list
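As a sketch of why `[1] + a.some_list` can fail while `a.some_list + [1]` works, here is a hypothetical list-like class (`ConfigList` is an illustrative name, not OmegaConf's `ListConfig`). For `[1] + x`, Python's built-in `list` only concatenates real lists, so the expression succeeds only if `x`'s class defines the reflected method `__radd__`. This sketch makes no claim about which OmegaConf versions define `__radd__`.

```python
class ConfigList:
    """Hypothetical list-like config node (illustration only, not OmegaConf's ListConfig)."""

    def __init__(self, items):
        self._items = list(items)

    def __iter__(self):
        return iter(self._items)

    def __add__(self, other):
        # config_list + other: the left operand's __add__ handles it.
        return ConfigList(self._items + list(other))


class ConfigListWithRadd(ConfigList):
    def __radd__(self, other):
        # other + config_list: Python tries this reflected method on the right operand.
        return ConfigList(list(other) + self._items)


some_list = ConfigList([2, 3])
print(list(some_list + [1]))  # [2, 3, 1]

try:
    [1] + some_list  # list only concatenates real lists, and ConfigList has no __radd__
    error = None
except TypeError as err:
    error = err
print("TypeError:", error)

# Workaround: convert to a plain list first.
print([1] + list(some_list))  # [1, 2, 3]

# With __radd__ defined, the original expression works.
print(list([1] + ConfigListWithRadd([2, 3])))  # [1, 2, 3]
```

omry's `OmegaConf.create([1])` workaround takes a different route: it makes the left operand a config list too, so its own `__add__` handles the concatenation.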

Feel free to ask follow-up questions. You can also join the chat (see the chat link in this project's README); I can answer more questions about both there.

Yablon commented 4 years ago

Thank you @omry

Sorry, my English is not very good, which made my explanation confusing. I wrote an example in a repo: example

Hydra seems to be a big repository, and I will go through it later.

omry commented 4 years ago

A few notes about your repo:

  1. Stop using YAML anchors with OmegaConf; they do not work well when you combine different configs.
  2. You don't need to declare the type, as in max_wav_value: !!float "32768.0". This is just fine: max_wav_value: 32768.0
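The second note can be checked with PyYAML directly (assuming it is installed): the explicit `!!float` tag on a quoted string and the plain literal load as the same Python float.

```python
import yaml  # PyYAML

tagged = yaml.safe_load('max_wav_value: !!float "32768.0"')
plain = yaml.safe_load("max_wav_value: 32768.0")

# Both spellings produce the same Python float.
print(tagged, plain)  # {'max_wav_value': 32768.0} {'max_wav_value': 32768.0}
print(type(plain["max_wav_value"]).__name__)  # float
```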

Try something like this (with OmegaConf interpolation):

all_config.yaml:

sampling_rate: 8000.0

feature_extract:
  sample_rate: ${sampling_rate}

train.yaml:

sampling_rate: 16000.0

from omegaconf import OmegaConf

base = OmegaConf.load("all_config.yaml")
traincfg = OmegaConf.load("train.yaml")
config = OmegaConf.merge(base, traincfg)  # first base, then traincfg overrides it
print(config.pretty(resolve=True))

This should print something like the following (I didn't test it):

sampling_rate: 16000.0
feature_extract:
  sample_rate: 16000.0
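To see why interpolation survives a merge where anchors do not, here is a toy model in plain Python. `merge` and `resolve` are hypothetical helpers written for this illustration, not OmegaConf's implementation (OmegaConf resolves interpolations lazily when a value is accessed and supports a richer syntax). The key point is that `${sampling_rate}` is kept as a reference string through the merge and is only looked up afterwards, in the merged tree.

```python
import re
from typing import Any

# Toy syntax: a value that is exactly "${some.dotted.path}" (not OmegaConf's full grammar).
_INTERPOLATION = re.compile(r"^\$\{([^}]+)\}$")


def merge(base: dict, override: dict) -> dict:
    """Recursively merge override into a copy of base; override wins on conflicts."""
    out = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = merge(out[key], value)
        else:
            out[key] = value
    return out


def _lookup(root: dict, path: str) -> Any:
    """Follow a dotted path such as "b.a" from the root of the tree."""
    node: Any = root
    for part in path.split("."):
        node = node[part]
    return node


def resolve(root: dict) -> dict:
    """Return a copy of root with every "${path}" string replaced by the value it points to."""

    def _resolve(node: Any) -> Any:
        if isinstance(node, dict):
            return {k: _resolve(v) for k, v in node.items()}
        if isinstance(node, str):
            match = _INTERPOLATION.match(node)
            if match:
                # Look the target up in the merged tree, then resolve it too (chains allowed).
                return _resolve(_lookup(root, match.group(1)))
        return node

    return _resolve(root)


# Same shapes as all_config.yaml and train.yaml above, written as dicts.
base = {"sampling_rate": 8000.0, "feature_extract": {"sample_rate": "${sampling_rate}"}}
traincfg = {"sampling_rate": 16000.0}

config = merge(base, traincfg)
print(config)  # the reference is still the string "${sampling_rate}" after the merge
print(resolve(config))  # {'sampling_rate': 16000.0, 'feature_extract': {'sample_rate': 16000.0}}
```

This toy handles only nested dicts and whole-value `${...}` strings, and a reference cycle would recurse forever; it is only meant to show the order of operations (merge first, resolve second).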

And really, please spend 20 minutes going through the Hydra tutorial. Switching to it will make your life much easier.

Yablon commented 4 years ago

@omry Thank you for your example; it is very helpful. I will learn Hydra.