payu-org / payu

A workflow management tool for numerical models on the NCI computing systems
Apache License 2.0
21 stars 27 forks source link

Inconsistency between yaml libraries #441

Closed aidanheerdegen closed 3 months ago

aidanheerdegen commented 6 months ago

It seems ruamel yaml is pickier than PyYAML, because I go this error

$ payu checkout -r archive/restart100 -b 1xCO2 e12bda105f3
Created and checked out new branch: 1xCO2                                                                                                    
laboratory path:  /scratch/lg87/aph502/access-esm
binary path:  /scratch/lg87/aph502/access-esm/bin
input path:  /scratch/lg87/aph502/access-esm/input
work path:  /scratch/lg87/aph502/access-esm/work
archive path:  /scratch/lg87/aph502/access-esm/archive
Updated metadata. Experiment UUID: cc4197a5-9df7-410c-9aa9-a60c5593c5d2
Traceback (most recent call last):
  File "/g/data/vk83/apps/payu/1.1.3/bin/payu", line 10, in <module>
    sys.exit(parse())
  File "/g/data/vk83/apps/payu/1.1.3/lib/python3.9/site-packages/payu/cli.py", line 41, in parse
    run_cmd(**args)
  File "/g/data/vk83/apps/payu/1.1.3/lib/python3.9/site-packages/payu/subcommands/checkout_cmd.py", line 34, in runcmd
    checkout_branch(is_new_branch=new_branch,
  File "/g/data/vk83/apps/payu/1.1.3/lib/python3.9/site-packages/payu/branch.py", line 175, in checkout_branch
    add_restart_to_config(prior_restart_path, config_path=config_path)
  File "/g/data/vk83/apps/payu/1.1.3/lib/python3.9/site-packages/payu/branch.py", line 71, in add_restart_to_config
    config = yaml.load(config_path)
  File "/g/data/vk83/apps/payu/1.1.3/lib/python3.9/site-packages/ruamel/yaml/main.py", line 446, in load
    return self.load(fp)
  File "/g/data/vk83/apps/payu/1.1.3/lib/python3.9/site-packages/ruamel/yaml/main.py", line 451, in load
    return constructor.get_single_data()
  File "/g/data/vk83/apps/payu/1.1.3/lib/python3.9/site-packages/ruamel/yaml/constructor.py", line 116, in get_single_data
    return self.construct_document(node)
  File "/g/data/vk83/apps/payu/1.1.3/lib/python3.9/site-packages/ruamel/yaml/constructor.py", line 125, in construct_document
    for _dummy in generator:
  File "/g/data/vk83/apps/payu/1.1.3/lib/python3.9/site-packages/ruamel/yaml/constructor.py", line 1476, in construct_yaml_map
    self.construct_mapping(node, data, deep=True)
  File "/g/data/vk83/apps/payu/1.1.3/lib/python3.9/site-packages/ruamel/yaml/constructor.py", line 1366, in construct_mapping
    if self.check_mapping_key(node, key_node, maptyp, key, value):
  File "/g/data/vk83/apps/payu/1.1.3/lib/python3.9/site-packages/ruamel/yaml/constructor.py", line 278, in check_mapping_key
    raise DuplicateKeyError(*args)
ruamel.yaml.constructor.DuplicateKeyError: while constructing a mapping
  in "config.yaml", line 1, column 1
found duplicate key "qsub_flags" with value "-W umask=027" (original value: "-j oe")
  in "config.yaml", line 62, column 1

To suppress this check see:
    http://yaml.readthedocs.io/en/latest/api.html#duplicate-keys

when trying to checkout a new branch from an existing experiment that has duplicate qsub_flags options in the config.yaml:

https://github.com/aidanheerdegen/access_esm_mio_run/blob/mio_v4_master/config.yaml

The experiment ran without error, so my supposition is that PyYAML silently ignores the duplicate keys. We should suppress the exception, but warn if possible.

aidanheerdegen commented 5 months ago

This is how to allow duplicate keys

https://yaml.readthedocs.io/en/latest/api/#duplicate-keys

yaml = ruamel.yaml.YAML()
yaml.allow_duplicate_keys = True
yaml.load(stream)

Not sure if we need to make sure the two libraries are consistent with what key values they keep?

There is an existing issue here where others are annoyed pyyaml isn't following the standard

https://github.com/yaml/pyyaml/issues/165

some work-arounds to make pyyaml disallow duplicate keys

https://gist.github.com/pypt/94d747fe5180851196eb

aidanheerdegen commented 5 months ago

Not sure if we need to make sure the two libraries are consistent with what key values they keep?

To be specific ruamel ignores subsequent duplicate values, whereas pyyaml overwrites them.

So given dupe.yaml:

a: 1
b: 2
a: 3

Running this:

from ruamel.yaml import YAML
import yaml as pyyaml

yaml = YAML()
yaml.allow_duplicate_keys = True

with open('dupe.yaml') as stream:
   data = yaml.load(stream)
print(data)

with open('dupe.yaml') as stream:
   data = pyyaml.safe_load(stream)
print(data)

produces this:

$ python test.py 
{'a': 1, 'b': 2}
{'a': 3, 'b': 2}
aidanheerdegen commented 5 months ago

This is blocking answering this query on the forum because there is a duplicate key in a config, and ruamel can't deal with it and throws an exception.