ssato / python-anyconfig

Python library provides common APIs to load and dump configuration files in various formats
MIT License

New lines are doubled with YAML parser #78

adaxi closed this issue 5 years ago

adaxi commented 7 years ago

When I use the YAML parser on data that contains newlines, the newlines are doubled in the YAML output.

To reproduce, I create a data structure that contains a string with newlines in it.

Python 2.7.9 (default, Jun 29 2016, 13:08:31) 
[GCC 4.9.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import anyconfig
>>> data = dict()
>>> data["key"] = """hello
... world
... """
>>> data
{'key': 'hello\nworld\n'}

When I dump it with the YAML parser, the newlines are doubled.

>>> anyconfig.dumps(data, ac_parser="yaml")
"{key: 'hello\n\n    world\n\n    '}\n"

When I do the same with the JSON parser, I get the expected output.

>>> anyconfig.dumps(data, ac_parser="json")
'{"key": "hello\\nworld\\n"}'

The behavior seems to come from the PyYAML library itself, because I obtain the same output when I call it directly:

>>> from yaml import dump
>>> dump(data)
"{key: 'hello\n\n    world\n\n    '}\n"

ssato commented 7 years ago

Thanks a lot for your report!

anyconfig is just a wrapper around the 'backend' parsers, so if the issue also occurs without anyconfig, it must be in the backend.

In this case, ruamel.yaml seems to behave better than yaml, and if you need that behavior, I recommend using ruamel.yaml instead of yaml.

In [1]: import ruamel.yaml, yaml, cStringIO as StringIO

In [2]: cnf = dict(greeting="""Hello,
   ...: world!
   ...: """)

In [3]: sio = StringIO.StringIO(); yaml.dump(cnf, sio); sio.getvalue()
Out[3]: "{greeting: 'Hello,\n\n    world!\n\n    '}\n"

In [4]: sio = StringIO.StringIO(); ruamel.yaml.dump(cnf, sio); sio.getvalue()
Out[4]: '{greeting: "Hello,\\nworld!\\n"}\n'

I have not yet released a version that works with ruamel.yaml, but the current git HEAD may work with it. If the HEAD version works as you expect and you need it, I'll release that version ASAP.

AvdN commented 5 years ago

If you read through the YAML specification (http://yaml.org/spec/1.2/spec.html#id2778853), you will see that the semantics of an empty line within a scalar depend on the scalar style. What you encounter here is "flow line folding" in a single-quoted scalar (http://yaml.org/spec/1.2/spec.html#id2779950), and that is the only way to represent a newline in a single-quoted scalar. So there is no error, which would be easy to see if you reloaded the output.
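To make the "reload the output" point concrete, here is a minimal sketch, assuming Python 3 with PyYAML installed (the thread itself used Python 2.7). The variable names are only illustrative. It reloads both a fresh dump and the exact string quoted in the original report, and checks that each yields the original data:

```python
import yaml

original = {"key": "hello\nworld\n"}

# Dump and reload; the exact text layout depends on the PyYAML version,
# but the loaded value should equal what was dumped.
dumped = yaml.dump(original)
reloaded = yaml.safe_load(dumped)

# The exact output quoted in the report: the blank lines inside the
# single-quoted scalar are how folding encodes each newline.
issue_output = "{key: 'hello\n\n    world\n\n    '}\n"
from_issue = yaml.safe_load(issue_output)

print(reloaded == original)
print(from_issue == original)
```

If both comparisons hold, the doubled line breaks are purely a matter of presentation in the YAML text, not a change to the data.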

IMO it is usually more natural to dump strings with embedded newlines either as literal block style scalars (where each newline in the scalar represents a newline in the loaded string, although that style cannot represent all (control) characters), or as double-quoted scalars, where you can backslash-escape newlines (but can also use flow line folding).
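As a rough sketch of the literal block style option with PyYAML (assuming Python 3; `LiteralDumper` and `represent_multiline_str` are hypothetical names introduced here, not part of PyYAML or anyconfig), one can register a string representer that picks the `|` style whenever a string contains a newline:

```python
import yaml


class LiteralDumper(yaml.SafeDumper):
    """Hypothetical dumper subclass, so the global SafeDumper stays untouched."""


def represent_multiline_str(dumper, value):
    # Request literal block style ("|") only for strings with newlines;
    # PyYAML may still fall back to another style if "|" cannot represent
    # the value (for example, certain control characters).
    style = "|" if "\n" in value else None
    return dumper.represent_scalar("tag:yaml.org,2002:str", value, style=style)


LiteralDumper.add_representer(str, represent_multiline_str)

data = {"key": "hello\nworld\n"}
text = yaml.dump(data, Dumper=LiteralDumper, default_flow_style=False)
print(text)

# Whatever style was chosen, the value should survive a round trip.
roundtrip = yaml.safe_load(text)
```

Block style is only available outside flow collections, which is why `default_flow_style=False` is passed explicitly. For the double-quoted alternative, `yaml.dump(data, default_style='"')` is the simplest route, at the cost of quoting every scalar, keys included.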

adaxi commented 5 years ago

Thank you for clarifying this. It is a bit unexpected for someone who is not used to working with YAML. It does not help that the two parsers produce different output :smile:.