mk-fg / pretty-yaml

PyYAML-based module to produce a bit more pretty and readable YAML-serialized data
Do What The F*ck You Want To Public License
135 stars 29 forks source link

pyaml converts integer-like strings to integers #20

Closed mpenkov closed 7 years ago

mpenkov commented 7 years ago

For example:

(ae.venv)bash-3.2$ python
Python 2.7.12 (default, Nov  7 2016, 14:33:03) 
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.42.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyaml
>>> pyaml.dump({"foo": "1"})
u'foo: 1\n'

Is this intended behavior? The problem is that the data we read back from this sort of YAML is different to what was written.

mk-fg commented 7 years ago

Not really intended, more like side-effect of what the thing does, i.e. to not have quotes around every single thing and such. Note that integers in YAML can be represented in all sorts of ways, i.e. 1_000 or 5e10 or 0xff should all be escaped to remain strings too.

So I'd say if you're looking for something that reliably serializes-deserializes stuff, use PyYAML directly, maybe with bunch of options from the top of the readme - these shouldn't break anything.

I think this is a good example of "bad things that can happen" to put into readme warning, thanks for bringing up, don't think it should be fixed though.

mk-fg commented 7 years ago

Hopefully this should be make such crazy behavior more obvious or at least harder to miss: https://github.com/mk-fg/pretty-yaml/#warning

mpenkov commented 7 years ago

A warning is better than nothing, but wouldn't it be easier (from the point of view of the user) for us to predict this craziness and code around it? I understand there'll always be edge cases that require convoluted solutions that are worse than the problem, but this particular case seems relatively simple: if it's a string, and it parses into a float, then keep it as a string (quote it).

mk-fg commented 7 years ago

You're right from the "output must be correct" perspective, but the way I tend to use this module, I'd prefer it to actually be incorrect but don't have quotes around every digit-only string in there, and maybe use str() after parser if such output should be parsed and it's a common-enough case.

Might be easy to add an option to do what you hinted at though, i.e. some init keyword plus a check in represent_stringish.

mpenkov commented 7 years ago

OK, I'll have a look at it.

mnieber commented 3 years ago

I understand the idea behind "you probably don't want quotes around integers, so I won't put them" but in that case, can this be made configurable with a flag?

mk-fg commented 3 years ago

Certainly can be, as mentioned, I'm just lazy to do it - quite rarely use the module myself as it is, and certainly not for correct serialization, as also mentioned.

Edit: as a side-note, I think such option can be reasonably implemented via +1 parsing pass of PyYAML over string values serialized through this module, and if value parses back to original one - keep it, otherwise replace with a safe-style variant, or picking style by running extra dump/parse ops. This is because unquoted strings in YAML are notoriously unsafe, and it's not just 123 that'll be an issue, but 1_000, 1.123, 1e123, null, false, 2021-10-10T4:30:00, etc etc - parser knows about all these, so using it should be a good way to catch all such cases.