macisamuele / language-formatters-pre-commit-hooks

Collection of custom pre-commit hooks.
Apache License 2.0
119 stars 58 forks source link

Formatting YAML files converts boolean-ish strings to booleans #167

Closed datalogics-kam closed 1 year ago

datalogics-kam commented 1 year ago

Thanks for fixing up this formatter! I'm putting the fix from #154 into effect, and found out why .pre-commit-config.yaml was excluded from formatting.

The formatter is converting certain strings to boolean values. For instance, the input

args: [--wrap, 'no', number]

produces the output

args: [--wrap, no, number]

The quotes were there to prevent 'no' from being parsed as a boolean. This causes an error after formatting certain .pre-commit-config.yaml files that call the mdformat pre-commit hook with those args. Attempts to further use that .pre-commit-config.yaml file result in Expected string got bool.

I believe that this would be corrected by using round_trip_load and round_trip_dump on the YAML data here. Those calls are designed to properly handle a round trip from YAML to an internal format and back to YAML, which is this formatter is doing.

The problem and solution is explained by the author of reumel.yaml in this Stack Overflow answer.

macisamuele commented 1 year ago

@datalogics-kam thanks for the report.

I verified the assumptions and yes, depending on the yaml version in use some strings might represent a boolean value or a text (https://yaml.org/type/bool.html)

I do believe that I can try to look into it in the next days, but I cannot guarantee a short turn-around.

Ideally, we should aim to preserve the usage of ruamel.yaml.YAML methods instead of ruamel.yaml.round_trip_* as the later are deprecated (here).


In the meantime, --preserve-quotes could be used (especially if you are experiencing the issue) the tool from removing the explicit strings delimiters.

(py) maci:language-formatters-pre-commit-hooks/ (tmp✗) $ cat a.yaml
f1: 'on'
f2: 'off'
(py) maci:language-formatters-pre-commit-hooks/ (tmp✗) $ .tox/py/bin/python -m language_formatters_pre_commit_hooks.pretty_format_yaml a.yaml --preserve-quotes
(py) maci:language-formatters-pre-commit-hooks/ (tmp✗) $ cat a.yaml
f1: 'on'
f2: 'off'
(py) maci:language-formatters-pre-commit-hooks/ (tmp✗) $ .tox/py/bin/python -m language_formatters_pre_commit_hooks.pretty_format_yaml a.yaml                  
File a.yaml is not pretty-formatted
(py) maci:language-formatters-pre-commit-hooks/ (tmp✗) $ .tox/py/bin/python -m language_formatters_pre_commit_hooks.pretty_format_yaml a.yaml --autofix        
File a.yaml is not pretty-formatted
Fixing file a.yaml
(py) maci:language-formatters-pre-commit-hooks/ (tmp✗) $ cat a.yaml
f1: on
f2: off
(py) maci:language-formatters-pre-commit-hooks/ (tmp✗) $

I believe that this would be corrected by using round_trip_load and round_trip_dump on the YAML data here

I quickly tried to verify the assumptions and it does not look like it would work, except if we ensure that quotes are preserved

>>> import ruamel.yaml as y
>>> string = "f1: 'on'\nf2: 'off'"
>>> y.round_trip_dump(y.round_trip_load(string))
'f1: on\nf2: off\n'
>>> y.round_trip_dump(y.round_trip_load(string, preserve_quotes=False))
'f1: on\nf2: off\n'
>>> y.round_trip_dump(y.round_trip_load(string, preserve_quotes=True))
"f1: 'on'\nf2: 'off'\n"
datalogics-kam commented 1 year ago

Thanks for the update. I did some experiments in the meantime, and yes, I see that preserving quotes will also be necessary. I'll try that.

macisamuele commented 1 year ago

Closing this issue as no engagement is present and workaround is provided