kislyuk / yq

Command-line YAML, XML, TOML processor - jq wrapper for YAML/XML/TOML documents
https://kislyuk.github.io/yq/
Apache License 2.0
2.53k stars 81 forks

Do not interpret characters that cannot be parsed in octal as int #176

Closed conao3 closed 9 months ago

conao3 commented 9 months ago

ref: #152.

Some test cases expect YAML 1.1 behaviour, even though YAML 1.2 should be the default. This is confusing, but this is the minimal patch that does not change the current behaviour. Perhaps the YAML 1.2 syntax should be used instead, but that should be done in a separate PR.


ref: (yaml1.1 grammar) https://yaml.org/type/int.html https://yaml.org/type/float.html

kislyuk commented 9 months ago

Thanks!

kislyuk commented 3 months ago

@conao3 this change breaks the parsing of numbers and I am reverting it (see #187). Can you please describe how you came up with these regular expressions and how they correspond to the YAML spec and to what is used in PyYAML?

conao3 commented 3 months ago

As referenced in my original comment, https://yaml.org/type/int.html gives the following regular expression:

Resolution and Validation: Valid values must match the following regular expression, which may also be used for implicit tag resolution:

 [-+]?0b[0-1_]+ # (base 2)
|[-+]?0[0-7_]+ # (base 8)
|[-+]?(0|[1-9][0-9_]*) # (base 10)
|[-+]?0x[0-9a-fA-F_]+ # (base 16)
|[-+]?[1-9][0-9_]*(:[0-5]?[0-9])+ # (base 60)
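
To make the quoted grammar concrete, here is a minimal Python sketch that transcribes the int expression above into `re.VERBOSE` form and checks a few scalars against it. The names `YAML11_INT` and `is_yaml11_int` are hypothetical helpers for illustration only; this is not the pattern used in the PR or in PyYAML, just the spec text as quoted. Under this grammar `017` is an int (base 8), while `018` matches no alternative because `8` is not an octal digit.

```python
import re

# Transcribed from the YAML 1.1 int grammar quoted above.
# re.VERBOSE ignores layout whitespace and treats "# ..." as comments.
YAML11_INT = re.compile(
    r"""
     [-+]?0b[0-1_]+                     # (base 2)
    |[-+]?0[0-7_]+                      # (base 8)
    |[-+]?(0|[1-9][0-9_]*)              # (base 10)
    |[-+]?0x[0-9a-fA-F_]+               # (base 16)
    |[-+]?[1-9][0-9_]*(:[0-5]?[0-9])+   # (base 60)
    """,
    re.VERBOSE,
)


def is_yaml11_int(scalar: str) -> bool:
    """Return True if the whole scalar matches the quoted int grammar."""
    return YAML11_INT.fullmatch(scalar) is not None


# Print which sample scalars the grammar accepts as ints.
for sample in ["017", "018", "0x1F", "1_000", "1:30"]:
    print(sample, is_yaml11_int(sample))
```

Using `fullmatch` matters here: a plain `match` would accept the prefix `01` of `018`.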

And https://yaml.org/type/float.html gives this one:

Regexp:

[-+]?([0-9][0-9_]*)?\.[0-9.]*([eE][-+][0-9]+)? (base 10)
|[-+]?[0-9][0-9_]*(:[0-5]?[0-9])+\.[0-9_]* (base 60)
|[-+]?\.(inf|Inf|INF) # (infinity)
|\.(nan|NaN|NAN) # (not a number)
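
The same kind of sketch for the float expression, again a literal transcription of the quoted spec text and not the pattern from the PR or PyYAML (`YAML11_FLOAT` and `is_yaml11_float` are hypothetical names). It shows two properties of the grammar as written: the base-10 form requires a `.`, and the exponent requires an explicit sign, so `1e5` and `1.5e3` are not matched while `1.5e+3` is.

```python
import re

# Transcribed from the YAML 1.1 float grammar quoted above.
YAML11_FLOAT = re.compile(
    r"""
     [-+]?([0-9][0-9_]*)?\.[0-9.]*([eE][-+][0-9]+)?   # (base 10)
    |[-+]?[0-9][0-9_]*(:[0-5]?[0-9])+\.[0-9_]*        # (base 60)
    |[-+]?\.(inf|Inf|INF)                             # (infinity)
    |\.(nan|NaN|NAN)                                  # (not a number)
    """,
    re.VERBOSE,
)


def is_yaml11_float(scalar: str) -> bool:
    """Return True if the whole scalar matches the quoted float grammar."""
    return YAML11_FLOAT.fullmatch(scalar) is not None


# Print which sample scalars the grammar accepts as floats.
for sample in ["1.5", "1.5e+3", "1.5e3", "1e5", ".inf", "-.INF", ".nan"]:
    print(sample, is_yaml11_float(sample))
```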

When incorporating these, I made some changes so that all of the existing tests pass.

My personal opinion is that if PyYAML's parsing is not what you want, switching to another parser such as ruamel.yaml is also an option. Either way, sorry for breaking the existing use case.