yaml / yaml-spec

YAML Specification
http://yaml.org/spec/

YAML 1.1 !int regexes allow nonsense "numbers" #338

Open KJTsanaktsidis opened 2 months ago

KJTsanaktsidis commented 2 months ago

This was raised as an issue with Psych (https://github.com/ruby/psych/pull/687), but I guess this is actually a specification issue so I thought I'd bring it up here as well.

The YAML 1.1 !int type says that:

Valid values must match the following regular expression, which may also be used for implicit tag resolution:

[-+]?0b[0-1_]+ # (base 2)
|[-+]?0[0-7_]+ # (base 8)
|[-+]?(0|[1-9][0-9_]*) # (base 10)
|[-+]?0x[0-9a-fA-F_]+ # (base 16)
|[-+]?[1-9][0-9_]*(:[0-5]?[0-9])+ # (base 60)

However, the base 2, 8, and 16 regexes don't actually enforce that there are any digits in the number! That is, plain-style scalars consisting of nothing but a base prefix (0b, 0, or 0x) followed by underscores should all be parsed as numeric literals, allegedly.

However, they of course don't actually represent a number at all.
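
To make the problem concrete, here is a minimal sketch that compiles the spec's regex in Python's verbose mode and checks a few digitless strings against it. The sample inputs and the helper name `matches_yaml11_int` are my own illustration, not taken from the spec or from Psych, and anchoring with `fullmatch` is an assumption about how a resolver would apply the pattern.

```python
import re

# YAML 1.1 !int pattern, copied from the spec text quoted above.
# re.VERBOSE ignores the whitespace and treats "# ..." as comments.
YAML11_INT = re.compile(
    r"""
     [-+]?0b[0-1_]+                     # (base 2)
    |[-+]?0[0-7_]+                      # (base 8)
    |[-+]?(0|[1-9][0-9_]*)              # (base 10)
    |[-+]?0x[0-9a-fA-F_]+               # (base 16)
    |[-+]?[1-9][0-9_]*(:[0-5]?[0-9])+   # (base 60)
    """,
    re.VERBOSE,
)


def matches_yaml11_int(text: str) -> bool:
    """Return True if the whole string matches the YAML 1.1 !int regex."""
    return YAML11_INT.fullmatch(text) is not None


# Hypothetical inputs: a base prefix followed only by underscores.
DIGITLESS = ["0b_", "0_", "0x_", "-0x__"]

for candidate in DIGITLESS:
    print(candidate, matches_yaml11_int(candidate))
```

Every one of these inputs matches, even though none of them has any digits after the prefix.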

I gather this isn't a problem in YAML 1.2 because _ isn't allowed in numeric literals there at all. But should there be some kind of errata published for the YAML 1.1 !int type to tighten up these regular expressions?

@zendesk-jmeade suggested in the linked Psych issue:

[-+]?0b[_]*[0-1][0-1_]*             # base 2
|[-+]?0[_]*[0-7][0-7_]*              # base 8
|[-+]?(0|[1-9][0-9_]*)               # base 10
|[-+]?0x[_]*[0-9a-fA-F][0-9a-fA-F_]* # base 16
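
As a quick sanity check of that suggestion, here is a sketch that compiles the tightened regex in Python's verbose mode and runs it against digitless strings as well as ordinary integers. The sample inputs and the helper name `matches_tightened_int` are hypothetical, and using `fullmatch` is an assumption about how a resolver would apply the pattern.

```python
import re

# Tightened pattern as suggested above: each prefixed base now requires
# at least one real digit after any leading underscores.
TIGHTENED_INT = re.compile(
    r"""
     [-+]?0b[_]*[0-1][0-1_]*               # base 2
    |[-+]?0[_]*[0-7][0-7_]*                # base 8
    |[-+]?(0|[1-9][0-9_]*)                 # base 10
    |[-+]?0x[_]*[0-9a-fA-F][0-9a-fA-F_]*   # base 16
    """,
    re.VERBOSE,
)


def matches_tightened_int(text: str) -> bool:
    """Return True if the whole string matches the tightened regex."""
    return TIGHTENED_INT.fullmatch(text) is not None


# Hypothetical inputs: prefix plus underscores only, then real integers.
DIGITLESS = ["0b_", "0_", "0x_", "-0x__"]
REAL_INTS = ["0", "42", "-1_000", "0b_1", "0_7", "0x_ff"]

for candidate in DIGITLESS + REAL_INTS:
    print(candidate, matches_tightened_int(candidate))
```

The digitless inputs are all rejected and the real integers are all accepted. Note that the suggestion as quoted has no base 60 alternative, so a value like `1:30` would no longer match unless that line is carried over from the original regex.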