yaml / yaml-spec

YAML Specification
http://yaml.org/spec/

Wrong regex in float type tag repository page allows repeating `.` but not `_` #287

Open geekley opened 2 years ago

geekley commented 2 years ago

The page at https://yaml.org/type/float.html has something in the regex that really seems to be a mistake.

[-+]?([0-9][0-9_]*)?\.[0-9.]*([eE][-+][0-9]+)? (base 10)

It matches things like 1....1. On the other hand, one of the examples below (which has a typo by the way) doesn't match that regex:

exponentioal: 685.230_15e+03

This makes me think that the repeatable . in [0-9.]* part was actually supposed to be a repeatable _. Though even with that fix, these regexes are still strange anyways, because they allow trailing _ like 0_.0.
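
The behaviour described above can be checked with a short script. This is a minimal sketch in Python, assuming the base-10 pattern is meant to match the whole scalar. `PATCHED` is a hypothetical variant with the `.` in `[0-9.]*` swapped for `_`, not something published on the type page, and `matches` is a helper name introduced here for illustration.

```python
import re

# Base-10 float pattern as published on https://yaml.org/type/float.html
ORIGINAL = re.compile(r"[-+]?([0-9][0-9_]*)?\.[0-9.]*([eE][-+][0-9]+)?")

# Hypothetical fix (assumption): the fractional part repeats "_" instead of "."
PATCHED = re.compile(r"[-+]?([0-9][0-9_]*)?\.[0-9_]*([eE][-+][0-9]+)?")


def matches(pattern, text):
    """Return True when the pattern matches the entire scalar text."""
    return pattern.fullmatch(text) is not None


for sample in ["1....1", "685.230_15e+03", "0_.0"]:
    print(f"{sample!r}: original={matches(ORIGINAL, sample)}, "
          f"patched={matches(PATCHED, sample)}")
```

With the substitution, `1....1` is rejected and the page's own example is accepted, but `0_.0` still passes under both patterns, so the trailing-underscore oddity would remain.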


To me, it seems that those type pages are "less canonical" in a sense. That page says it's a "working draft" for the previous version, 1.1, so I don't know how official it is, but grammars are treating it as official when deciding which regex should match a float number.

I don't know if it should be considered official because YAML spec 1.2.2 removed this part from "other schemas":

> In addition, it is strongly recommended that such schemas make as much use as possible of the the YAML tag repository at /type/. This repository provides recommended global tags for increasing the portability of YAML documents between different applications.
>
> The tag repository is intentionally left out of the scope of this specification. This allows it to evolve to better support YAML applications. Hence, developers are encouraged to submit new “universal” types to the repository. The yaml-core mailing list at http://lists.sourceforge.net/lists/listinfo/yaml-core is the preferred method for such submissions, as well as raising any questions regarding this draft.

The definitions on that float page should be fixed, or maybe updated to 1.2. It's also unclear whether those type definitions are now unofficial or no longer recommended.

What is currently recommended for text editors to highlight as literals? For example, the type page for booleans allows things like on|off, but the Core Schema in 1.2.2 (which doesn't seem to explicitly recommend those types) allows only variations of true and false. Some editors (like VS Code) highlight on as a literal even when used as a key, like this from GitHub Actions:

on:
  push:
    tags: [ "v*.*.*" ]
# ...

I understand this can happen because anything can be used as a key, but it seems very strange that on: should behave like this, especially when it's not in the Core Schema. It seems even parsers disagree on how to interpret it in this case, according to this comment.
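
The disagreement can be illustrated by comparing the two boolean patterns directly. This is a rough sketch in Python, assuming a resolver that only checks a plain scalar against a regex. `YAML11_BOOL` is transcribed from the 1.1 bool type page and `CORE_BOOL` from the 1.2.2 Core Schema. The function name `resolves_as_bool` is hypothetical, and real parsers and editors apply more rules than this.

```python
import re

# Boolean pattern from the YAML 1.1 type page (https://yaml.org/type/bool.html)
YAML11_BOOL = re.compile(
    r"y|Y|yes|Yes|YES|n|N|no|No|NO"
    r"|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF"
)

# Boolean pattern from the YAML 1.2.2 Core Schema
CORE_BOOL = re.compile(r"true|True|TRUE|false|False|FALSE")


def resolves_as_bool(pattern, scalar):
    """Return True when a plain scalar would be tagged as a boolean."""
    return pattern.fullmatch(scalar) is not None


for scalar in ["on", "off", "yes", "true", "False"]:
    print(f"{scalar!r}: yaml1.1={resolves_as_bool(YAML11_BOOL, scalar)}, "
          f"core={resolves_as_bool(CORE_BOOL, scalar)}")
```

Under the 1.1 pattern the key `on` is a boolean, while under the Core Schema it stays a string, which is consistent with tools disagreeing about the GitHub Actions snippet above.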

This is why I think more clarification is needed on what the spec considers "recommended" with regard to literals and "default types". Should libraries and IDEs use only the Core Schema (more stable), or should they also consider those type definitions (possibly outdated drafts, and prone to errors)?

perlpunk commented 2 years ago

The regular expressions for YAML 1.1 have many issues, and this is one of the reasons why YAML 1.2 came out in 2009 (see https://yaml.org/spec/1.2.2/). The Core Schema of YAML 1.2 is the recommended schema which YAML processors should use as a default, and this is mentioned in the 1.2 spec. The link to the 1.1 drafts was left in the 1.2.1 spec accidentally, and this was corrected in 1.2.2, as you noticed. All the regular expressions you need are in the 1.2.2 spec, so I don't know why we should update the 1.1 drafts.
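
For comparison, here is a small sketch of how the 1.2.2 Core Schema float pattern treats the inputs from the original report. It assumes the pattern is applied as a full match on a plain scalar. The regex is transcribed from the Core Schema tag resolution table with the spec's readability whitespace removed, and `is_core_float` is an illustrative name, not part of any library. Infinity and NaN (`.inf`, `.nan` and their case variants) have separate patterns that are not covered here.

```python
import re

# Float (number) pattern transcribed from the YAML 1.2.2 Core Schema
CORE_FLOAT = re.compile(r"[-+]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)([eE][-+]?[0-9]+)?")


def is_core_float(scalar):
    """Return True when the whole scalar matches the Core Schema float pattern."""
    return CORE_FLOAT.fullmatch(scalar) is not None


for scalar in ["1....1", "685.230_15e+03", "685.23015e+03", "0_.0", ".5", "1e3"]:
    print(f"{scalar!r}: {is_core_float(scalar)}")
```

Two differences from the 1.1 page show up here: the Core Schema pattern does not allow `_` separators at all, and the sign in the exponent is optional.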

perlpunk commented 2 years ago

@ingydotnet how about adding a short, prominent explanation and a link to the 1.2 spec on all the /type(...) pages? If people land on those pages from somewhere else, they might not recognize that they do not belong to the latest spec.

geekley commented 2 years ago

> The Core Schema of YAML 1.2 is the recommended schema which YAML processors should use as a default, and this is mentioned in the 1.2 spec.

Thanks for the clarification!

> adding some prominent short explanation and link to the 1.2 spec on all the /type(...) pages

I agree. I think the source of confusion is the fact that the specs have the version in the URL, but those type pages don't, making it seem that the version they have is the latest for "their spec" (which was once kinda separate). I suggest those pages be moved to, for example, https://yaml.org/type/1.1/float.html or similar, and that the version-less type pages link v1.1 to those and v1.2/latest directly to the Core Schema section of the 1.2.2 spec. This would clarify that those type pages are now considered obsolete, and show where the corresponding regexes are now found in the spec.