kdl-org / kdl

the kdl document language specifications
https://kdl.dev

formal identifier grammar doesn't match prose #255

Closed CAD97 closed 9 months ago

CAD97 commented 2 years ago

The prose says that

Non-identifier characters

The following characters cannot be used anywhere in a bare Identifier:

  • Any codepoint with hexadecimal value 0x20 or below.
  • Any codepoint with hexadecimal value higher than 0x10FFFF.
  • Any of \/(){}<>;[]=,"

The formal grammar says that

identifier-char := unicode - linespace - [\/(){}<>;[]=,"]
linespace := newline | ws | single-line-comment

newline := See Table (All line-break white_space)
unicode-space := See Table (All White_Space unicode characters which are not `newline`)

This means that the formal grammar excludes 0x09, 0x0A, 0x0C, 0x0D, and 0x20, but allows the other codepoints < 0x20.
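The mismatch in the low range can be counted with a small sketch. This is only an illustration, not code from the spec or from any KDL implementation: the names `PROSE_FORBIDDEN_LOW`, `GRAMMAR_FORBIDDEN_LOW`, and `allowed_by_grammar_only` are hypothetical, and the grammar set is taken directly from the five codepoints listed above rather than derived from the spec's whitespace tables.

```python
# Codepoints the prose forbids in the low range: everything at or below 0x20.
PROSE_FORBIDDEN_LOW = set(range(0x00, 0x21))

# Codepoints at or below 0x20 that the grammar removes via `linespace`,
# as listed in this issue (assumption: copied from the comment above,
# not recomputed from the spec's newline/whitespace tables).
GRAMMAR_FORBIDDEN_LOW = {0x09, 0x0A, 0x0C, 0x0D, 0x20}


def allowed_by_grammar_only():
    """Return the low codepoints the prose rejects but the grammar accepts."""
    return sorted(PROSE_FORBIDDEN_LOW - GRAMMAR_FORBIDDEN_LOW)


mismatch = allowed_by_grammar_only()
print(len(mismatch))  # 28
print(", ".join(f"0x{cp:02X}" for cp in mismatch))
```

Under these assumptions, 28 control codepoints (for example 0x00 and 0x0B) are rejected by the prose but accepted by `identifier-char`.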

The prose also doesn't say that all whitespace characters are non-identifier characters. This is clearly implied, though, and it matters less in the prose than it does in the grammar formalization.

zkat commented 2 years ago

Happy to take a patch for this!

marrus-sh commented 2 years ago

(this is a duplicate of #191)

zkat commented 9 months ago

This should be fixed in the kdl-v2 branch