kdl-org / kdl

the kdl document language specifications
https://kdl.dev

The grammar does not exclude some codepoints below 0x20 within a bare identifier #191

Closed uasi closed 2 years ago

uasi commented 2 years ago

The spec reads:

The following characters cannot be used anywhere in a bare Identifier:

  • Any codepoint with hexadecimal value 0x20 or below.
  • (snip)

while identifier-char is defined as follows:

identifier-char := unicode - linespace - [\/(){}<>;[]=,"]
linespace := newline | ws | single-line-comment
newline := See Table (All line-break white_space)
ws := bom | unicode-space | multi-line-comment
unicode-space := See Table (All White_Space unicode characters which are not `newline`)

The unicode rule isn't defined in the grammar, but it is presumably [\x00-\u{10FFFF}]. The codepoints [\x00-\x08\x0E-\x1F] are neither White_Space nor listed in either table, so identifier-char matches them, which contradicts the prose rule quoted above.
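
To make the mismatch concrete, here is a minimal sketch that enumerates the codepoints at or below 0x20 on which the quoted grammar and the prose rule disagree. It rests on assumptions that are not in the spec text: that unicode means every codepoint, and that for this range linespace reduces to the Unicode White_Space characters (U+0009 to U+000D and U+0020). The function names `grammar_allows` and `prose_allows` are made up for illustration.

```python
# Unicode White_Space codepoints at or below U+0020: tab, LF, VT, FF, CR, space.
# Assumption: in this range, the grammar's linespace covers exactly these.
WHITE_SPACE_LOW = {0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x20}

# Characters the grammar subtracts explicitly: [\/(){}<>;[]=,"]
EXCLUDED_PUNCTUATION = set('\\/(){}<>;[]=,"')


def grammar_allows(cp: int) -> bool:
    """True if identifier-char, as quoted, would match this codepoint (<= 0x20 only)."""
    if cp > 0x20:
        raise ValueError("this sketch only covers codepoints up to 0x20")
    return cp not in WHITE_SPACE_LOW and chr(cp) not in EXCLUDED_PUNCTUATION


def prose_allows(cp: int) -> bool:
    """True if the prose rule (nothing at or below 0x20) permits this codepoint."""
    return cp > 0x20


# Codepoints where the two definitions disagree.
mismatches = [cp for cp in range(0x21) if grammar_allows(cp) != prose_allows(cp)]

if __name__ == "__main__":
    print(" ".join(f"{cp:#04x}" for cp in mismatches))
```

Under those assumptions the disagreement is exactly the two ranges named above, 0x00 to 0x08 and 0x0E to 0x1F.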

zkat commented 2 years ago

Closing this as a duplicate of https://github.com/kdl-org/kdl/issues/255