kdl-org / kdl

the kdl document language specifications
https://kdl.dev
Other
1.14k stars 62 forks source link

unusual_chars_in_bare_id.kdl and spec disagree on valid identifiers #166

Closed danderson closed 3 years ago

danderson commented 3 years ago

While implementing a Go lexer for KDL, I hit a test failure that I think is an inconsistency between the spec and the test input https://github.com/kdl-org/kdl/blob/main/tests/test_cases/input/unusual_chars_in_bare_id.kdl .

The input:

foo123~!@#$%^&*.:'|/?+ "weeee"

The expected result is a valid bare identifier followed by a string, as indicated in https://github.com/kdl-org/kdl/blob/main/tests/test_cases/expected_kdl/unusual_chars_in_bare_id.kdl . What my lexer gets is:

Identifier ("foo123~!@#$%^&*.:'|")
Err (unknown kind of comment "/?")

This difference occurs because I lex bare identifiers according to https://github.com/kdl-org/kdl/blob/main/SPEC.md#identifier , which states that non-identifier characters cannot appear anywhere in a bare identifier, and https://github.com/kdl-org/kdl/blob/main/SPEC.md#non-identifier-characters says that / is a non-identifier character. Thus, my lexer emits an identifier when it sees the / (since the identifier is clearly over), then tries to continue lexing from there as a comment and fails because /? isn't a known comment form.

(ignore for a moment that this set of lexer tokens isn't valid KDL, my parser would have picked that up, the lexer is less clever :) )

Am I misreading the spec, or does this test input indeed contain an invalid identifier?

danderson commented 3 years ago

Code for my lexer is at https://github.com/danderson/go-kdl . Please excuse the state of the code, I've been getting it to lex correctly before tidying it up. go test -v -run=TestConformance/testdata/valid/unusual_chars_in_bare_id.kdl . reproduces the lexing failure I listed above.

zkat commented 3 years ago

oh yeah, that test looks wrong. Would you mind updating it to exclude characters based on the actual spec? The identifier grammar changed a lot towards the end, and the tests and examples were clearly not updated for it.

zkat commented 3 years ago

(To be clear: / is NOT a valid identifier char)

danderson commented 3 years ago

Thanks for the confirmation, I'll send a PR to adjust the tests.