kdl-org / kdl

the kdl document language specifications
https://kdl.dev

KDL 2.0: multiline comments in more places #342

Closed zkat closed 9 months ago

zkat commented 10 months ago

stuff like foo/*bar*/baz 1 should be legal (resolving to foobaz 1 in this case)

Ref: #270

tabatkins commented 10 months ago

Wait, you actually want the comments to be able to divide individual meaningful tokens? That seems fraught, I think.

zkat commented 10 months ago

yeah? that's what the original request was, no?

LemmaEOF commented 10 months ago

I kinda feel like this should resolve to foo baz 1 as comments are considered whitespace

dezren39 commented 10 months ago

i think this should be legal, i don't know what it should do.

pingiun commented 10 months ago

> I kinda feel like this should resolve to foo baz 1 as comments are considered whitespace

100% agree. If you treat comments as whitespace, that allows you to easily create tokenizing parsers. Treating comments as nothing at all makes it more confusing, I think. Under that reading,

foo/*bar */baz 1

would be different from

foo/*bar*/ baz 1

I feel like it makes sense for them to be the same
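
To make the difference concrete, here is a minimal sketch of the two readings applied to the two lines above. It is not a KDL parser: it assumes non-nested block comments, ignores string literals, and the function names are made up for illustration.

```python
import re

# Simplifying assumptions: block comments do not nest here, and string
# literals (where "/*" would not start a comment) are ignored.
BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)

def comments_as_whitespace(src: str) -> list[str]:
    """Replace each comment with a single space, then split on whitespace."""
    return BLOCK_COMMENT.sub(" ", src).split()

def comments_as_nothing(src: str) -> list[str]:
    """Delete each comment outright, then split on whitespace."""
    return BLOCK_COMMENT.sub("", src).split()

for sample in ("foo/*bar */baz 1", "foo/*bar*/ baz 1"):
    print(repr(sample), comments_as_whitespace(sample), comments_as_nothing(sample))
```

With the whitespace reading both lines split into the same three pieces; with the "nothing" reading the first line fuses foo and baz into one piece while the second does not.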

tabatkins commented 10 months ago

The initial request was for things like (type/*huh*/)10 to be valid - there, the comment is between an ident and the closing paren in the grammar, which are likely separate "things" in a parser and relatively easy to allow a comment to appear.

On the other hand, allowing comments literally anywhere and treating them as if they simply don't exist means that every single parser function has to have the ability to get interrupted and switch into a comment parse, at any point. (Or it requires a pre-parse step that removes the comments ahead of time, tho that does still require tracking where strings occur, since the comment chars are valid in strings and don't comment things.)

Allowing the initial request means it's more difficult to represent/preserve comments in the document, versus the current definition, but just parsing and throwing them away is pretty similarly easy to today's rules. (It just means you have to check for comments in more places.)
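
For illustration, the pre-parse option mentioned above could look roughly like the sketch below. It is an assumption-laden toy, not the reference implementation: `strip_block_comments` and its `replacement` parameter are hypothetical names, strings are assumed to be double-quoted with backslash escapes, block comments are assumed to nest, and raw strings and `//` line comments are not handled.

```python
def strip_block_comments(src: str, replacement: str = " ") -> str:
    """Replace each /* ... */ comment that sits outside a string literal.

    replacement=" " models "comments are whitespace"; replacement=""
    models "comments simply don't exist".
    """
    out = []
    i, n = 0, len(src)
    while i < n:
        ch = src[i]
        if ch == '"':
            # Copy the whole string literal verbatim, skipping escaped chars,
            # so comment markers inside it are left alone.
            j = i + 1
            while j < n and src[j] != '"':
                j += 2 if src[j] == "\\" else 1
            j = min(j + 1, n)
            out.append(src[i:j])
            i = j
        elif src.startswith("/*", i):
            # Skip to the matching */, tracking nesting depth.
            depth, i = 1, i + 2
            while i < n and depth:
                if src.startswith("/*", i):
                    depth, i = depth + 1, i + 2
                elif src.startswith("*/", i):
                    depth, i = depth - 1, i + 2
                else:
                    i += 1
            if depth:
                raise ValueError("unterminated block comment")
            out.append(replacement)
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

Even this small version has to carry string-tracking state, which is the cost being pointed at: the stripper cannot be a plain search-and-replace.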

tabatkins commented 10 months ago

Note for others: foo/**/bar is already valid KDL, and is equivalent to foo bar. Right now, block comments are treated as a type of whitespace; see the ws := ... line in the grammar.

The request here from #270 is to allow comments in some places whitespace isn't currently allowed, but still between "tokens". So (foo/**/)node or node foo/**/=2 would become valid. Meanwhile, foo/**/bar or 2/**/.5 would continue to be equivalent to foo bar and 2 .5; 2./**/5 would be invalid since 2. isn't a valid number.

Given that #340 is asking for allowing whitespace between more tokens (around the =), we might be able to resolve this solely by just allowing whitespace around the ident in a tag, so ( foo )node is allowed. Then comments would be valid by default there.

If we did that, then I think there's only one remaining spot where two adjacent tokens can't have whitespace between them, and that's the tag and node, like (foo) node is illegal today. I'm weakly against allowing whitespace there (very weakly), but would be fine with explicitly allowing block comments there regardless.
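
As a rough sketch of what that would mean for a parser, the snippet below accepts whitespace or block comments around the ident inside a tag, with a flag for the remaining tag/node gap. Everything here is a simplification for illustration: `IDENT` is far narrower than real KDL identifiers, `FILLER` only covers spaces, tabs, and non-nested block comments, and `parse_tagged_node` is a hypothetical helper. The flag also lumps plain whitespace and block comments together, whereas the comment above treats those two cases separately.

```python
import re

# Hypothetical, simplified pieces (not the KDL grammar):
IDENT = r"[A-Za-z_][A-Za-z0-9_-]*"     # a bare identifier
FILLER = r"(?:[ \t]|/\*.*?\*/)*"       # spaces, tabs, non-nested block comments

def parse_tagged_node(src: str, allow_gap_after_tag: bool = False):
    """Return (tag, node_name) for input like '(tag)node', or None if no match."""
    # Filler is always allowed inside the parentheses; between ")" and the
    # node name it is only allowed when the flag is set.
    gap = FILLER if allow_gap_after_tag else ""
    pattern = rf"\({FILLER}({IDENT}){FILLER}\){gap}({IDENT})"
    m = re.match(pattern, src, re.DOTALL)
    return (m.group(1), m.group(2)) if m else None

print(parse_tagged_node("( foo )node"))
print(parse_tagged_node("(foo/**/)node"))
print(parse_tagged_node("(foo) node"))
print(parse_tagged_node("(foo) node", allow_gap_after_tag=True))
```

The point of the flag is only to show that the tag/node gap is a separate decision from the whitespace inside the parentheses.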

LemmaEOF commented 10 months ago

Yeah, ( foo )node should definitely be allowed. I'm somewhat in favor of (foo) node as well but I understand why separating a tag from its element could be Semantically Frustrating

zkat commented 10 months ago

I'm honestly in favor of allowing stuff like (foo) node not because I think it should be a common style, but because humans are humans and I don't want them to feel like KDL is beating them over the head with minor syntax opinions regarding whitespace, when so many other languages are so much more flexible about whitespace when they can be.

zkat commented 9 months ago

Closing in favor of https://github.com/kdl-org/kdl/issues/355 for clarity