kdl-org / kdl

the kdl document language specifications
https://kdl.dev
Other
1.09k stars 61 forks source link

Spec is unclear about how slashdash comments work #284

Closed CAD97 closed 8 months ago

CAD97 commented 1 year ago

Consider

node/-"val""val"
node /-{} "val"
node /- /-{} {}

kdl-rs accepts this, but the formal grammar forbids all three nodes here.

EDIT: preferred resolution is now #285.

Original suggestion My preferred resolution at the moment[^1] is to make the implementation and prose match the formal grammar here, with one small change: require whitespace before a slashdashed children block as well. (The lack of such causes implementation issues[^2].) [^1]: And what I'm implementing for the time being in my annoyingly-zero-copy parse/visitor. Don't let me strongarm you, though; I've figured out a reasonable (though kind of annoying) way to implement the reference implementation's behavior, and that's the most interesting option to implement in my LL(2) recursive descent parse/visitor. In fact, most of the implementation annoyance is still there anyway in order to support a non-spaced non-slashed children block while requiring spacing for everything else. [^2]: Namely: the grammar is no longer LL(n), as it requires unbounded *token* lookahead when seeing a slashdash (past arbitrarily many escaped lines) to determine if it's a slashdashed argument/property (error) or a slashdashed children block (allowed). This arbitrary lookahead isn't just an implementation concern, either; it means that the location of the produced error is potentially far from the cause of the error. Suggested wording: In Argument/Property, delete the references to `/-`. The Children Block's prose does not currently allow for it to be slashdashed, despite this being allowed in the formal grammar and reference implementation. In Node: > Nodes *MAY* be prefixed with `/-` to "comment out" the entire node, including its properties, arguments, and children, making it act as whitespace, even if it spreads across multiple lines. Additionally, the node's properties, arguments, and children block *MAY* themselves be prefixed with `/-` to "comment out" that entire component. If the children block is "commented out" in this fashion, it *MUST* be separated by whitespace or a slash-escaped line continuation. Even if a component is "commented out" in this fashion, it still acts as the commented out component *syntactically*. (For example, "commented out" arguments and properties must still be separated by plain whitespace, and if a "commented out" children is present, it must be after all arguments and properties, and there may not be another children block, whether "commented out" or not.) In the formal grammar: > ``` > node := ('/-' node-space*)? type? identifier node-prop-or-arg* node-children? node-space* node-terminator > node-prop-or-arg := node-space+ ('/-' node-space*)? (prop | value) > node-children := (node-space+ '/-') node-space* '{' nodes '}' > ```
CAD97 commented 1 year ago

(Actually, since I wrote this while working on handling this in my implementation, I'm not 100% sure what my implementation is going to end up doing, but just a little more work has suggested that aligning with kdl-rs's current behavior will actually be easier than my desired semantics. I still desire them, though; it'd just be significantly easier if space was required before children blocks as well.)

zkat commented 1 year ago

Hm. It does feel like that second one should be legal, though, doesn't it? I wonder if slashdashed children should be allowed to be interspersed in a node's arguments...

zkat commented 1 year ago

that said, we should be careful/conservative here, because any changes to the spec (especially the grammar) are semver-breaking and would fall into the kdl 2.0 bucket.

CAD97 commented 1 year ago

If we want to instead make the formal grammar match the prose & implementation, that would be

nodes := (line-space* node)* line-space*

line-space := newline | ws | single-line-comment | '/-' node-space* node
node-space := ws* escline ws* | ws+ | '/-' node-space* (node-prop-or-arg | node-children)

node := type? identifier (node-space+ node-prop-or-arg)* (node-space* node-children)? node-space* node-terminator
node-prop-or-arg := prop | value
node-children := '{' nodes '}'

I believe this should be just a fix to the formal grammar to match the intended prose semantics, and after a bit more impl work I don't mind that.

Relaxing the MUST for node prop-or-arg's separating whitespace to a SHOULD should be a minor addition, though; all "KDL 1.0" will still be accepted, it's just "KDL 1.1" is a small superset that's not fully portable (and only for KDL that shouldn't really be being written anyway). Formally, that just changes the node-space+ to node-space*.

CAD97 commented 1 year ago

Given that the formal grammar is authoritative here,

Also, new fun example:

node /- /-{} {}
zkat commented 1 year ago

kdl-rs is not a reference implementation. The grammar in the spec is the ultimate authority for implementations, along with the corresponding test suite that helps confirm compliance.

That said, tbh, this is kind of a corner case, and I think I'm gonna start getting everyone together to do a KDL 2.0 soon, so I would just implement to that so you're ahead of the game. KDL is low-usage enough that we can have a little wiggle room on the margins.

CAD97 commented 1 year ago

Another interesting edge case: line continuations don't allow slashdash-escaped arguments/properties, even in the kdl-rs implementation. This is a point against them being "just plain whitespace."

larsgw commented 1 year ago

It's been a while for me but I think that might be intentional though, the idea being that without the /- the syntax should still be valid (which wouldn't be the case of a slashdash-escaped argument occurred after a line continuation \).

Edit: Oh I see which part of the spec you mean now. I guess in terms of syntax & parsing they're not plain whitespace, but in the resulting document model they are?

CAD97 commented 1 year ago

without the /- the syntax should still be valid

If this is still the intent, then #285 (which makes the spec match kdl-rs's implementation in treating /- as node-space (but not ws)) is the wrong resolution to this issue, and kdl-rs should be fixed to match the formal grammar.

larsgw commented 1 year ago

I'd have to check, I might have missed or misremembered something.

zkat commented 8 months ago

The fixes for this have been merged into the kdl-v2 branch