yaml / yaml-test-suite

Comprehensive, language independent Test Suite for YAML
MIT License

CXX2 violates YAML 1.2 grammar #36

Closed: hvr closed this issue 6 years ago

hvr commented 6 years ago

Test case CXX2 considers

--- &anchor a: b

a valid YAML stream, but in fact the grammar doesn't allow block collection nodes to appear on the --- line.

This also excludes the simpler

--- a: b

or also sequences such as

--- - x

The reason becomes obvious when we trace the productions:

[208]  l-explicit-document  ::=  c-directives-end  l-bare-document

[207]  l-bare-document  ::=  s-l+block-node(-1,block-in)

n = -1
c = block-in

[196]  s-l+block-node(n,c)  ::=  s-l+block-in-block(n,c) | s-l+flow-in-block(n)

[198]  s-l+block-in-block(n,c)  ::=  s-l+block-scalar(n,c) | s-l+block-collection(n,c)

[200]  s-l+block-collection(n,c)  ::=  ( s-separate(n+1,c) c-ns-properties(n+1,c) )?
                                       s-l-comments
                                       ( l+block-sequence(seq-spaces(n,c)) | l+block-mapping(n) )

[79]  s-l-comments  ::=  ( s-b-comment | /* Start of line */ ) l-comment*

[77]  s-b-comment  ::=  ( s-separate-in-line c-nb-comment-text? )? b-comment

[76]  b-comment  ::=  b-non-content | /* End of file */

[30]  b-non-content  ::=  b-break

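So the problem is that to match s-l+block-collection(n,c) (rule 200), we must match s-l-comments. Since we are not at the start of a line (we are 3 characters into it because of ---, plus any node properties already consumed), the /* Start of line */ alternative of rule 79 is unavailable, so we must match s-b-comment instead. That in turn demands, after optional whitespace and an optional comment, a line break via b-break (or the end of the file).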

Hence, we cannot match an s-l+block-collection on the same line right after the directives-end marker ---.

QED :-)
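The trace above can be checked mechanically. What follows is a minimal Python sketch under heavy simplifying assumptions: it models only rules 200, 79, 77, 76 and 30 for the text that follows --- on the same line, approximates c-ns-properties with a rough regex for anchors (&name) and tags (!...), and ignores flow content, indentation and every later line. The names PROPERTIES, s_b_comment and block_collection_may_start are hypothetical helpers, not part of any YAML library.

```python
import re

# Hypothetical, simplified model of YAML 1.2 productions [200], [79], [77],
# [76] and [30], applied only to the text that follows "---" on the same line.
# It is NOT a YAML parser: flow content, indentation and later lines are ignored.

# Rough stand-in for c-ns-properties: up to two anchor (&name) / tag (!...)
# properties, each preceded by same-line whitespace (s-separate).
PROPERTIES = re.compile(r"(?:[ \t]+(?:&\S+|!\S*)){1,2}")


def s_b_comment(rest: str) -> bool:
    """[77] ( s-separate-in-line c-nb-comment-text? )? b-comment."""
    # Not at the start of a line, so s-separate-in-line needs real whitespace.
    m = re.match(r"(?:[ \t]+(?:#[^\r\n]*)?)?", rest)
    tail = rest[m.end():]
    # [76] b-comment: b-non-content ([30] b-break) or end of file.
    return tail == "" or tail[0] in "\r\n"


def block_collection_may_start(after_marker: str) -> bool:
    """Can s-l+block-collection ([200]) start on the '---' line?

    after_marker is the rest of the line after '---', including its line break.
    """
    rest = after_marker
    # Optional ( s-separate(n+1,c) c-ns-properties(n+1,c) ) prefix.
    m = PROPERTIES.match(rest)
    if m:
        rest = rest[m.end():]
    # [79] s-l-comments: the "start of line" alternative is unavailable here,
    # so s-b-comment must match, which forces a line break (or EOF).
    return s_b_comment(rest)


for text in (" &anchor a: b\n", " a: b\n", " - x\n", " &foo !!seq\n", " &foo\n"):
    print(repr("---" + text), block_collection_may_start(text))
```

With this model the CXX2 line and the two simpler variants are rejected, because text other than a comment remains before the line break. The two forms listed later in the thread as valid, where the collection itself starts on the next line, are accepted.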


PS: One obvious workaround would be to make matching s-l-comments optional, but that would also make other, currently disallowed cases valid, such as

k: - x
   - y

(iirc there's at least one test that checks that YAML parsers reject this with a parse error). I don't know if there's an easy fix that makes the grammar accept CXX2 without at the same time being too liberal and allowing other syntax that isn't intended to be valid.

perlpunk commented 6 years ago

Looking at that, it seems you're right. Thanks for listing the productions. See also the related #29. @ingydotnet @flyx, makes sense? Should we mark the test as an error?

hvr commented 6 years ago

Btw, what is valid is something like

--- &foo !!seq
- x
- y

or also

--- &foo
a: b

but iirc we already have tests for those...

flyx commented 6 years ago

I think I overlooked this detail until now. It's a pretty nasty gotcha that something named s-l-comments can force a newline ;).

The test input is indeed illegal according to the productions and should be an error.

perlpunk commented 6 years ago

@flyx cool. I'll modify the test to be an error.

perlpunk commented 6 years ago

thanks @hvr!