S98Z ought to be an error under YAML 1.2

hvr commented 6 years ago

The test-case S98Z contains the YAML document

empty block scalar: >
<SPC>
<SPC><SPC>
<SPC><SPC><SPC>
 # comment

and claims this to be semantically equivalent to

{ "empty block scalar": "" }

However, according to

8.1.1.1. Block Indentation Indicator

Typically, the indentation level of a block scalar is detected from its first non-empty line. It is an error for any of the leading empty lines to contain more spaces than the first non-empty line.

Detection fails when the first non-empty line contains leading content space characters. Content may safely start with a tab or a “#” character.

...

And in Example 8.2. Block Indentation Indicator

- >
·
··
··# detected

is given as a positive example with an inferred indentation level 2; as well as a negative example in Example 8.3. Invalid Block Scalar Indentation Indicators:

- |
··
·text

which expects a parser failure (A leading all-space line must not have too many spaces.)

In order to detect the indentation level n of a block scalar which wasn't manually specified, we need to find the first non-empty line. And 8.1.1.1 is quite clear that Content may safely start with a tab or a “#” character., so a line starting with a # can be considered content (as it does in exmaple 8.2), and thus qualifies as first non-empty line.

The yaml stream in S98Z however appears to consider the # comment line an actual comment even though according to section 8.1.1.1 it is in fact the first non-empty line, and thus to be considered as content of the block scalar.

Moreover, since the preceding all-space lines contain more spaces (i.e. 3) than the inferred n-level (i.e. 2), and consequently an error along the lines of "A leading all-space line must not have too many spaces." is warranted.

perlpunk commented 4 years ago

We discussed this on IRC, but I never commented here. Yeah, I agree now it should be an error according to the spec.

perlpunk commented 4 years ago

It is now an error, created new release 2020-02-11

am11 commented 4 years ago

I think 5LLU is also covering this case, with the exception that we emit =VAL > token. While writing a very strange rule (empty vs. non-empty where empty means literally empty or comments-only) I realized that I should confirm if it is unintentional?

perlpunk commented 4 years ago

@am11 The difference to 5LLU is, that in S98Z it is # comment, which is kind of special, since it could be seen as a real comment, and the block scalar as completely empty.

am11 commented 4 years ago

@perlpunk, thanks. I agree that keeping this test has a valid significance. We now consider S98Z as error because comment is classified as content (per the spec). 👍

However, since it is semantically equivalent to 5LLU, would it make sense to include =VAL > in expected events or exclude it from 5LLU's expected event stream?

perlpunk commented 4 years ago

@am11 oh I see what you mean. Yes, it should be excluded. But the test events in case of an error are not supposed to be very important. Depending on the implementation, different processors might output more or less events. I think @ingydotnet would even like to remove the test events for errors completely.

am11 commented 4 years ago

remove the test events for errors completely

My reaction was: please reconsider 😁

We pedantically matched the output of spec suite in error cases as well and improved our implementation by doing that. We are down to the last two desync cases. :)

ps - I was rather thinking about writing a proposal to standardize the error messages as well, so implementations can assert that they are failing for same reason which was envisioned while writing the test case, and not by a side-effect; in order to gain (more) confidence. We did similar thing over at Sass lang for errors and warnings. (e.g. https://github.com/sass/sass-spec/blob/87e9b70ff0fb3cbf6bc437243becb96ac2d17171/spec/css/moz_document/functions/static.hrx#L65).

perlpunk commented 4 years ago

We pedantically matched the output of spec suite in error cases as well and improved our implementation by doing that.

Ah, that's great to hear it's useful! Yeah, I would also vote on keep.

perlpunk commented 4 years ago

ps - I was rather thinking about writing a proposal to standardize the error messages as well

That's ambitious, but sure, that would be great.

yaml / yaml-test-suite

S98Z ought to be an error under YAML 1.2 #37

8.1.1.1. Block Indentation Indicator