Open camilstaps opened 2 years ago
I recall looking into something like this where the presence of a final newline might affect the meaning.
I am of the firm opinion that the presence or absence of a final line break should not ever change the meaning of a yaml stream.
That said, I'm not 100% sure the 1.2.2 spec enforces that.
All the implementations we regularly test agree on the meaning as seen here: https://play.yaml.io/main/parser?input=Zm9vOiB8CiAgeAogIA==
This should probably be addressed as 1.2.2 errata (if it shown to be misrepresented in the productions) and fixed in 1.2.3.
Looking at the spec link you provided:
b-chomped-last(CLIP) ::= [b-as-line-feed](https://yaml.org/spec/1.2.2/#rule-b-as-line-feed) | <end-of-input>
would match here and CLIP is explained as
Clipping is the default behavior used if no explicit chomping indicator is specified. In this case, the final line break character is preserved in the scalar’s content. However, any trailing empty lines are excluded from the scalar’s content.
In my opinion, this explanation is assuming that all lines end with a line break. This should be explicitly made clear in that explanation.
I'd go forward with handling all lines as if they end with a break.
Looking forward to your implementation, Ingy
Thanks a lot!
In my opinion, this explanation is assuming that all lines end with a line break.
This makes sense to me.
Could you clarify a bit where you plan to fix this? At first I thought I could just interpret end-of-line as \n
in b-chomped-last(KEEP|CLIP)
. But this is not enough for JEF9/02. Trying to follow the spec by hand, I end up in l-literal-content(3,KEEP)
. Then the first optional part of l-literal-content
does not apply, because l-nb-literal-text
requires nb-char+
and the spaces are all indentation. So l-chomped-empty(3,KEEP)
should match ␣␣␣∎
. This boils down to l-empty(3,BLOCK_IN)*
. But this rule ends with a b-as-line-feed
which is not in the input.
Alternatively I suppose you could let b-as-line-feed
and perhaps b-non-content
accept end-of-input
. In my implementation this leads to infinite recursion. I assume that this is because there is an environment in which the end-of-input
is then quantified over with *
or +
so that the parser does not move forward. Then this may just be a problem of my implementation and could possibly be fixed by requiring the parser to move forward in *
/+
.
Or yet another way could be to accept end-of-input
next to b-as-line-feed
in l-empty(n,KEEP)
.
As my implementation uses parser combinators it follows the spec very closely, so knowing which way you're going would help to keep the two in sync. But there is no urgency on my side so I'm also perfectly happy to wait for an erratum / change for 1.2.3.
I'm looking at an issue of my parser with L24T/01, added in #105. My parser gives back a string without trailing newline.
If I look at L24T/01/in.yaml in the data-2022-01-17 tag, the file does not have a trailing newline:
If my understanding of the spec is correct,
b-chomped-last
would then use the alternativeb-as-line-feed
rather than<end-of-input>
, so there should be no trailing newline parsed. Is this correct? Should the file be generated with a trailing newline, or should out.yaml be adapted? Or is my understanding of the spec incorrect?I have the same issue with JEF9/02, added in #90.
Pinging @ingydotnet & @perlpunk because you were involved in the PR that added this test.