google / yamlfmt

An extensible command line tool or library to format yaml files.

Folded block scalars with whitespace at the end cause problems #86

Open braydonk opened 1 year ago

braydonk commented 1 year ago

While investigating #84 I realized that the yaml library parses weirdly when there is whitespace at the end of lines in a folded block scalar.

When scan_folded_as_literal: false, you get the original bug shown in issue #84.

When scan_folded_as_literal: true, you get the following with the same input:

Foobar:
  baz: "Lorem Ipsum is simply dummy text of the printing_and_typesetting industry.
    \n#magic___^_^___line\n"
Foobaz:
  baz: "foobar"

Will need to figure out why whitespace at the end of the line causes the library to think it's not printable.

braydonk commented 1 year ago

This is probably going to be challenging, so I'm going to push it out past the other, easier items I've got slated for v0.8.0.

braydonk commented 7 months ago

Realized I never wrote the explanation for why this hasn't been resolved yet.

This is caused by the hack that works around yaml.v3 not retaining plain line-break information. Before the document is parsed into yaml.v3's node structure, yamlfmt inserts a magic string wherever a line break needs to be preserved. After the new output is produced, the magic strings are replaced with the original line breaks. However, one place where newline information actually is retained is in block scalars, so the magic line string being thrown on there ends up in the scalar content and messes with the serialization.

I can't think of a way around this without somehow getting rid of the hack. I've tried in the past to fix this in yaml.v3, but I came up short. I've been wanting to build my own YAML parser to rid myself of yaml.v3 altogether, in which case this and many other fixes and features I've wanted to implement would be possible. I haven't had the time to invest to make that happen, though.
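To make the failure mode concrete, here is a minimal sketch of how a placeholder can end up inside a block scalar. It is not yamlfmt's actual code: the function names are hypothetical, and the choice to indent the placeholder to match the previous non-blank line is an assumption made for illustration. Only the placeholder text itself is taken from the output shown earlier in this issue.

```go
package main

import (
	"fmt"
	"strings"
)

// magicLine matches the placeholder text visible in the issue's output.
const magicLine = "#magic___^_^___line"

// leadingIndent returns the run of spaces at the start of a line.
func leadingIndent(line string) string {
	return line[:len(line)-len(strings.TrimLeft(line, " "))]
}

// markBlankLines (hypothetical) replaces each blank line with the
// placeholder, indented like the previous non-blank line (an assumption).
func markBlankLines(src string) string {
	lines := strings.Split(src, "\n")
	indent := ""
	for i, line := range lines {
		if strings.TrimSpace(line) == "" {
			lines[i] = indent + magicLine
			continue
		}
		indent = leadingIndent(line)
	}
	return strings.Join(lines, "\n")
}

func main() {
	src := "Foobar:\n  baz: >\n    Lorem Ipsum is simply dummy text.\n\nFoobaz:\n  baz: foobar"
	fmt.Println(markBlankLines(src))
}
```

In this sketch the placeholder that follows the folded scalar is indented to the scalar's content level. YAML treats a line at that indentation as part of the block scalar rather than as a comment, so a parser would read the placeholder as scalar text, which is consistent with the "\n#magic___^_^___line\n" seen in the quoted output.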

juliusl commented 2 months ago

@braydonk naive question, but does yaml.v3 still not retain plain line-break information if the newLineStr is "\r\n" instead of "\n"?

braydonk commented 2 months ago

I am pretty sure it will not, though I'd be pleasantly surprised to be wrong, because that would be a glimmer of hope. It's been a while since I last looked at it, but the problem, iirc, is that the AST doesn't maintain empty-line information. yamlfmt operates by parsing the YAML document into the library's yaml.Node representation and then feeding that into the library's emitter, so there is no newline information in the AST to re-emit; the data has already been lost during parsing. This is why the yamlfmt solution is to insert a magic string that is maintained throughout the process and then replaced after the new output is created.
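The round trip described above can be sketched with a toy pipeline. This is only an illustration under stated assumptions: dropBlankLines is a stand-in for the parse-then-emit step and does not call yaml.v3, and all function names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

const placeholder = "#magic___^_^___line"

// dropBlankLines stands in for parsing into an AST and re-emitting:
// like the AST described above, it keeps no record of empty lines.
func dropBlankLines(src string) string {
	var kept []string
	for _, line := range strings.Split(src, "\n") {
		if strings.TrimSpace(line) != "" {
			kept = append(kept, line)
		}
	}
	return strings.Join(kept, "\n")
}

// insertPlaceholders turns each blank line into a comment-like marker
// that survives the toy pipeline.
func insertPlaceholders(src string) string {
	lines := strings.Split(src, "\n")
	for i, line := range lines {
		if strings.TrimSpace(line) == "" {
			lines[i] = placeholder
		}
	}
	return strings.Join(lines, "\n")
}

// restorePlaceholders turns marker lines back into blank lines.
func restorePlaceholders(out string) string {
	lines := strings.Split(out, "\n")
	for i, line := range lines {
		if strings.TrimSpace(line) == placeholder {
			lines[i] = ""
		}
	}
	return strings.Join(lines, "\n")
}

func main() {
	src := "a: 1\n\nb: 2"
	// Without the placeholder, the blank line is gone after the pipeline.
	fmt.Printf("%q\n", dropBlankLines(src))
	// With the placeholder, the blank line can be put back afterwards.
	fmt.Printf("%q\n", restorePlaceholders(dropBlankLines(insertPlaceholders(src))))
}
```

The first call loses the empty line for good, while the second recovers the original text. Changing the newline string would not help in this model, because the information is dropped before the emitter ever runs.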