mikefarah / yq

yq is a portable command-line YAML, JSON, XML, CSV, TOML and properties processor
https://mikefarah.gitbook.io/yq/
MIT License

Multiline strings (or block scalars) are not preserved when they have trailing white space. #566

Open · bryant-ferguson opened this issue 3 years ago

bryant-ferguson commented 3 years ago

Describe the bug
Multiline strings (block scalars) are not preserved when they contain trailing whitespace. They are converted to double-quoted strings.

Version of yq: 3.3.4
Operating system: Linux Mint 19

Input YAML

this: |
  should 
  really work

Command
The command you ran:

yq merge data.yml

Actual behavior

this: "should \nreally work\n"

Expected behavior

this: |
  should 
  really work
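To check whether a decoded value contains the pattern from this report (a line ending in whitespace inside a block scalar), a small Go sketch using only the standard library can scan the string line by line. `linesWithTrailingSpace` is a hypothetical helper, not part of yq or go-yaml. It only flags the pattern and does not reproduce the emitter's actual style decision.

```go
package main

import (
	"fmt"
	"strings"
)

// linesWithTrailingSpace returns the 1-based numbers of the lines in s that
// end in a space or a tab. Hypothetical helper for illustration only.
func linesWithTrailingSpace(s string) []int {
	var out []int
	for i, line := range strings.Split(s, "\n") {
		if strings.HasSuffix(line, " ") || strings.HasSuffix(line, "\t") {
			out = append(out, i+1)
		}
	}
	return out
}

func main() {
	// The decoded value of `this` from the input YAML in the report.
	value := "should \nreally work\n"
	fmt.Println(linesWithTrailingSpace(value)) // [1]
}
```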

Orrimp commented 1 year ago

We have the same issue, and it's important for us because it breaks our configuration.

arikkfir commented 12 months ago

Same here. Is this expected behavior in yq?

arikkfir commented 12 months ago

OK, I just discovered this is a duplicate of #1681 and #1277, which both depend on go-yaml/yaml#880.

@mikefarah, perhaps it would be best to close #1681 and #1277 as duplicates of this issue, since this one is the oldest.

rattboi commented 11 months ago

I spent some time investigating this yesterday, and found the related code within go-yaml/yaml that could potentially be changed to fix this. However, I also see that go-yaml/yaml currently has 112 PRs that have gone unreviewed, most without comment.

I almost spent time working out a fix in the upstream library, but I'm wondering whether @mikefarah would be open to another direction: using https://github.com/goccy/go-yaml/ for more of the standard parsing, lexing, encoding and decoding within yq. I see it's already used for one specific use case (colorized output), and it could likely replace go-yaml/yaml entirely.

It may be a big undertaking, but since go-yaml/yaml upstream seems effectively dead, and benchmarks show https://github.com/goccy/go-yaml/ as generally 2x faster, it looks like a potential way around these YAML issues that are blocked upstream.

There's a simple ycat command bundled with goccy/go-yaml. It's basic, but it was enough to see how the library handles literal scalars with trailing spaces:

echo -ne 'foo: |\n  bar:2    \n  baz:3\n  #foo' | go run ycat.go /dev/stdin
 1 | foo: |
 2 |   bar:2    
 3 |   baz:3
 4 |   #foo

bradonkanyid commented 11 months ago

FWIW, these are the parts of go-yaml/yaml's emitter involved in this decision:

Determining if there are trailing spaces in a scalar: https://github.com/go-yaml/yaml/blob/f6f7691b1fdeb513f56608cd2c32c51f8194bf51/emitterc.go#L1335-L1346

                if is_space(value, i) {
                        if i == 0 {
                                leading_space = true
                        }
                        if i+width(value[i]) == len(value) {
                                trailing_space = true
                        }
                        ...
                }
                ...
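The branch quoted above can be sketched in stdlib-only Go to show exactly when the two flags fire. `analyzeSpaces` and `spaceFlags` are hypothetical names. The sketch models only the `is_space` branch (a plain `' '`), advances by UTF-8 character width as the excerpt does with `width(value[i])`, and ignores every check elided with `...`.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// spaceFlags mirrors the two flags set in the excerpt.
type spaceFlags struct {
	leadingSpace  bool // a space is the first character of the value
	trailingSpace bool // a space is the last character of the value
}

// analyzeSpaces is a hypothetical, simplified rendering of the is_space
// branch only; line breaks, tabs and all other checks are not modeled.
func analyzeSpaces(value string) spaceFlags {
	var f spaceFlags
	for i := 0; i < len(value); {
		// Byte width of the current UTF-8 character.
		_, w := utf8.DecodeRuneInString(value[i:])
		if value[i] == ' ' {
			if i == 0 {
				f.leadingSpace = true
			}
			if i+w == len(value) {
				f.trailingSpace = true
			}
		}
		i += w
	}
	return f
}

func main() {
	fmt.Printf("%+v\n", analyzeSpaces("bar  "))      // {leadingSpace:false trailingSpace:true}
	fmt.Printf("%+v\n", analyzeSpaces("bar  \nbaz")) // {leadingSpace:false trailingSpace:false}
}
```

As the second call shows, in this excerpt only a space that is the final character of the whole value sets `trailingSpace`. A space before an interior line break does not, so any effect of such spaces would come from the elided parts of the analyzer, which this sketch does not model.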

Disabling various ways to render based on trailing spaces being identified: https://github.com/go-yaml/yaml/blob/f6f7691b1fdeb513f56608cd2c32c51f8194bf51/emitterc.go#L1375-L1381

        if leading_space || leading_break || trailing_space || trailing_break {
                emitter.scalar_data.flow_plain_allowed = false
                emitter.scalar_data.block_plain_allowed = false
        }
        if trailing_space {
                emitter.scalar_data.block_allowed = false
        }
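The effect of these two `if` blocks can be expressed as a pure function from the detected edge conditions to the allowed-style flags. This is a hypothetical sketch: `analysis`, `allowed` and `restrict` are illustrative names, and it assumes all three flags start out `true` before the excerpt runs. It makes the asymmetry visible: a trailing line break still permits a block scalar, while a trailing space rules one out.

```go
package main

import "fmt"

// analysis holds the edge conditions the excerpt reads.
type analysis struct {
	leadingSpace, leadingBreak, trailingSpace, trailingBreak bool
}

// allowed holds the subset of emitter.scalar_data flags the excerpt writes.
type allowed struct {
	flowPlain, blockPlain, block bool
}

// restrict starts from "everything allowed" (an assumption) and clears
// flags exactly as the two if-blocks in the excerpt do.
func restrict(a analysis) allowed {
	al := allowed{flowPlain: true, blockPlain: true, block: true}
	if a.leadingSpace || a.leadingBreak || a.trailingSpace || a.trailingBreak {
		al.flowPlain = false
		al.blockPlain = false
	}
	if a.trailingSpace {
		al.block = false
	}
	return al
}

func main() {
	fmt.Printf("%+v\n", restrict(analysis{trailingSpace: true}))
	// {flowPlain:false blockPlain:false block:false}
	fmt.Printf("%+v\n", restrict(analysis{trailingBreak: true}))
	// {flowPlain:false blockPlain:false block:true}
}
```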

The actual decision to render in double-quoted scalar format: https://github.com/go-yaml/yaml/blob/f6f7691b1fdeb513f56608cd2c32c51f8194bf51/emitterc.go#L1039-L1043

        if style == yaml_LITERAL_SCALAR_STYLE || style == yaml_FOLDED_SCALAR_STYLE {
                if !emitter.scalar_data.block_allowed || emitter.flow_level > 0 || emitter.simple_key_context {
                        style = yaml_DOUBLE_QUOTED_SCALAR_STYLE
                }
        }
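The final fallback can be sketched the same way. `scalarStyle`, its constants and `chooseStyle` are hypothetical stand-ins for go-yaml's style type and the quoted `if`. The requested style would presumably be the literal style carried over from the input (an assumption about how yq passes styles through). Once `block_allowed` has been cleared, a requested literal or folded style silently becomes double-quoted, which is consistent with the "Actual behavior" output in the report.

```go
package main

import "fmt"

// scalarStyle is a hypothetical stand-in for go-yaml's scalar style type.
type scalarStyle int

const (
	literalStyle scalarStyle = iota
	foldedStyle
	doubleQuotedStyle
)

func (s scalarStyle) String() string {
	switch s {
	case literalStyle:
		return "literal"
	case foldedStyle:
		return "folded"
	default:
		return "double-quoted"
	}
}

// chooseStyle mirrors the excerpt: a requested literal (|) or folded (>)
// style is downgraded to double-quoted if block style was disallowed, if
// the emitter is inside a flow collection, or if the scalar is a simple key.
func chooseStyle(requested scalarStyle, blockAllowed bool, flowLevel int, simpleKey bool) scalarStyle {
	if requested == literalStyle || requested == foldedStyle {
		if !blockAllowed || flowLevel > 0 || simpleKey {
			return doubleQuotedStyle
		}
	}
	return requested
}

func main() {
	fmt.Println(chooseStyle(literalStyle, true, 0, false))  // literal
	fmt.Println(chooseStyle(literalStyle, false, 0, false)) // double-quoted
}
```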