mikefarah / yq

yq is a portable command-line YAML, JSON, XML, CSV, TOML and properties processor
https://mikefarah.gitbook.io/yq/
MIT License

yq write strips completely blank lines from the output #515

Open scanfield opened 4 years ago

scanfield commented 4 years ago

Is your feature request related to a problem? Please describe.

foo:
  bar: 1

  baz: 2

when run through yq w - foo.baz 3

produces

foo:
  bar: 1
  baz: 3

Describe the solution you'd like

Keep my extra blank line (it's better for readability and produces a smaller diff).

warder commented 4 years ago

Same story. Sorry, but this issue looks more like a bug than an enhancement. When you process a YAML file with yq, it corrupts the whole file.

AceHack commented 4 years ago

Any update on this? It would be a really nice feature.

sathiyams commented 3 years ago

Any update on this? Readability is suffering without it.

mikefarah commented 3 years ago

This is an effect of the underlying YAML parser. An issue was raised there (https://github.com/go-yaml/yaml/issues/627), and the owner said:

...the content when re-encoded will not have its original textual representation preserved. An effort is made to render the data pleasantly, and to preserve comments near the data they describe, though.

arcesino commented 3 years ago

I've been dealing with this issue for a couple of days while updating very large YAML files, and found a workaround using the diff and patch commands that restores the stripped blank lines in most cases. Suppose you have the following YAML file:

doc:
  version: 1.0.0
  name: numbers & letters

numbers:
  - 1

letters:
  - a

We call this file a.yaml. Now let's update the version using yq and store the result in a new file, a-updated.yaml:

yq e '.doc.version = "1.0.1"' a.yaml > a-updated.yaml

As expected, the command above stripped all blank lines, so a-updated.yaml looks like:

doc:
  version: 1.0.1
  name: numbers & letters
numbers:
  - 1
letters:
  - a

At this point, the first step to get the blank lines back is to create a diff file that ignores blank-line changes:

diff -U0 -w -b --ignore-blank-lines a.yaml a-updated.yaml > a.diff

a.diff looks like this:

--- a.yaml  2021-04-30 15:28:38.000000000 -0500
+++ a-updated.yaml  2021-04-30 15:18:53.000000000 -0500
@@ -2 +2 @@
-  version: 1.0.0
+  version: 1.0.1

Then the final step is to patch the original file with the diff:

patch a.yaml < a.diff

After that, the original file looks like:

doc:
  version: 1.0.1
  name: numbers & letters

numbers:
  - 1

letters:
  - a

The issue comes when the updated line is right before a blank line. For example, let's add an element to one of the arrays:

yq e '.numbers += 2' a.yaml > a-updated.yaml

The updated file is now:

doc:
  version: 1.0.1
  name: numbers & letters
numbers:
  - 1
  - 2
letters:
  - a

If we generate the diff file as before, we get the following:

--- a.yaml  2021-04-30 15:30:22.000000000 -0500
+++ a-updated.yaml  2021-04-30 15:35:26.000000000 -0500
@@ -7 +6 @@
-
+  - 2

Patching the original file with the diff above results in:

doc:
  version: 1.0.1
  name: numbers & letters

numbers:
  - 1
  - 2
letters:
  - a

Notice how the blank line after the new element in the numbers array remains stripped while the others are back. This happens because diff treats the blank-line deletion and the addition of the new array element as part of the same hunk, so --ignore-blank-lines does not ignore it.

This is not ideal by any means, but in my case it has helped a lot since my files are big and have lots of blank lines. I'm sharing this in case someone else finds it useful too.
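
The diff and patch steps above can be wrapped in a small bash function. This is a minimal sketch, not part of the original comment: the name restore_blank_lines and its three-argument form are assumptions made for illustration, and it expects the yq output to exist already as a file (for example, produced by the yq e command shown above). It writes the result to a separate file instead of patching the original in place.

```shell
#!/usr/bin/env bash
# restore_blank_lines ORIGINAL UPDATED OUTPUT   (hypothetical helper name)
#   ORIGINAL: the hand-formatted file, blank lines included
#   UPDATED:  the same file after yq rewrote it (blank lines stripped)
#   OUTPUT:   where to write ORIGINAL with only the real changes applied
restore_blank_lines() {
  local orig="$1" updated="$2" out="$3" patch_file status
  patch_file=$(mktemp) || return 1

  # Zero-context unified diff that skips hunks made only of blank lines.
  diff -U0 -w -b --ignore-blank-lines "$orig" "$updated" > "$patch_file"
  status=$?
  if [ "$status" -gt 1 ]; then   # 0 = same, 1 = different, 2 = diff failed
    rm -f "$patch_file"
    return "$status"
  fi

  if [ -s "$patch_file" ]; then
    # -s: quiet, -o: write the patched result to OUTPUT, leave ORIGINAL alone
    patch -s -o "$out" "$orig" "$patch_file"
  else
    cp "$orig" "$out"            # nothing but blank-line changes
  fi
  status=$?
  rm -f "$patch_file"
  return "$status"
}

# Example: restore_blank_lines a.yaml a-updated.yaml a-restored.yaml
```

The limitation described above still applies: a blank line directly next to a changed line ends up in the same hunk and is lost.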

lirlia commented 2 years ago

Thanks! I use @arcesino's approach in this one-liner:

filename=xxx
version=xxx

patch "$filename" <<< "$(diff -U0 -w -b --ignore-blank-lines "$filename" <(yq eval ".my.version = \"$version\"" "$filename"))"

vladimir259 commented 2 years ago

Thanks for the idea with diff & patch @arcesino .

In my case, the removal of blanks introduced by diff was unfortunately unacceptable, so I had to dig further.

And found a solution.

The approach is as follows: I remove the blanks from the original YAML and create a diff between that and my altered YAML. The patch is then applied to the original, and no new spaces are introduced.

Here is an example:

The starting point is my original YAML, where the value of the key "secrets.TEST" should be updated:

---
config:

  # mysql
  DATABASE_PROTOCOL: "mysql"
  # instance fqdn
  DATABASE_HOST: "mysql"

secrets:
  # db password
  DATABASE_PASSWORD: "password"

  # example
  TEST: "foo"

# other values
#[...]

Step 1: updating the value & creating a copy

yq '.secrets.TEST = "NewValue"' sample.yaml > sample.yaml.new

Step 2: removing blanks from the original

yq '.' sample.yaml > sample.yaml.noblanks

Step 3: creating a patch

diff -B sample.yaml.noblanks sample.yaml.new > patch.file

The patch then contains only the value diffs:

$> cat patch.file
11c11
<   TEST: "foo"
---
>   TEST: "NewValue"

Step 4: apply the patch to the original

patch sample.yaml patch.file

Utils used:

  • yq 4.20.2
  • patch 2.7.6
  • diff 3.7

OS: debian 11

clementnuss commented 2 years ago

Good idea! I turned that into fish and bash functions in this Gist:

#fish
function yqblank;
  yq eval "$argv[1]" "$argv[2]" | diff -B "$argv[2]" - | patch "$argv[2]" -o -
end

#bash
yqblank() {
  yq eval $1 $2 | diff -B $2 - | patch $2 -o -
}

This makes it possible to use yq without changing (most of) the blank lines. Usage is as follows:

yqblank '.' file_name.yml

raQai commented 2 years ago

@clementnuss I think patch $2 -o - does not work and -o should be removed there.

#bash
yqblank() {
  yq eval $1 $2 | diff -B $2 - | patch $2 -
}

ryenus commented 2 years ago

@clementnuss I think patch $2 -o - does not work and -o should be removed there.

@raQai, thank you! Just that the arguments have to be quoted properly, also eval/e can be omitted since yq 4.18.1:

#bash
yqblank() {
  yq "$1" "$2" | diff -B "$2" - | patch "$2" -
}

raQai commented 2 years ago

Oh yeah, I forgot about the quoting part :sweat_smile: I was in a hurry, so thanks for adding this :+1:

edit: I would also like to add that this still sometimes merges multi-line descriptions and arrays into one line, and it is not able to handle comments properly.

source:

  fruits: [
    Apple,
    Banana,
    Calamansi,
  ]

becomes:

  fruits: [Apple, Banana, Calamansi,]

source:

  fruits: [
    Apple,     # comment 1
    Banana,    # comment 2
    Calamansi, # comment 3
  ]

becomes:

  fruits: [
    Apple, # comment 1
    Banana, # comment 2
    Calamansi, # comment 3
  ]

(I did not verify this on my current machine but that was roughly the result)

edit2: @arcesino we also ran into the same thing you did with the .info.version update.

Long story short: We still use yq but only to get the line of the .info.version using the line operator and update it using sed.

Something along these lines should work:

$ sed -i "$(yq '.info.version | line' "$file")s/$old_val/$new_val/" "$file"

This also returns the correct line if the value of .info.version is wrapped onto the next line:

info:
  version: 1.x.x # line 2

info:
  version:
    1.x.x # line 3
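
The line-addressed sed substitution above can be sketched as a reusable function. This is only an illustration under stated assumptions: the name replace_on_line is hypothetical, the line number is passed in by the caller (in real use it would come from yq's line operator, as in the comment), and the function writes through a temporary file because sed -i takes different arguments in GNU and BSD sed.

```shell
#!/usr/bin/env bash
# replace_on_line FILE LINE OLD NEW   (hypothetical helper name)
# Substitute OLD with NEW on line LINE only; every other line, including
# blank lines and comments, passes through untouched.
# OLD is a sed basic regular expression and NEW a sed replacement, so the
# caller must escape characters such as '.', '/' and '&' where needed.
replace_on_line() {
  local file="$1" line="$2" old="$3" new="$4" tmp status
  case $line in
    ''|*[!0-9]*) echo "replace_on_line: LINE must be a number" >&2; return 2 ;;
  esac
  tmp=$(mktemp) || return 1
  # Write to a temp file first, then copy back over the original.
  sed "${line}s/${old}/${new}/" "$file" > "$tmp" && cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  return "$status"
}

# In real use the line number would come from yq, as in the comment above:
#   replace_on_line "$file" "$(yq '.info.version | line' "$file")" "$old_val" "$new_val"
```

Because yq is only used to look up a line number, the file is never re-encoded, which is why blank lines and comments survive.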

msdobrescu commented 2 years ago

I'm hit by this too. No fix, only workarounds?

andry81 commented 2 years ago

Approach is following: i remove blanks from the original yaml and create a diff between that and my altered yaml. The patch then is applied to the original and no new spaces are introduced.

Unfortunately, this only works for changes to already existing values. If you try to add lines to the YAML, the patch is applied at an offset because of the missing blank lines.

I've already tested that, and it does not work as expected for additions:

https://github.com/andry81-devops/gh-workflow/blob/ee5d2d5b6bf59299e39baa16bb85357cf34a8561/bash/github/init-yq-workflow.sh
https://github.com/andry81-devops/gh-workflow/blob/9b9d01a9b60a65d6c3c29f5b4b200409fc6a0aed/bash/cache/accum-content.sh

Search for: yq_edit, yq_diff, yq_patch

So only the diff of the original against the edited YAML, as @arcesino showed, looks reliable, rather than the diff of the unblanked YAML against the edited one.

andry81 commented 2 years ago

@arcesino

I've been dealing with this issue for a couple of days when updating very large YAML files and found a workaround using diff & patch commands that restores the stripped blank lines in most of the cases. Suppose you have the following YAML file:

This one has one disadvantage: it removes comments. And there is no way to retain comments completely correctly outside the yq utility, because the comment format depends on YAML syntax.

andry81 commented 2 years ago

I have a new implementation as bash scripts that works better than all of the above.

Implementation: https://github.com/andry81-devops/gh-workflow/blob/master/bash/github/init-yq-workflow.sh
Example of usage: https://github.com/andry81-devops/gh-workflow/blob/master/bash/cache/accum-content.sh

# Usage example:
#
>yq_edit '<prefix-name>' 'edit' "<input-yaml>" "$TEMP_DIR/<output-yaml-edited>" \
  <list-of-yq-eval-strings> && \
  yq_diff "$TEMP_DIR/<output-yaml-edited>" "<input-yaml>" "$TEMP_DIR/<output-diff-edited>" && \
  yq_restore_edited_uniform_diff "$TEMP_DIR/<output-diff-edited>" "$TEMP_DIR/<output-diff-edited-restored>" && \
  yq_patch "$TEMP_DIR/<output-yaml-edited>" "$TEMP_DIR/<output-diff-edited-restored>" "$TEMP_DIR/<output-yaml-edited-restored>" "<output-yaml>"
#
# , where:
#
#   <input-yaml>  - input yaml file path
#   <output-yaml> - output yaml file path
#
#   <output-yaml-edited>          - output file name of edited yaml
#   <output-diff-edited>          - output file name of difference file generated from edited yaml
#   <output-diff-edited-restored> - output file name of restored difference file generated from original difference file
#   <output-yaml-edited-restored> - output file name of restored yaml file stored as intermediate temporary file

Example with test.yml:

# This file is automatically generated
#

content-index:

  timestamp: 1970-01-01T00:00:00Z

  entries:

    - dirs:

        - dir: dir-1/dir-2

          files:

            - file: file-1.dat
              md5-hash:
              timestamp: 1970-01-01T00:00:00Z

            - file: file-2.dat
              md5-hash:
              timestamp:

            - file: file-3.dat
              md5-hash:
              timestamp:

        - dir: dir-1/dir-2/dir-3

          files:

            - file: file-1.dat
              md5-hash:
              timestamp:

            - file: file-2.dat
              md5-hash:
              timestamp:

export GH_WORKFLOW_ROOT='<path-to-gh-workflow-root>' # https://github.com/andry81-devops/gh-workflow

source "$GH_WORKFLOW_ROOT/bash/github/init-yq-workflow.sh"

[[ -d "./temp" ]] || mkdir "./temp"

export TEMP_DIR="./temp"

yq_edit 'content-index' 'edit' "test.yml" "$TEMP_DIR/test-edited.yml" \
  ".\"content-index\".timestamp=\"2022-01-01T00:00:00Z\"" && \
  yq_diff "$TEMP_DIR/test-edited.yml" "test.yml" "$TEMP_DIR/test-edited.diff" && \
  yq_restore_edited_uniform_diff "$TEMP_DIR/test-edited.diff" "$TEMP_DIR/test-edited-restored.diff" && \
  yq_patch "$TEMP_DIR/test-edited.yml" "$TEMP_DIR/test-edited-restored.diff" "$TEMP_DIR/test.yml" "test-patched.yml" || exit $?

alexklibisz commented 1 year ago

Here is another possible workaround. We basically pre-format the file once with no content changes. Then make the content change. Then compare the pre-formatted and the content-changed versions to get a patch. Then apply the patch to the original file. I've only tried it for simple cases like patching the version in a helm values file. It seems to work well, and also seems to preserve comments.

$ yq --version
yq version 4.9.8
$ # The original file
$ cat values.yaml
# The app name
name: "some-app"

image:
  # The image tag
  tag: "1.2.0"

# Some other comments...
# ...
$ # Don't change anything; just let yq do its default formatting
$ yq eval --exit-status '.' values.yaml | tee out1.yaml
# The app name
name: "some-app"
image:
  # The image tag
  tag: "1.2.0"

# Some other comments...
# ...
$ # Now make the actual change
$ yq eval --exit-status '.image.tag = "1.3.0"' values.yaml | tee out2.yaml
# The app name
name: "some-app"
image:
  # The image tag
  tag: "1.3.0"

# Some other comments...
# ...
$ # Diff the two stripped files to get a minimal diff with no special flags.
$ diff out1.yaml out2.yaml | tee out.patch
5c5
<   tag: "1.2.0"
---
>   tag: "1.3.0"
$ # Apply the patch to the original file, which was unchanged so far.
$ patch values.yaml < out.patch
patching file values.yaml
$ # Inspect the final file. 
$ # Note the version was changed and everything else remained the same.
$ cat values.yaml
# The app name
name: "some-app"

image:
  # The image tag
  tag: "1.3.0"

# Some other comments...
# ...

andry81 commented 1 year ago

Here is another possible workaround. We basically just pre-strip the newlines and then re-compute the patch by comparing two stripped versions.

It has the same issues with the removal of comments and blank lines.

alexklibisz commented 1 year ago

Here is another possible workaround. We basically just pre-strip the newlines and then re-compute the patch by comparing two stripped versions.

It has the same issues with comments remove.

I think it works fine with comments. I updated my original post to include comments. LMK if you still see some issue. Maybe I'm overlooking something subtle.

andry81 commented 1 year ago

I think it works fine with comments. I updated my original post to include comments. LMK if you still see some issue. Maybe I'm overlooking something subtle.

The diff shows the position in the already reformatted file:

3c3 means a change on the 3rd line, when it is actually the 6th line that changed:

1: # The app name
2: name: "some-app"
3: 
4: image:
5:   # The image tag
6:   tag: "1.2.0"

Better to use a unified diff to see this:

> diff -u out1.yaml out2.yaml | tee out-uniform.patch
--- out1.yaml
+++ out2.yaml
@@ -1,3 +1,3 @@
 name: some-app
 image:
-  tag: "1.2.0"
+  tag: "1.3.0"

To exploit:

values.yaml

# The app name
name: "some-app"

image1:
  # The image1 tag
  tag: "1.2.0"
image2:
  # The image2 tag
  tag: "1.2.0"

> yq -y '.image2.tag = "1.3.0"' values.yaml | tee out2.yaml
name: some-app
image1:
  tag: "1.2.0"
image2:
  tag: "1.3.0"

> patch values.yaml -i out.patch

out.patch

5c5
<   tag: "1.2.0"
---
>   tag: "1.3.0"

values.yaml

# The app name
name: "some-app"

image1:
  # The image1 tag
  tag: "1.3.0"
image2:
  # The image2 tag
  tag: "1.2.0"

This additionally shows why a non-unified diff is less reliable for patching: it carries no surrounding context lines, so patch can match the wrong occurrence of the changed line.

DavidAttar commented 1 year ago

Will there be any fix for this issue in the future?

anthonyalayo commented 1 year ago

It sounds like there's no workaround?

chrisgrieser commented 1 year ago

prettier is the only yaml formatter I have tried that preserves blank lines correctly

Considering I switched to rome, it feels a bit annoying, though, to have prettier installed just for its ability to format YAML files :/

alexklibisz commented 1 year ago

It sounds like there's no workaround?

There are several workarounds mentioned throughout the thread. Look for 👍

bewuethr commented 1 year ago

Micro-improvement to the workaround that leaves blank lines alone: I have some YAML files with comments preceded by two blanks, like the SemVer comments left by dependabot when you reference an action by its full commit hash, like

uses: rymndhng/release-on-push-action@aebba2bbce07a9474bf95e8710e5ee8a9e922fe2  # v0.25.0

These blanks also get squashed to just one when you use yq to modify something else.

To prevent this, diff has an option -w to ignore all whitespace, resulting in

yq "$1" "$2" | diff -Bw "$2" - | patch "$2" -

alita1991 commented 1 year ago

Hello @bewuethr, I have thoroughly tested the workaround you provided, and it demonstrates excellent functionality, effectively addressing the initial issue. However, I have observed that it does not preserve the newline character that exists after the line modification.

11,12c10
<   tag: ""
< 
---
>   tag: "1.0.0"

bewuethr commented 1 year ago

Hello @bewuethr, I have thoroughly tested the workaround you provided, and it demonstrates excellent functionality, effectively addressing the initial issue. However, I have observed that it does not preserve the newline character that exists after the line modification.

11,12c10
<   tag: ""
< 
---
>   tag: "1.0.0"

That's right, a blank line after a modified line gets removed! I haven't found a better workaround other than moving lines to modify away from a blank line, I'm afraid.

fulldecent commented 1 year ago

There is an alternate underlying yaml library that claims to encode whitespace. This is a competitor to the library currently used in yq.

https://github.com/pantoniou/libfyaml

fulldecent commented 11 months ago

This is dumb, but I'm just going to say it. If you are only using whitespace to separate sections, and your sections each start with a comment like # some comment, then you can insert the whitespace back in with:

awk '/^---$/{flag=!flag; print; next} flag && /^#/{print ""} {print}'
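
To make the behaviour of that awk one-liner concrete, here it is wrapped in a function that filters stdin. This is a sketch for illustration only: the function name reinsert_blank_lines is an assumption, and the awk program itself is copied unchanged from the comment.

```shell
#!/usr/bin/env bash
# reinsert_blank_lines   (hypothetical wrapper name)
# Reads YAML on stdin. After a '---' line switches the flag on, every line
# that starts with '#' in column 0 gets one blank line printed before it.
# A second '---' line switches the flag off again.
reinsert_blank_lines() {
  awk '/^---$/{flag=!flag; print; next} flag && /^#/{print ""} {print}'
}

# Example input (as it might look after yq stripped the blank lines):
#   ---
#   # section a
#   a: 1
#   # section b
#   b: 2
#
# Example use:
#   reinsert_blank_lines < stripped.yaml > restored.yaml
```

Two things follow from how the program is written: a file without a leading --- line is passed through unchanged, and consecutive comment lines each get their own blank line, so this only suits files where every top-level comment opens a new section.
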

rick4096 commented 3 months ago

Regarding @vladimir259's approach above (diff between the blank-stripped original and the altered YAML, then patch applied to the original; tested there with yq 4.20.2, patch 2.7.6 and diff 3.7 on Debian 11):

This approach does not work if the diff ONLY appends new lines, e.g., if doing something like:

    yq "
      .my.prop.array += {\"Id\": \"$ID\", \"Spec\": \"$SPEC\"}
    " ./helm/values.yaml > $UPDATED_YAML

...the reason is that the line numbers will be completely wrong, since the diff was computed from two files without the original blank lines. The patch will apply, but at the wrong location in the original, which is very wrong. You could try to fix this approach by getting patch to ignore blank lines in the context of the hunk, but that is apparently not possible: the --ignore-whitespace option does not tell patch to ignore blank lines in the context, only in the changed lines themselves. The --binary option does not work either; it only helps with CRLF on Windows.

Given this limitation, @arcesino 's approach was the only one that actually worked correctly.

Zigler commented 3 months ago

Just as a suggestion: a pre-processing step could record each blank line or run of whitespace as a tag, kept in a separate place, with a tag parser identifying the order of the tags and where they occur in the hierarchy. A post-processing step would then re-insert the whitespace at the right locations after the original file has been processed. This may work better than a diff, since the blank lines are identified before the output is created, and it may handle the cases where the diff method falls short.
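
One simplified, in-band variant of this idea can be sketched with two sed filters: blank lines become marker comments before processing and are turned back into blank lines afterwards. This is not what the comment describes in full (it keeps the markers inside the file rather than in a separate place). The function names and the marker text __BLANK_LINE__ are hypothetical, and the sketch assumes the processor in the middle keeps standalone comment lines where they were, which has not been verified here for every yq version or document shape.

```shell
#!/usr/bin/env bash
# Hypothetical marker text; it must not occur in any real comment in the file.
BLANK_MARKER='__BLANK_LINE__'

# Pre-processing: replace each blank or whitespace-only line with a marker comment.
mark_blank_lines() {
  sed "s/^[[:space:]]*\$/# ${BLANK_MARKER}/"
}

# Post-processing: turn marker comments back into empty lines. Leading
# whitespace is tolerated in case the processor re-indented the comment.
unmark_blank_lines() {
  sed "s/^[[:space:]]*# ${BLANK_MARKER}[[:space:]]*\$//"
}

# Intended use, with yq as the processor in the middle (assumed invocation):
#   mark_blank_lines < values.yaml | yq '.image.tag = "1.3.0"' - | unmark_blank_lines
```

Unlike the diff-based workarounds, this does not depend on line numbers, but it stands or falls with how the processor treats comments, so the result should be checked on real files before relying on it.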

YaoJusheng commented 2 months ago

Hi, I have tried all the suggested methods, but nothing seems to work correctly.

Blank lines, comments, anchor and alias references in yaml files are all destroyed. Any suggestions for other possible solutions?

Will this support be available in the future? I saw go-yaml#627 closed.

andry81 commented 2 months ago

Hi, I have tried all the suggested methods, but nothing seems to work correctly.

Can you give an example of what does not work with this method: https://github.com/mikefarah/yq/issues/515#issuecomment-1207700251

YaoJusheng commented 2 months ago

Hi, I have tried all the suggested methods, but nothing seems to work correctly.

Can you give an example what does not work with this method: #515 (comment)

Sorry, that was my mistake: the yq version I used was 3.x.

After updating to the latest 4.x version, yq itself already retains the formatting, except that blank lines are deleted; for those it needs to be combined with diff and patch.