Open scanfield opened 4 years ago
Same story. Sorry, but this issue looks more like a bug rather than an enhancement. When you process a YAML file with yq, it corrupts the whole file.
Any update on this, this is a really nice feature.
Any update on this ? It's getting difficult when it comes to readability
This is an effect of the underlying YAML parser. An issue was raised there (https://github.com/go-yaml/yaml/issues/627), and the owner said:
..the content when re-encoded will not have its original textual representation preserved. An effort is made to render the data pleasantly, and to preserve comments near the data they describe, though.
I've been dealing with this issue for a couple of days when updating very large YAML files and found a workaround using the diff and patch commands that restores the stripped blank lines in most cases. Suppose you have the following YAML file:
doc:
  version: 1.0.0
  name: numbers & letters

numbers:
  - 1

letters:
  - a
We call this file a.yaml. Now let's update the version using yq and store the result in a new file, a-updated.yaml:
yq e '.doc.version = "1.0.1"' a.yaml > a-updated.yaml
As expected, the command above stripped all blank lines, so a-updated.yaml looks like:
doc:
  version: 1.0.1
  name: numbers & letters
numbers:
  - 1
letters:
  - a
At this point, the first step to get the blank lines back is to create a diff file that ignores blank-line changes:
diff -U0 -w -b --ignore-blank-lines a.yaml a-updated.yaml > a.diff
a.diff looks like this:
--- a.yaml 2021-04-30 15:28:38.000000000 -0500
+++ a-updated.yaml 2021-04-30 15:18:53.000000000 -0500
@@ -2 +2 @@
-  version: 1.0.0
+  version: 1.0.1
The final step is to patch the original file with the diff:
patch a.yaml < a.diff
After that, the original file looks like:
doc:
  version: 1.0.1
  name: numbers & letters

numbers:
  - 1

letters:
  - a
The issue comes when the updated line is right before a blank line. For example, let's add an element to one of the arrays:
yq e '.numbers += 2' a.yaml > a-updated.yaml
The updated file is now:
doc:
  version: 1.0.1
  name: numbers & letters
numbers:
  - 1
  - 2
letters:
  - a
If we generate the diff file as before, we get the following:
--- a.yaml 2021-04-30 15:30:22.000000000 -0500
+++ a-updated.yaml 2021-04-30 15:35:26.000000000 -0500
@@ -7 +6 @@
-
+  - 2
Patching the original file with the diff above results in:
doc:
  version: 1.0.1
  name: numbers & letters

numbers:
  - 1
  - 2
letters:
  - a
Notice how the blank line after the new element in the numbers array remains stripped while the others are back. This happens because diff treats the blank-line deletion and the addition of the new array element as part of the same hunk, so it is not ignored by --ignore-blank-lines.
This is not ideal by any means, but in my case it has helped a lot since my files are big and have lots of blank lines. I'm sharing this in case someone else finds it useful too.
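The two steps above (a diff that skips blank-line-only hunks, then patch) can be wrapped in a small helper. This is only a sketch under some assumptions: the function name apply_without_blank_line_changes is made up for illustration, it relies on GNU diff and patch being installed, and it takes the already edited file as an argument, so the yq call itself stays outside the function.

```shell
# Hypothetical helper (not part of yq): copy the non-blank changes found in
# EDITED onto ORIGINAL in place, leaving ORIGINAL's blank lines alone.
apply_without_blank_line_changes() {
  local original="$1" edited="$2" patch_file status=0
  patch_file=$(mktemp) || return 1

  # diff exits with 1 when the files differ; only a status above 1 is an error.
  diff -U0 -w -b --ignore-blank-lines "$original" "$edited" > "$patch_file" || status=$?
  if [ "$status" -gt 1 ]; then
    rm -f "$patch_file"
    return "$status"
  fi

  # An empty diff means only blank lines differed, so there is nothing to apply.
  status=0
  if [ -s "$patch_file" ]; then
    patch -s "$original" < "$patch_file" || status=$?
  fi
  rm -f "$patch_file"
  return "$status"
}
```

Usage, following the example from the comment above: run yq e '.doc.version = "1.0.1"' a.yaml > a-updated.yaml first, then apply_without_blank_line_changes a.yaml a-updated.yaml. The limitation described above still applies: a blank line directly next to a changed line ends up in the same hunk and is lost.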
Thanks! I used @arcesino's approach for this one-liner:
filename=xxx
version=xxx
patch "$filename" <<< "$(diff -U0 -w -b --ignore-blank-lines "$filename" <(yq eval ".my.version = \"$version\"" "$filename"))"
Thanks for the idea with diff & patch, @arcesino.
In my case the removal of blanks introduced by diff was unfortunately unacceptable, so I had to dig further, and I found a solution.
The approach is as follows: I remove the blanks from the original YAML and create a diff between that and my altered YAML. The patch is then applied to the original, and no new spaces are introduced.
Here is an example:
The starting point is my original YAML, where the value of the key "secrets.TEST" should be updated:
---
config:
# mysql
DATABASE_PROTOCOL: "mysql"
# instance fqdn
DATABASE_HOST: "mysql"
secrets:
# db password
DATABASE_PASSWORD: "password"
# example
TEST: "foo"
# other values
#[...]
Step 1: updating the value & creating a copy
yq '.secrets.TEST = "NewValue"' sample.yaml > sample.yaml.new
Step 2: removing blanks from the original
yq '.' sample.yaml > sample.yaml.noblanks
Step 3: creating a patch
diff -B sample.yaml.noblanks sample.yaml.new > patch.file
The patch then contains only the value diffs:
$> cat patch.file
11c11
< TEST: "foo"
---
> TEST: "NewValue"
Step 4: apply the patch to the original
patch sample.yaml patch.file
Utils used:
- yq 4.20.2
- patch 2.7.6
- diff 3.7
OS: debian 11
Good idea! I turned that into fish and bash functions in this Gist:
#fish
function yqblank;
yq eval "$argv[1]" "$argv[2]" | diff -B "$argv[2]" - | patch "$argv[2]" -o -
end
#bash
yqblank() {
yq eval $1 $2 | diff -B $2 - | patch $2 -o -
}
This makes it possible to use yq without changing most of the blank lines. Usage is as follows:
yqblank '.' file_name.yml
@clementnuss I think patch $2 -o - does not work and -o should be removed there.
#bash
yqblank() {
yq eval $1 $2 | diff -B $2 - | patch $2 -
}
@raQai, thank you! The arguments just have to be quoted properly; also, eval/e can be omitted since yq 4.18.1:
#bash
yqblank() {
yq "$1" "$2" | diff -B "$2" - | patch "$2" -
}
Oh yeah, I forgot about the quoting part :sweat_smile: I was in a hurry, so thanks for adding this :+1:
edit: I would also like to add that this still sometimes merges multi-line descriptions and arrays into one line, and it is not able to handle comments properly.
source:
fruits: [
Apple,
Banana,
Calamansi,
]
becomes:
fruits: [Apple, Banana, Calamansi,]
source:
fruits: [
Apple, # comment 1
Banana, # comment 2
Calamansi, # comment 3
]
becomes:
fruits: [
Apple, # comment 1
Banana, # comment 2
Calamansi, # comment 3
]
(I did not verify this on my current machine but that was roughly the result)
edit2:
@arcesino we also ran into the same thing you did with the .info.version update.
Long story short: we still use yq, but only to get the line number of .info.version using the line operator, and then update that line using sed.
Something along these lines should work:
$ sed -i "$(yq '.info.version | line' "$file")s/$old_val/$new_val/" "$file"
This also returns the correct line if the value of .info.version is broken onto the next line:
info:
  version: 1.x.x # line 2
info:
  version:
    1.x.x # line 3
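To make the sed half of this approach easier to reuse, here is a small sketch that substitutes a value on one given line only. The helper name replace_on_line is hypothetical, and the line number is passed in as an argument; in the workflow described above it would come from yq '.info.version | line' "$file". It writes through a temporary file instead of using sed -i, on the assumption that this is more portable across sed implementations.

```shell
# Hypothetical helper: substitute OLD with NEW on exactly one line of FILE.
# OLD is interpreted as a sed regular expression and NEW as a sed replacement,
# so values containing "/", "&" or regex metacharacters need escaping first.
replace_on_line() {
  local file="$1" line="$2" old="$3" new="$4"
  # The address "${line}" restricts the substitution to that single line,
  # so identical values elsewhere in the file are left untouched.
  sed "${line}s/${old}/${new}/" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}
```

For example, replace_on_line values.yaml 2 '1\.0\.0' '1.0.1' changes only line 2 of a hypothetical values.yaml, even if another key further down also holds 1.0.0. Because the file is never re-encoded, blank lines and comments stay exactly as they were.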
I'm hit by this too. No fix, only workarounds?
Approach is following: i remove blanks from the original yaml and create a diff between that and my altered yaml. The patch then is applied to the original and no new spaces are introduced.
Unfortunately, this only works for changes to already existing values. If you try to add lines to the YAML, the patch lands at a position offset by the blank lines.
I've already tested that, and it does not work as expected for additions: https://github.com/andry81-devops/gh-workflow/blob/ee5d2d5b6bf59299e39baa16bb85357cf34a8561/bash/github/init-yq-workflow.sh https://github.com/andry81-devops/gh-workflow/blob/9b9d01a9b60a65d6c3c29f5b4b200409fc6a0aed/bash/cache/accum-content.sh
Search for: yq_edit, yq_diff, yq_patch
So only the diff against the edited YAML, rather than the diff against the unblanked YAML, looks reliable, as @arcesino showed.
@arcesino
I've been dealing with this issue for a couple of days when updating very large YAML files and found a workaround using diff & patch commands that restores the stripped blank lines in most of the cases. Suppose you have the following YAML file:
This one has one disadvantage: it removes comments. And there is no way to retain comments completely correctly outside the yq utility, because the comment format depends on the YAML syntax.
I have a new implementation of bash scripts that is better than all of the above.
Implementation: https://github.com/andry81-devops/gh-workflow/blob/master/bash/github/init-yq-workflow.sh Example of usage: https://github.com/andry81-devops/gh-workflow/blob/master/bash/cache/accum-content.sh
# Usage example:
#
>yq_edit '<prefix-name>' 'edit' "<input-yaml>" "$TEMP_DIR/<output-yaml-edited>" \
<list-of-yq-eval-strings> && \
yq_diff "$TEMP_DIR/<output-yaml-edited>" "<input-yaml>" "$TEMP_DIR/<output-diff-edited>" && \
yq_restore_edited_uniform_diff "$TEMP_DIR/<output-diff-edited>" "$TEMP_DIR/<output-diff-edited-restored>" && \
yq_patch "$TEMP_DIR/<output-yaml-edited>" "$TEMP_DIR/<output-diff-edited-restored>" "$TEMP_DIR/<output-yaml-edited-restored>" "<output-yaml>"
#
# , where:
#
# <input-yaml> - input yaml file path
# <output-yaml> - output yaml file path
#
# <output-yaml-edited> - output file name of edited yaml
# <output-diff-edited> - output file name of difference file generated from edited yaml
# <output-diff-edited-restored> - output file name of restored difference file generated from original difference file
# <output-yaml-edited-restored> - output file name of restored yaml file stored as intermediate temporary file
Example with test.yml:
# This file is automatically generated
#
content-index:
timestamp: 1970-01-01T00:00:00Z
entries:
- dirs:
- dir: dir-1/dir-2
files:
- file: file-1.dat
md5-hash:
timestamp: 1970-01-01T00:00:00Z
- file: file-2.dat
md5-hash:
timestamp:
- file: file-3.dat
md5-hash:
timestamp:
- dir: dir-1/dir-2/dir-3
files:
- file: file-1.dat
md5-hash:
timestamp:
- file: file-2.dat
md5-hash:
timestamp:
export GH_WORKFLOW_ROOT='<path-to-gh-workflow-root>' # https://github.com/andry81-devops/gh-workflow
source "$GH_WORKFLOW_ROOT/bash/github/init-yq-workflow.sh"
[[ -d "./temp" ]] || mkdir "./temp"
export TEMP_DIR="./temp"
yq_edit 'content-index' 'edit' "test.yml" "$TEMP_DIR/test-edited.yml" \
".\"content-index\".timestamp=\"2022-01-01T00:00:00Z\"" && \
yq_diff "$TEMP_DIR/test-edited.yml" "test.yml" "$TEMP_DIR/test-edited.diff" && \
yq_restore_edited_uniform_diff "$TEMP_DIR/test-edited.diff" "$TEMP_DIR/test-edited-restored.diff" && \
yq_patch "$TEMP_DIR/test-edited.yml" "$TEMP_DIR/test-edited-restored.diff" "$TEMP_DIR/test.yml" "test-patched.yml" || exit $?
PROs:
# ...
key: value # ...
CONs:
Related issues:
Here is another possible workaround. We basically pre-format the file once with no content changes. Then make the content change. Then compare the pre-formatted and the content-changed versions to get a patch. Then apply the patch to the original file. I've only tried it for simple cases like patching the version in a helm values file. It seems to work well, and also seems to preserve comments.
$ yq --version
yq version 4.9.8
$ # The original file
$ cat values.yaml
# The app name
name: "some-app"
image:
# The image tag
tag: "1.2.0"
# Some other comments...
# ...
$ # Don't change anything; just let yq do its default formatting
$ yq eval --exit-status '.' values.yaml | tee out1.yaml
# The app name
name: "some-app"
image:
# The image tag
tag: "1.2.0"
# Some other comments...
# ...
$ # Now make the actual change
$ yq eval --exit-status '.image.tag = "1.3.0"' values.yaml | tee out2.yaml
# The app name
name: "some-app"
image:
# The image tag
tag: "1.3.0"
# Some other comments...
# ...
$ # Diff the two stripped files to get a minimal diff with no special flags.
$ diff out1.yaml out2.yaml | tee out.patch
5c5
< tag: "1.2.0"
---
> tag: "1.3.0"
$ # Apply the patch to the original file, which was unchanged so far.
$ patch values.yaml < out.patch
patching file values.yaml
$ # Inspect the final file.
$ # Note the version was changed and everything else remained the same.
$ cat values.yaml
# The app name
name: "some-app"
image:
# The image tag
tag: "1.3.0"
# Some other comments...
# ...
Here is another possible workaround. We basically just pre-strip the newlines and then re-compute the patch by comparing two stripped versions.
It has the same issues with comments and blank lines being removed.
I think it works fine with comments. I updated my original post to include comments. LMK if you still see some issue. Maybe I'm overlooking something subtle.
The diff shows the position in the already edited file: 3c3 means a change in the 3rd line, when it is actually the 6th line that has changed:
1: # The app name
2: name: "some-app"
3:
4: image:
5: # The image tag
6: tag: "1.2.0"
It is better to use a unified diff to see this:
> diff -u out1.yaml out2.yaml | tee out-uniform.patch
--- out1.yaml
+++ out2.yaml
@@ -1,3 +1,3 @@
name: some-app
image:
- tag: "1.2.0"
+ tag: "1.3.0"
To exploit:
values.yaml
# The app name
name: "some-app"
image1:
# The image1 tag
tag: "1.2.0"
image2:
# The image2 tag
tag: "1.2.0"
> yq -y '.image2.tag = "1.3.0"' values.yaml | tee out2.yaml
name: some-app
image1:
tag: "1.2.0"
image2:
tag: "1.3.0"
> patch values.yaml -i out.patch
out.patch
5c5
< tag: "1.2.0"
---
> tag: "1.3.0"
values.yaml
# The app name
name: "some-app"
image1:
# The image1 tag
tag: "1.3.0"
image2:
# The image2 tag
tag: "1.2.0"
This additionally shows why the non-unified diff, even with its default options, is less stable for patching.
Will there be any fixes to this issue in the future?
It sounds like there's no workaround?
prettier is the only YAML formatter I have tried that preserves blank lines correctly.
Considering I switched to rome, it feels a bit annoying though to have prettier installed just for its ability to format YAML files :/
It sounds like there's no workaround?
There are several workarounds mentioned throughout the thread. Look for 👍
Micro-improvement to the workaround that leaves blank lines alone: I have some YAML files with comments preceded by two blanks, like the SemVer comments left by dependabot when you reference an action by its full commit hash, like
uses: rymndhng/release-on-push-action@aebba2bbce07a9474bf95e8710e5ee8a9e922fe2 # v0.25.0
These blanks also get squashed to just one when you use yq to modify something else.
To prevent this, diff has an option -w to ignore all whitespace, resulting in
yq "$1" "$2" | diff -Bw "$2" - | patch "$2" -
Hello @bewuethr, I have thoroughly tested the workaround you provided, and it demonstrates excellent functionality, effectively addressing the initial issue. However, I have observed that it does not preserve the newline character that exists after the line modification.
11,12c10
< tag: ""
<
---
> tag: "1.0.0"
That's right, a blank line after a modified line gets removed! I haven't found a better workaround other than moving lines to modify away from a blank line, I'm afraid.
There is an alternate underlying yaml library that claims to encode whitespace. This is a competitor to the library currently used in yq.
This is dumb, but I'm just going to say it. If you are only using whitespace to separate sections, and your sections each start with a comment like # some comment, then you can insert the whitespace back in with:
awk '/^---$/{flag=!flag; print; next} flag && /^#/{print ""} {print}'
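As a quick illustration of what that one-liner does, here it is wrapped in a function and run on a tiny document. The function name reinsert_blank_before_comments and the sample input are made up for this sketch; the awk program itself is unchanged.

```shell
# Hypothetical wrapper around the awk one-liner above.
# After a "---" line switches the flag on, every line starting with "#" in
# column 0 gets an empty line printed before it; a second "---" switches the
# flag off again. Indented comments are not matched.
reinsert_blank_before_comments() {
  awk '/^---$/{flag=!flag; print; next} flag && /^#/{print ""} {print}' "$@"
}

# Sample run on blank-stripped input (reads stdin when no file is given).
printf '%s\n' '---' '# section a' 'a: 1' '# section b' 'b: 2' |
  reinsert_blank_before_comments
```

Note that a blank line is also inserted before the first comment, directly after the --- marker, and that nothing is inserted at all if the file has no --- line.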
Thanks for the idea with diff & patch, @arcesino.
In my case the removal of blanks introduced by diff was unfortunately unacceptable, so I had to dig further, and I found a solution.
The approach is as follows: I remove the blanks from the original YAML and create a diff between that and my altered YAML. The patch is then applied to the original, and no new spaces are introduced.
Here is an example:
The starting point is my original YAML, where the value of the key "secrets.TEST" should be updated:
---
config:
# mysql
DATABASE_PROTOCOL: "mysql"
# instance fqdn
DATABASE_HOST: "mysql"
secrets:
# db password
DATABASE_PASSWORD: "password"
# example
TEST: "foo"
# other values
#[...]
Step 1: updating the value & creating a copy
yq '.secrets.TEST = "NewValue"' sample.yaml > sample.yaml.new
Step 2: removing blanks from the original
yq '.' sample.yaml > sample.yaml.noblanks
Step 3: creating a patch
diff -B sample.yaml.noblanks sample.yaml.new > patch.file
The patch then contains only the value diffs:
$> cat patch.file
11c11
< TEST: "foo"
---
> TEST: "NewValue"
Step 4: apply the patch to the original
patch sample.yaml patch.file
Utils used:
- yq 4.20.2
- patch 2.7.6
- diff 3.7
OS: debian 11
This approach does not work if the diff ONLY appends new lines, e.g., when doing something like:
yq "
.my.prop.array += {\"Id\": \"$ID\", \"Spec\": \"$SPEC\"}
" ./helm/values.yaml > $UPDATED_YAML
...the reason is that the line numbers will be completely wrong, since the diff was computed from two files without the original blank lines. The patch will apply, but at the wrong location in the original, which is very wrong. You could try to fix this approach by getting patch to ignore blank lines in the context of the hunk, but that apparently is not possible: the --ignore-whitespace option does not tell patch to ignore blank lines in the context, but in the changed lines themselves. The --binary option does not work either; it only helps with CRLF on Windows.
Given this limitation, @arcesino's approach was the only one that actually worked correctly.
Just as a suggestion: as a pre-processing step, one could insert a tag wherever whitespace or empty lines occur and store those tags in a separate place, using a tag parser to identify their ordering and where the tag occurrences sit in the hierarchy. Then, as a post-processing step, re-insert the whitespace in the right locations after processing the original file. This may work better than a diff, since the blank lines are identified before the output is created, and it may handle the cases where the diff method falls short.
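A very rough sketch of that record-and-reinsert idea, using plain awk rather than a YAML-aware tag parser, might look like the following. Everything here is an assumption for illustration: the function name restore_blank_lines is made up, and the text of the line that follows each run of blank lines serves as the "tag". It therefore only works when the processing tool leaves those anchor lines textually unchanged.

```shell
# Hypothetical sketch: re-insert the blank lines of ORIGINAL into PROCESSED.
# Pass 1 records, for every run of blank lines in ORIGINAL, the text of the
# next non-blank line (the anchor) and the length of the run.
# Pass 2 prints PROCESSED and emits the recorded blank lines just before each
# anchor, consuming the anchors strictly in their original order.
restore_blank_lines() {
  awk '
    BEGIN { idx = 1 }
    NR == FNR {                        # pass 1: the original file
      if (NF == 0) { pending++; next } # empty or whitespace-only line
      if (pending > 0) { n++; anchor[n] = $0; blanks[n] = pending; pending = 0 }
      next
    }
    {                                  # pass 2: the processed file
      if (idx <= n && $0 == anchor[idx]) {
        for (i = 0; i < blanks[idx]; i++) print ""
        idx++
      }
      print
    }
  ' "$1" "$2"
}
```

It would be called as restore_blank_lines original.yaml processed.yaml > restored.yaml. Known gaps of this sketch: if an anchor line itself was modified or reformatted, that anchor never matches and all later blank lines are skipped; trailing blank lines at the end of the original are dropped; and an anchor whose text also appears earlier in the file can attract the blank line to the wrong place.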
Hi, I have tried all the suggested methods, but nothing seems to work correctly.
Blank lines, comments, anchor and alias references in yaml files are all destroyed. Any suggestions for other possible solutions?
Will this be supported in the future? I saw that go-yaml#627 was closed.
Hi, I have tried all the suggested methods, but nothing seems to work correctly.
Can you give an example of what does not work with this method: https://github.com/mikefarah/yq/issues/515#issuecomment-1207700251
Sorry, that was my problem: the yq version I used was 3.x.
After updating to the latest 4.x version, yq's own options already retain the format, except that blank lines are deleted, which still needs to be handled by combining yq with diff and patch.
Is your feature request related to a problem? Please describe.
when run through
yq w - foo.baz 3
produces
Describe the solution you'd like
Keep my extra blank line (it's better for readability / produces less of a diff)