mikefarah / yq

yq is a portable command-line YAML, JSON, XML, CSV, TOML and properties processor
https://mikefarah.gitbook.io/yq/
MIT License

😲 Emoji characters in keys & values in v4 are lost/corrupted #814

Open elasticdotventures opened 3 years ago

elasticdotventures commented 3 years ago
$ yq -V
yq version 4.7.0

The way yq v4 handles emoji is odd, inconsistent, and unpredictable. (This did not occur on the earlier yq 2.x versions, which had other limitations.)

yq should (imho) pass UTF-8/emoji through unmolested. yq works properly with Chinese (Mandarin) characters, but emoji are so much more powerful and universal that it'd be nice to be able to use them as well.

For example let's say emojifile.yaml with contents:

---
"bash.🔨/init.10级.🥾.b00t.sh": ""
"bash.🔨/init.20级.🐧.linux.sh": ""
"bash.🔨/init.22级.🐙.git.sh": ""
"bash.🔨/init.30级.🐳.层.docker.sh": ""
"bash.🔨/init.32级.💠.层.hashicorp.sh": ""
"bash.🔨/init.40级.🐍.语.python.sh": ""
"bash.🔨/init.40级.🚀.语.node.sh": ""
"bash.🔨/init.42级.🦄.语.typescript.sh": ""
"bash.🔨/init.43级.🥷.语.vue.sh": ""
"bash.🔨/init.44级.☕.语.java.sh": ""
"bash.🔨/init.44级.🏇.语.go.sh": ""
"bash.🔨/init.50级.👾.云☁️.gcp.sh": ""
"bash.🔨/init.50级.🤖.云☁️.azure.sh": ""
"bash.🔨/init.50级.🦉.云☁️.aws.sh": ""
"bash.🔨/init.60级.🎙️💙.应用.vscode.sh": ""
"bash.🔨/init.70级.☎️.msg.sh": ""
"bash.🔨/init.70级.🎬.video.sh": ""
"bash.🔨/init.70级.📱.mobile.sh": ""
"bash.🔨/init.70级.🕹️.gamesim.sh": ""
"bash.🔨/init.70级.🤑.ecommerce.sh": ""
"bash.🔨/init.70级.🥯.crypto.sh": ""
"bash.🔨/init.70级.🧠.ai.sh": ""
"bash.🔨/init.80级.🐱‍💻.esp32.sh": ""

then

$ cat emojifile.yaml | yq eval

will produce (on my ubuntu system)

"bash.\/init.10级.\.b00t.sh": ""
"bash.\/init.20级.\.linux.sh": ""
"bash.\/init.22级.\.git.sh": ""
"bash.\/init.30级.\.层.docker.sh": ""
"bash.\/init.32级.\.层.hashicorp.sh": ""
"bash.\/init.40级.\.语.python.sh": ""
"bash.\/init.40级.\.语.node.sh": ""
"bash.\/init.42级.\.语.typescript.sh": ""
"bash.\/init.43级.\.语.vue.sh": ""
"bash.\/init.44级.☕.语.java.sh": ""
"bash.\/init.44级.\.语.go.sh": ""
"bash.\/init.50级.\.云☁️.gcp.sh": ""
"bash.\/init.50级.\.云☁️.azure.sh": ""
"bash.\/init.50级.\.云☁️.aws.sh": ""
"bash.\/init.60级.\️\.应用.vscode.sh": ""
"bash.\/init.70级.☎️.msg.sh": ""
"bash.\/init.70级.\.video.sh": ""
"bash.\/init.70级.\.mobile.sh": ""
"bash.\/init.70级.\️.gamesim.sh": ""
"bash.\/init.70级.\.ecommerce.sh": ""
"bash.\/init.70级.\.crypto.sh": ""
"bash.\/init.70级.\.ai.sh": ""
"bash.\/init.80级.\‍\.esp32.sh": ""

This is for the b00t framework.

elasticdotventures commented 3 years ago

cat emojifile.yaml | yq eval -M

"bash.\U0001F528/init.10级.\U0001F97E.b00t.sh": ""
"bash.\U0001F528/init.20级.\U0001F427.linux.sh": ""
"bash.\U0001F528/init.22级.\U0001F419.git.sh": ""
"bash.\U0001F528/init.30级.\U0001F433.层.docker.sh": ""
"bash.\U0001F528/init.32级.\U0001F4A0.层.hashicorp.sh": ""
"bash.\U0001F528/init.40级.\U0001F40D.语.python.sh": ""
"bash.\U0001F528/init.40级.\U0001F680.语.node.sh": ""
"bash.\U0001F528/init.42级.\U0001F984.语.typescript.sh": ""
"bash.\U0001F528/init.43级.\U0001F977.语.vue.sh": ""
"bash.\U0001F528/init.44级.☕.语.java.sh": ""
"bash.\U0001F528/init.44级.\U0001F3C7.语.go.sh": ""
"bash.\U0001F528/init.50级.\U0001F47E.云☁️.gcp.sh": ""
"bash.\U0001F528/init.50级.\U0001F916.云☁️.azure.sh": ""
"bash.\U0001F528/init.50级.\U0001F989.云☁️.aws.sh": ""
"bash.\U0001F528/init.60级.\U0001F399️\U0001F499.应用.vscode.sh": ""
"bash.\U0001F528/init.70级.☎️.msg.sh": ""
"bash.\U0001F528/init.70级.\U0001F3AC.video.sh": ""
"bash.\U0001F528/init.70级.\U0001F4F1.mobile.sh": ""
"bash.\U0001F528/init.70级.\U0001F579️.gamesim.sh": ""
"bash.\U0001F528/init.70级.\U0001F911.ecommerce.sh": ""
"bash.\U0001F528/init.70级.\U0001F96F.crypto.sh": ""
"bash.\U0001F528/init.70级.\U0001F9E0.ai.sh": ""
"bash.\U0001F528/init.80级.\U0001F431‍\U0001F4BB.esp32.sh": ""
elasticdotventures commented 3 years ago

BUT -j (json) apparently works

$ cat emojifile.yaml | yq eval -j
{
  "bash.🔨/init.10级.🥾.b00t.sh": "",
  "bash.🔨/init.20级.🐧.linux.sh": "",
  "bash.🔨/init.22级.🐙.git.sh": "",
  "bash.🔨/init.30级.🐳.层.docker.sh": "",
  "bash.🔨/init.32级.💠.层.hashicorp.sh": "",
  "bash.🔨/init.40级.🐍.语.python.sh": "",
  "bash.🔨/init.40级.🚀.语.node.sh": "",
  "bash.🔨/init.42级.🦄.语.typescript.sh": "",
  "bash.🔨/init.43级.🥷.语.vue.sh": "",
  "bash.🔨/init.44级.☕.语.java.sh": "",
  "bash.🔨/init.44级.🏇.语.go.sh": "",
  "bash.🔨/init.50级.👾.云☁️.gcp.sh": "",
  "bash.🔨/init.50级.🤖.云☁️.azure.sh": "",
  "bash.🔨/init.50级.🦉.云☁️.aws.sh": "",
  "bash.🔨/init.60级.🎙️💙.应用.vscode.sh": "",
  "bash.🔨/init.70级.☎️.msg.sh": "",
  "bash.🔨/init.70级.🎬.video.sh": "",
  "bash.🔨/init.70级.📱.mobile.sh": "",
  "bash.🔨/init.70级.🕹️.gamesim.sh": "",
  "bash.🔨/init.70级.🤑.ecommerce.sh": "",
  "bash.🔨/init.70级.🥯.crypto.sh": "",
  "bash.🔨/init.70级.🧠.ai.sh": "",
  "bash.🔨/init.80级.🐱‍💻.esp32.sh": ""
}
elasticdotventures commented 3 years ago

Just confirmed same behavior on yq 4.8.0

elasticdotventures commented 3 years ago

Just confirmed that the "other" yq project works properly with Emoji. https://github.com/kislyuk/yq

When I said "earlier" versions worked, that was incorrect.
I didn't realize I'd switched repos.

mikefarah commented 3 years ago

Digging a little into this - and as far as I can tell it's an issue with go-yaml, the underlying yaml parser :(

https://github.com/go-yaml/yaml/issues/279

Not sure if I'll be able to work around it

mikefarah commented 3 years ago

Raised a new issue here: https://github.com/go-yaml/yaml/issues/737

mikefarah commented 3 years ago

Note that '-j' works because the issue is with the yaml Encoder and the json encoder works fine.

zhangguanzhang commented 2 years ago

If you are working in a shell, you could use this command:

tr -cd '\11\12\15\40-\176' < 1.yml  > new.yml
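
For anyone trying this: the command keeps only tab, newline, carriage return and printable ASCII, so it deletes the emoji (and the Chinese characters) instead of preserving them. A small sketch that shows the effect on one key from emojifile.yaml; the function name `strip_non_ascii` is made up for illustration:

```shell
#!/bin/sh
# strip_non_ascii: hypothetical wrapper around the tr command above.
# Reads stdin and deletes (-d) every byte that is NOT (-c) in the set:
# tab (\11), newline (\12), carriage return (\15), space through ~ (\40-\176).
strip_non_ascii() {
  tr -cd '\11\12\15\40-\176'
}

# Demo on one key from emojifile.yaml: the emoji and 级 are dropped.
printf '%s\n' '"bash.🔨/init.10级.🥾.b00t.sh": ""' | strip_non_ascii
```

Because the keys lose characters, two keys that differ only in their emoji would collide after this step.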
surrim commented 2 months ago
$ cat emojifile.yaml | yq eval

will produce (on my ubuntu system)

...

I found a small workaround. You can wrap it with echo -e, then it will work:

echo -e "$(cat emojifile.yaml | yq eval)"
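
One caveat with `echo -e`: it expands every backslash escape it knows, so a literal `\n` or `\t` inside a value would be rewritten too, and how `\U` is handled can depend on the shell and locale. Below is a rough sketch of a narrower filter that only rewrites the `\UXXXXXXXX` escapes seen in the `-M` output above. The function names `codepoint_to_utf8` and `decode_U_escapes` are made up for this example; it does not handle 4-digit `\u` escapes or an escaped backslash directly before a `U`, and it is not part of yq.

```shell
#!/bin/sh
# Print the UTF-8 bytes for one code point given as hex digits (e.g. 0001F528).
# The body runs in a subshell "( ... )" so its variables stay local.
codepoint_to_utf8() (
  cp=$((0x$1))
  if [ "$cp" -lt 128 ]; then
    set -- "$cp"
  elif [ "$cp" -lt 2048 ]; then
    set -- $((0xC0 | (cp >> 6))) $((0x80 | (cp & 0x3F)))
  elif [ "$cp" -lt 65536 ]; then
    set -- $((0xE0 | (cp >> 12))) $((0x80 | ((cp >> 6) & 0x3F))) \
           $((0x80 | (cp & 0x3F)))
  else
    set -- $((0xF0 | (cp >> 18))) $((0x80 | ((cp >> 12) & 0x3F))) \
           $((0x80 | ((cp >> 6) & 0x3F))) $((0x80 | (cp & 0x3F)))
  fi
  # Emit each byte through an octal escape, which does not depend on the locale.
  for b in "$@"; do
    printf "\\$(printf '%03o' "$b")"
  done
)

# Read stdin line by line and replace each \U followed by exactly
# 8 hex digits with the corresponding UTF-8 character.
decode_U_escapes() (
  while IFS= read -r line || [ -n "$line" ]; do
    rest=$line
    out=
    while :; do
      case $rest in
        *'\U'*) ;;
        *) break ;;
      esac
      head=${rest%%'\U'*}          # text before the first \U
      tail=${rest#*'\U'}           # text after the first \U
      hex=$(printf '%.8s' "$tail") # candidate hex digits
      case $hex in
        *[!0-9A-Fa-f]*)            # not hex: keep the \U as it is
          out=$out$head'\U'
          rest=$tail ;;
        ????????)                  # exactly 8 hex digits: decode
          out=$out$head$(codepoint_to_utf8 "$hex")
          rest=${tail#????????} ;;
        *)                         # fewer than 8 characters left
          out=$out$head'\U'
          rest=$tail ;;
      esac
    done
    printf '%s\n' "$out$rest"
  done
)

# Demo on one line of the -M output shown earlier in this thread.
printf '%s\n' '"bash.\U0001F528/init.10级.\U0001F97E.b00t.sh": ""' | decode_U_escapes
```

With yq installed, it would be used as `yq eval -M emojifile.yaml | decode_U_escapes`, assuming the escapes come out in the `\UXXXXXXXX` form shown above.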