kislyuk / yq

Command-line YAML, XML, TOML processor - jq wrapper for YAML/XML/TOML documents
https://kislyuk.github.io/yq/
Apache License 2.0

A scalar starting with 00 (double zero) is interpreted as octal number #152

Open rafalkrupinski opened 1 year ago

rafalkrupinski commented 1 year ago

When parsing an unquoted value that starts with 00 (zero zero), yq tries to parse it as an octal number. According to the specification, octals start with 0o (zero oscar), so I was expecting the value to parse as a string.

file.yaml:

009

$ yq . data1.yml
yq: Error running jq: ValueError: invalid literal for int() with base 8: '009'.

Thank you!
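For reference, the message in the error above is the one Python's built-in `int()` raises when asked to read a string as base 8. The sketch below only reproduces that message and contrasts it with the `0o` notation; `parse_as_octal` is a hypothetical helper written for illustration, not part of yq.

```python
def parse_as_octal(text: str):
    """Return `text` read as a base-8 integer, or the error message if that fails."""
    try:
        return int(text, 8)
    except ValueError as exc:
        return str(exc)


# 9 is not an octal digit, so base-8 parsing of "009" fails.
print(parse_as_octal("009"))
# Python accepts both an explicit 0o prefix and a bare leading zero when base=8.
print(parse_as_octal("0o11"))
print(parse_as_octal("011"))
```

The first call returns the same `invalid literal for int() with base 8: '009'` text that yq reports.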

conao3 commented 9 months ago

It is not only 00: any number starting with 0 is interpreted as an octal number.

$ echo 'hour: 08' | yq .
yq: Error running jq: ValueError: invalid literal for int() with base 8: '08'.

Plain PyYAML parses 08 as the string '08'.

$ echo 'hour: 08' | python -c 'import yaml; print(yaml.safe_load(input()))'
{'hour': '08'}

c00kiemon5ter commented 8 months ago

you can always set the type explicitly

$ cat file.yml
!str 009

$ yq . file.yml
"009"

kislyuk commented 3 months ago

I don't think yq will be doing anything to address this issue directly. Under the hood, yq uses PyYAML with developmental YAML 1.2 grammar regular expressions to match numeric literals. We will not be rolling our own regular expressions to address these edge cases, and we will not be backing away from 1.2 support and reverting to plain PyYAML with its 1.1 defaults, because that would cause even worse usability issues (like the "on" -> True problem).

The solution for this issue will come from using a fully YAML 1.2 compliant parser.
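A rough model of how this kind of mismatch can arise, assuming a YAML 1.2-style pattern decides which scalars are integers while the conversion step still treats a leading zero as octal. This is a simplified sketch, not the actual code of yq or PyYAML; the pattern and both function names are assumptions made for illustration.

```python
import re

# Assumed YAML 1.2 core-schema pattern for base-10 integers: any run of digits.
YAML12_DECIMAL_INT = re.compile(r"^[-+]?[0-9]+$")


def construct_int_leading_zero_is_octal(value: str) -> int:
    """Hypothetical YAML 1.1-style conversion: a leading 0 means base 8."""
    sign = -1 if value.startswith("-") else 1
    digits = value.lstrip("+-")
    if len(digits) > 1 and digits.startswith("0"):
        return sign * int(digits, 8)  # raises ValueError for digits 8 and 9
    return sign * int(digits)


def load_plain_scalar(value: str):
    """Resolve with the 1.2-style pattern, then convert with the 1.1-style rule."""
    if YAML12_DECIMAL_INT.match(value):
        return construct_int_leading_zero_is_octal(value)
    return value  # anything else stays a string in this sketch


for scalar in ["07", "10", "hello", "08"]:
    try:
        print(scalar, "->", repr(load_plain_scalar(scalar)))
    except ValueError as exc:
        print(scalar, "-> ValueError:", exc)
```

In this model `07` is silently read as an integer, while `08` is accepted by the pattern and then rejected by the base-8 conversion, which matches the shape of the error reported in this thread.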

kislyuk commented 3 months ago

Note: if you're encountering this issue with YAML that is emitted by yq -y, a possible workaround is to use YAML 1.2 as the output grammar: yq -y --yaml-output-grammar-version=1.2. The output grammar is still set to 1.1 for compatibility with tools that expect 1.1-like behavior (although it will be changed to default to 1.2 in the future).

conao3 commented 3 months ago

hmm, "OK" but...

$ echo '{hours: ["05", "06", "07", "08", "09", "10"]}' | yq . -y | yq .
{
  "hours": [
    "05",
    "06",
    "07",
    "08",
    "09",
    "10"
  ]
}

$ brew upgrade python-yq
==> Upgrading 1 outdated package:
python-yq 3.3.0 -> 3.3.1
==> Downloading https://ghcr.io/v2/homebrew/core/python-yq/manifests/3.3.1
############################################################################################ 100.0%
==> Fetching python-yq
==> Downloading https://ghcr.io/v2/homebrew/core/python-yq/blobs/sha256:a3b2e22c6978bf8a606b45d378496ccf587c617686e64457e997fe7ff8797be5
############################################################################################ 100.0%
==> Upgrading python-yq
  3.3.0 -> 3.3.1 
==> Pouring python-yq--3.3.1.arm64_ventura.bottle.tar.gz
==> Caveats
zsh completions have been installed to:
  /opt/homebrew/share/zsh/site-functions
==> Summary
🍺  /opt/homebrew/Cellar/python-yq/3.3.1: 105 files, 852KB

$ echo '{hours: ["05", "06", "07", "08", "09", "10"]}' | yq . -y | yq .
yq: Error running jq: ValueError: invalid literal for int() with base 8: '08'.

$ echo '{hours: ["05", "06", "07", "08", "09", "10"]}' | yq . -y --yaml-output-grammar-version=1.2 | yq .
{
  "hours": [
    "05",
    "06",
    "07",
    "08",
    "09",
    "10"
  ]
}

kislyuk commented 3 months ago

@conao3 yes, I am aware of this issue. The workaround is to use yq -y --yaml-output-grammar-version=1.2.

kislyuk commented 3 months ago

@conao3 I committed a fix for the issue where yq emits unquoted string scalars that start with 08 and 09, and released it in v3.4.0. This removes the need to use --yaml-output-grammar-version=1.2; yq will now use what amounts to a modified version 1.1 for output with this quoting behavior as the main change.

To be clear, this doesn't address the issue with how these unquoted scalars are parsed in the input, and I don't expect to address this issue within yq.
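To illustrate the output-side idea only: an emitter can keep a string such as "08" from being misread later by quoting any string that a reader could resolve as an integer. The sketch below assumes a digits-only pattern for what a reader might treat as an integer; it is not the change made in v3.4.0, and `emit_string_scalar` is a hypothetical name.

```python
import re

# Assumption: a reader may resolve any optionally signed run of digits as an integer.
WOULD_BE_READ_AS_INT = re.compile(r"^[-+]?[0-9]+$")


def emit_string_scalar(text: str) -> str:
    """Return `text` as a YAML scalar, double-quoted if it looks like an integer."""
    if WOULD_BE_READ_AS_INT.match(text):
        return '"' + text + '"'
    return text


print(emit_string_scalar("08"))     # quoted, so it round-trips as a string
print(emit_string_scalar("10"))     # quoted for the same reason
print(emit_string_scalar("hours"))  # plain, cannot be mistaken for a number
```

A real emitter has many more cases to consider (booleans, nulls, floats, special characters); this only covers the digit strings discussed here.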

conao3 commented 3 months ago

OK, thanks!