google / jsonnet

Jsonnet - The data templating language
http://jsonnet.org
Apache License 2.0
6.92k stars 438 forks source link

std.parseYaml fails on non-standard yaml feature, supported in other implementations #1109

Open CertainLach opened 1 year ago

CertainLach commented 1 year ago

Consider this input:

std.parseYaml("0777")

cpp-jsonnet outputs:

Something went wrong during jsonnet_evaluate_snippet, please report this: [json.exception.parse_error.101] parse error at line 1, column 5: syntax error while parsing array - unexpected number literal; expected ']'
Segmentation fault

go-jsonnet outputs:

511

jrsonnet outputs:

511

sjsonnet outputs:

sjsonnet.Error: Internal error: scala.MatchError: null
Caused by: scala.MatchError: null

But this is caused by sjsonnet only supporting objects in parseYaml, so std.parseYaml("a: 0777") works:

{
   "a": 511
}

Why does it happens? Well, there is two yaml specs: yaml 1.1, and yaml 1.2, and even if it is a minor update by semver... It isn't a semver. The one breaking change in yaml 1.2, is that octal literals in 0777 form are now forbidden, 0o777 should be used instead.

However, many yaml implementations are not following the spec, especially golang, which will even output octals in 0777 format: https://github.com/go-yaml/yaml/issues/420

For jrsonnet, I had to modify serde-yaml rust library to also accept this input: https://github.com/dtolnay/serde-yaml/pull/225

And rapidyaml did not implemented this compatibility feature: https://github.com/biojppm/rapidyaml/issues/291

johnbartholomew commented 7 months ago

I'm upgrading Rapid YAML in #1134, which fixes the segfault. However the behaviour still doesn't match go-jsonnet due to other implementation differences. Specifically, (C++) jsonnet relies on Rapid YAML to emit JSON (ie, to do a direct YAML to JSON conversion) and then parses the output of that just like it would parse any other JSON. But when Rapid YAML converts the input 0777 to JSON it does not emit it as a JSON number, but rather it emits a JSON string literal. This was done deliberately for https://github.com/biojppm/rapidyaml/issues/291

So the 'type' of the value is lost.

We would need to rewrite the std.parseYaml builtin to avoid marshalling everything through (Rapid YAML generated) JSON. Or contribute an upstream change to Rapid YAML to make it treat numbers like this differently. Or switch to a different YAML parser.