`yamldecode` converts certain field names to bool

hashicorp / terraform

Terraform enables you to safely and predictably create, change, and improve infrastructure. It is a source-available tool that codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned.

https://www.terraform.io/

Other

42.36k stars 9.49k forks source link

`yamldecode` converts certain field names to bool #34611

Open deepbrook opened 7 months ago

deepbrook commented 7 months ago

Terraform Version

1.7.2

Terraform Configuration Files

Files not needed - running in terraform console instead:

> yamldecode("{on: 1}")
{
  "true" = 1
}
>

Debug Output

N/A

Expected Behavior


> yamldecode("{on: 1}")
{
  "on" = 1  // <-- field name should remain unchanged
}

### Actual Behavior

It appears that in addition to converting *values* to bools, field names are also affected. This results in unexpected behaviour when, for example, parsing a github actions workflow file (which uses `on` to declare workflow exectuion conditions/options). 

### Steps to Reproduce

1. `terraform console`
2.  `yamldecode("{on: 1}")`

### Additional Context

_No response_

### References

_No response_

liamcervante commented 7 months ago

Hi @deepbrook, thanks for filing this.

This is an issue with our upstream type library, tracked here: https://github.com/zclconf/go-cty-yaml/issues/11.

It seems like this behaviour is expected in YAML 1.1: https://yaml.org/type/bool.html. You can see other YAML libraries have the same kind of thing filed, for example in PyYAML: https://github.com/yaml/pyyaml/issues/613.

But, the docs do say we support YAML 1.2 so I do think this an incorrect modification being made by the library as long as we are claiming YAML 1.2. Unfortunately we are constrained by the upstream library here, so will have to see how practical a fix in that library is.

bflad commented 7 months ago

@liamcervante I'm not sure if the compatibility promises outlined in https://github.com/hashicorp/terraform/issues/34301#issuecomment-1828266948 are also applicable in this case as well. Hopefully with provider-defined functions, someone can create YAML 1.2 specific functions without these little quirks.

apparentlymart commented 7 months ago

In practice yamldecode supports a similar pragmatic blend of YAML v1.2 and allowances for older versions, because in practice existing YAML in the world tends not to be strictly written for one version, possibly in part because the differences between different YAML versions are widely misunderstood.

I think the only change we can possibly make here is to update the documentation to be clearer about what YAML versions this function actually supports. I think when I wrote this doc I intended to mean that it supports "v1.2 and earlier", which was clumsy on my part because most other formats we've supported haven't made such significant breaking changes over time.

We cannot change the current behavior of yamldecode because, as @bflad suggested, it's protected by the Terraform v1.x compatibility promises. He's also right that someone could potentially make a "YAML v1.2" provider that offers a stricter yamldecode, if desired, once that capability is supported in the forthcoming Terraform v1.8 series.

However, if you want to keep using the built-in yamldecode then it's best to write ambiguous strings in quotes -- either double or single, since YAML supports both -- to remove the ambiguity:

> yamldecode("{'on': 1}")
{
  "on" = 1
}

deepbrook commented 7 months ago

Just for the record, i do not believe this has anything to do with the 1.1 spec, but rather with the linked bug in the city lib.

The spec, afaiu, only states that values are converted to booleans if they match the pattern. Not the keys.

apparentlymart commented 7 months ago

As far as I understand the YAML 1.1 spec, it's built around three fundamental structural primitives: scalars, sequences, and mappings, with two different syntaxes defined for each (flow style and block style). Collectively, these three concepts are described as nodes. The sequence and mapping node types are aggregates of other nodes.

The rule about sequences like yes, no, etc being interpreted as booleans is a semantic rule that the spec describes as being applied to any scalars, in Scalar Formats.

I can't find any language in the spec that suggests that "mapping key" is a distinct concept from "node" and describes different treatment of them, but the spec is very dense and not easy to read only in portions, and it's been a long time since I read the whole thing from top to bottom, so I'm happy to be corrected if you can cite a part of the spec that explicitly says that scalar nodes in a mapping-key position are supposed to get different parsing treatment than scalar nodes in other contexts.

(There are some special rules about "simple keys" in Block Mappings, but the exceptions there seem to describe only differences in treatment of delimiters adjacent to the scalar and what characters are allowed to appear in the scalar's representation; I don't see any requirement that the scalars in that position should be interpreted into typed values differently once parsing is complete.)

asaba-hashi commented 7 months ago

Here is a workaround I am using, which only works if a) only YAML 1.1 is required and b) other you are interoperating with more completely support the spec. In my case, this works fine when reading and writing the yaml files using Python's ruamel.yamls rt (round trip) encoder:

locals {
   doc = <<EOT
%YAML 1.1
---
root:
  notaboolvalue: 'no'
  'no': no
EOT
   val1 = yamldecode(trimprefix(local.doc, "%YAML 1.1\n"))["root"]["no"]
   val2 = yamldecode(trimprefix(local.doc, "%YAML 1.1\n"))["root"]["notaboolvalue"]
}

output "notaboolkey" {
   value = local.val1
}

output "notaboolvalue" {
   value = local.val2
}

burnerlee commented 2 months ago

hey @liamcervante can I take this issue up if it would be suitable for me as a beginner to this project?

apparentlymart commented 2 months ago

I suggested in an earlier comment that once Terraform v1.8 is released with support for provider-defined functions it would be possible for someone to write a provider that contains a stricter yamldecode that, for example, supports only YAML v1.2 and eschews all of the oddities of prior versions that were removed in v1.2.

Terraform v1.8 was released in the meantime, and so that is now possible. The emerging idiom for provider-defined functions suggests that the provider be named yaml and the function be named decode so that it would be called as provider::yaml::decode rather than the arguably-redundant provider::yaml::yamldecode or provider::yaml::decode_yaml, although of course anyone who writes such a provider can choose whichever naming scheme they like.

The current behavior of yamldecode is protected by the Terraform v1.x compatibility promises and so I don't think we can change it in a way that would cause a different interpretation of input that was previously accepted as valid. Therefore I think the only thing we can do to improve the situation described here is to update the documentation for yamldecode to clarify that it also supports some features from earlier versions of YAML, to fix the current implication that it supports YAML 1.2 features only. Therefore I'm going to relabel this as a documentation bug.