hashicorp / terraform


Object keys are silently filtered from variable if not in type definition #29204

Open sblask opened 3 years ago

sblask commented 3 years ago
$ terraform --version
Terraform v1.0.2
on darwin_amd64

main.tf:

module "module" {
  source = "./module"

  users = [
    {
      firstname = "foo"
    },
    {
      firstname = "foo"
      lastname  = "bar"
    },
  ]
}

output "users" {
  value = module.module.users
}

./module/main.tf:

variable "users" {
  type = list(object({
    firstname = string
  }))
}

output "users" {
  value = var.users
}

Debug Output

https://gist.github.com/sblask/5c1140a3e00e9fa6555cc92a924cdd32

Actual vs Expected Behavior

The output looks like this:

Changes to Outputs:
  + users = [
      + {
          + firstname = "foo"
        },
      + {
          + firstname = "foo"
        },
    ]

If this is how it is intended, there should be a big warning or error that something passed into the module did not match the variable definition and got filtered out; otherwise the output should look like this:

Changes to Outputs:
  + users = [
      + {
          + firstname = "foo"
        },
      + {
          + firstname = "foo"
          + lastname = "bar"
        },
    ]

which is what I had expected.

Steps to Reproduce

  1. terraform init
  2. terraform apply

jbardin commented 3 years ago

Hi @sblask,

What you have described here is the intended design of the type system within the Terraform language. Implicit type conversion is always done on assignment within the language, and an object can always be converted to another object with a matching subset of attributes.

A module should be declaring the input variable type according to how it is used. If the module declares that the object only has a firstname attribute, and only ever accesses a firstname attribute, then the removal of any extra attributes is inconsequential. A primary motivation for this is to allow modules to declare exactly what they need from an object, which can be satisfied by values (usually resource instances themselves) from other modules. This way if the schema from another module value is extended (e.g. a new attribute is added to a resource), it won't break the usage of that value with the receiving module. This allows modules to be more easily composed, preventing tight coupling of their development cycles.

You can see examples of this in the documentation, like Conditional Creation of Objects.
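
As a rough sketch of that pattern (the module layout, names, and the aws_vpc usage below are hypothetical, not from the original report):

# ./modules/dns/variables.tf -- the module declares only what it needs:
variable "network" {
  type = object({
    id = string
  })
}

# Root module -- the caller can pass a whole resource object; attributes
# beyond id (cidr_block, tags, ...) are dropped by the implicit
# conversion, so new attributes on aws_vpc never break this module call.
module "dns" {
  source  = "./modules/dns"
  network = aws_vpc.main
}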

sblask commented 3 years ago

That is a weird piece of documentation to link: instead of showing an example of the problem it's trying to avoid together with a simple solution, it gives a solution built around an object where I could simply pass in the AMI ID. I am not even sure there is a practical example of the "problem".

When you write "removal of any extra attributes is inconsequential" you are assuming that I actually use the attributes directly; I don't. My use case is a module that creates a launch template for an instance: one environment has a set of users A, another a set of users B, and the list of users I pass in ends up directly in the user data as a cloud-init config. We defined the users variable as a list of objects to get some type checking, because cloud-init is quite sensitive; for example, if the user id is an int instead of a string, it is not used. Now in one environment we'd like SSH access and in the other we don't, so we thought we could have ssh_authorized_keys in one user list and leave it out of the other. It took us a long while to figure out why ssh_authorized_keys would not appear in our config. We can't use optional because that yields the value null when not given, which breaks cloud-init.

Anyway, I don't think this is about documentation (even though I also could not find anything documenting that object conversion/filtering goes on, and I am not sure why a receiving module would break in any way if an object passed in has more attributes than the variable defines). The problem is that the conversion is implicit and I don't get any warning about it.

apparentlymart commented 3 years ago

Hi @sblask,

I think the more direct documentation about the situation you're describing is in Type Constraints, which directly discusses the meaning of Terraform's type constraints and the type conversion behaviors, whereas the Module Composition page is more of a guide on how to address a few specific use-cases.

I understand that you were surprised to learn that the Terraform language has implicit type conversion behaviors, whereas other languages have different designs where either the language would outright reject additional attributes or would just pass through the original value.

The behavior you observed here is rooted in the same behavior whereby Terraform would, for example, implicitly convert a string containing decimal digits into a number in order to satisfy a type = number type constraint. These implicit conversions are a fundamental part of the Terraform language design and cannot be changed this late in the life of the Terraform language, although I do certainly understand that some would find them unfamiliar or surprising because there are lots of different approaches to type safety and implicit vs. explicit conversions in other languages.
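
For instance (a minimal sketch with a hypothetical variable name):

variable "retry_count" {
  type = number
}

# Passing retry_count = "3" is accepted: Terraform implicitly converts
# the string "3" to the number 3 to satisfy the constraint.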

With that said, if you're looking for a solution to the problem you stated within the current design of the Terraform language then the answer comes in the form of the any placeholder in type constraints. Whereas an object type constraint specifies an exact object type to convert to, any instead serves as a free type parameter and asks Terraform to select a specific type to replace it dynamically based on the value provided by the caller. Of course, the consequence of this is that Terraform itself will no longer apply any constraint or conversion itself, and so you'd need to switch to specifying your expectations as validation rules rather than as type conversion. For example:

variable "users" {
  type = any

  validation {
    condition     = alltrue([for u in var.users : can(u.firstname)])
    error_message = "All users must have a \"firstname\" attribute."
  }
}

When passing in the value you showed in your example, Terraform will dynamically replace the any placeholder with the concrete type of the value you passed in:

tuple([
  object({
    firstname = string
  }),
  object({
    firstname = string
    lastname  = string
  }),
])

...and thus this will allow your value to pass through without any implicit conversions whatsoever, and then the validation rule will raise an error if any of the tuple elements lacks a firstname.


I think embedded in your overall feedback here is some specific feedback about the optional attribute experiment: having unset attributes appear as null makes it hard to use the result with functions like jsonencode and yamlencode for target systems that expect unset attributes to be totally absent, rather than explicitly set to null. Terraform's structural type system distinguishes between an attribute being absent entirely and it being present but null, so this is an example of the typical sorts of quirks that arise when converting between different type systems; today this involves post-processing the Terraform value to make it shaped more like the external system expects, though certainly we could do with some extra features to make that post-processing more concise.
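
As a sketch of that post-processing (assuming a var.users list of objects whose optional attributes may be null), a nested for expression can drop the null attributes before encoding:

locals {
  # Omit attributes whose value is null, so the encoded document leaves
  # them out entirely instead of setting them to null.
  users_for_cloudinit = [
    for u in var.users : { for k, v in u : k => v if v != null }
  ]
}

output "user_data" {
  value = yamlencode({ users = local.users_for_cloudinit })
}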

That is some feedback we've heard in similar forms before, and it is one of the use-cases we have noted down for the next iteration of that experiment in a future release. I don't know how we'll address it yet, but it's definitely on the list to consider. I expect the next iteration of that experiment will be quite different from the current one, accepting some more significant changes to the syntax for defining variables in exchange for some additional capabilities.

If your feedback here boils down to that current limitation of the optional attributes experiment then perhaps we can close this out and then revisit once there's a new iteration of that experiment to consider. In the meantime, using any along with explicit validation as I showed above is likely the best way to represent the behavior you're intending.

sblask commented 3 years ago

I wasn't after information on how to do things, but since you gave some:

  validation {
    condition     = alltrue([for u in var.users : can(u.firstname)])
    error_message = "All users must have a \"firstname\" attribute."
  }

wouldn't really help me because there is no type check?

I now found the information about the removal of attributes, thanks for the link to the documentation. I had read it before but never noticed that bit of information. If you absolutely need to keep that behaviour, there could be a better example than mentioning aws_vpc without linking to it. But I don't see why it would be impossible to get rid of the removal of attributes; there is no harm in having attributes that you don't expect. And even if you feel otherwise, there could be an (optional) warning that attributes did get discarded. I would never expect implicit conversion to happen, let alone actively work with it, and I can't imagine other people would, because there is no precedent in other languages. Primitive types maybe, because it happens in JavaScript, but there it's seen as a quirk and not a feature? CloudFormation too, I think, but it's a pain to work with.

Even if optional worked by not adding null, I'd still expect a warning if something is removed from my input.

apparentlymart commented 3 years ago

I think we've reached the point where we agree that Terraform is working as designed, but that you disagree with this particular design decision. That suggests we turn this into an enhancement request rather than a bug report, because that will put it into our process for considering design changes to meet new use-cases.

It's interesting that we're discussing this in the context of the use-case you raised here (of trying to pass arbitrary data to a foreign system using yamlencode/jsonencode), because that's also an example of why this behavior is important and subject to the compatibility promises:

variable "example" {
  type = object({
    name = string
    age  = number
  })
}

resource "example" "example" {
  person_json = jsonencode(var.example)
}

With the current design of the language, the contract is that if you write an exact type constraint for a variable (that is, one that doesn't include the any placeholder) then the value of the variable seen by the body of the module is guaranteed to exactly match that type unless Terraform detects and reports a type conversion error. The example above (which is contrived, but gets at the essence of passing through data structures to external systems with their own JSON/YAML-based formats) ensures that person_json will have JSON containing exactly those two attributes and no others, and that age will always be a number and name always a string, even if a caller passes in values that can implicitly convert to those types:

module "example" {
  source = "./modules/example"

  example = {
    name = "Bob"
    age  = "12"
  }
}

Terraform can implicitly convert "12" to 12, and so it will do so and thus allow this to work while still guaranteeing that the called module sees the types it was expecting, unless the module specifically opted in to manually dealing with types by using any.

We certainly can debate the relative merits of different tradeoffs of type system design, implicit vs. explicit conversions, and returning errors vs. "do what I (am assumed to) mean", but regardless of our specific preferences in this department there are existing modules relying on the current language design and so any change to it will need to be backward compatible such that no existing module sees different behavior without explicitly opting in to that additional behavior.

Given that, I'm going to reclassify this as a configuration language enhancement for the moment, so that we can consider your use-case when we're next considering new use-cases related to this part of the language. What exactly we'll be able to do about it remains to be seen, but that's just the typical challenge of evolving a language in a backward-compatible way.

Thanks again for raising this! We're currently focused on other areas so we won't be able to engage in any further design work here for the moment, but we'll return to it in future and consider the feedback you've raised and how it interacts with other existing use-cases, and hopefully we can find a good compromise for a future language version.

sblask commented 3 years ago

Fair enough. As far as backwards compatibility goes, the behaviour could be changed by introducing a new type name, with object keeping its current behaviour. But if you decide against that, a warning would be great. I just saw yesterday that one of my AMI builds that uses Packer/Ansible spits out warnings like this:

    amazon-ebs: [WARNING]: The value "0" (type int) was converted to "'0'" (type string). If
    amazon-ebs: this does not look like what you expect, quote the entire value to ensure it
    amazon-ebs: does not change.
    amazon-ebs: changed: [default]

Not exactly sure where that warning comes from, just saying that it's not unprecedented to have implicit conversion and get warnings about it. I guess more information about where exactly it comes from would be nice though...

EduardoRT commented 2 years ago

Coming to this thread after trying to figure out why some of my keys are missing from my object.

In my case, I'm using a tool to inject tags into my modules. My modules have a variable "tags" { type = object({...}) } constraint that defines all of our required tags, but the tags that are being deleted are not for the user to add, so I'm not declaring them as required or optional.

I'm failing to understand why there's an implicit type conversion from map to object when the variable type is of type object.

An object is more than capable of having extra keys added to it; the only reason the extra keys get dropped is that Terraform coerces the input from a map to an object instead of treating it directly as an object.

I understand the docs say that object -> map -> object is a lossy conversion, I just don't understand why this is done automatically with variables.

My use case for keeping the type definition as an object is that I'm using a tool that automatically generates documentation for the variables; having the type defined makes it easy for the tool to document the names of the expected keys. I could change the variable to map(string) and resolve the issue, since I have validation rules anyway, but I would lose the ability to auto-document the variables.

apparentlymart commented 2 years ago

When thinking about these things I think it's important to realize that "map" and "object" are not types, but are instead kinds of types. object({ foo = string }) is a type and object({ foo = string, bar = string }) is a type, and both of these are object types, but they are not the same object type. In this case, object({ foo = string, bar = string }) is a subtype of object({ foo = string }) and so automatic conversion is possible.

Similarly, map(string) is a type, and both of the object types I mentioned above are subtypes of map(string) because all of their attributes are defined as being of type string, so Terraform can automatically convert from map(string) to either of those types.

map(string) is not a supertype of object({ foo = list(string) }) because string is not auto-convertible to list(string), and so that particular case would fail with a type error.

Type constraints are intended as a way to make sure a variable value passed from outside will always conform to what the module author expects, and therefore the maintainer of a module doesn't need to explicitly handle all possible subtypes, particularly in situations like I showed above where the result gets JSON encoded and therefore the exact type is important. They can also potentially be useful for documentation, but only if what you intend to document is something the type system can represent. Terraform's type system doesn't currently have any explicit representation of "a map with at least these keys", which seems like what you wanted here, and so if you want to document something like that you'll have to do so using some means outside of the Terraform language, at least for today.
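
A minimal sketch of those relationships (the variable names are hypothetical):

# The same value { foo = "a", bar = "b" } can satisfy both constraints:
# conversion to "wide" keeps both attributes, while conversion to
# "narrow" discards bar, which is the filtering this issue describes.
variable "narrow" {
  type = object({ foo = string })
}

variable "wide" {
  type = object({
    foo = string
    bar = string
  })
}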

drewmullen commented 2 years ago

This way if the schema from another module value is extended (e.g. a new attribute is added to a resource), it won't break the usage of that value with the receiving module. This allows modules to be more easily composed, preventing tight coupling of their development cycles.

Perhaps another way to think about this... what if you introduced a new "strict type definition" toggle that errors if an unexpected key is passed? If keys not declared in the object are passed and the strict type toggle is set, produce an error when the configuration is validated. That preserves the "silent filter" described in the post while offering a forced-convention option.

This also causes issues if you attempt to use validation blocks in modules.

The use-case I'm running into is this:

I have a deeply nested map where individual maps are passed to a sub-module via a for_each loop, and the sub-module defines an object type. Since my map is already pretty complex, validation blocks (or the strict type enforcement idea above) would be a way to ensure users aren't passing inappropriate or misspelled keys.

Because keys are filtered, I can't provide that feedback even through validation blocks.

apparentlymart commented 2 years ago

An existing way to be more strict than Terraform would naturally be is to disable Terraform's normal constraint/conversion behavior by setting the constraint to just any, and then implementing whatever constraints you want using validation blocks.

For example:

variable "example" {
  type = any

  validation {
    # There are a few different ways to express this rule.
    # I'm doing it with the set arithmetic functions because
    # that's intuitive to me and this is just one example.
    condition = (length(setsubtract(keys(var.example), ["a", "b"])) == 0)

    error_message = "Only attributes 'a' and 'b' are allowed."
  }
}

Terraform provides this escape hatch as a measure of pragmatism, since we can't possibly support all use-cases using specialized features, but I would suggest caution for the same reasons that motivated Terraform's design already, which I'll try to summarize here because if nothing else I hope they'll be useful to a future reader considering ways that we might support the use-case described by this issue.

An important part of building modular software is carefully designing the components to allow for unanticipated future change. As authors of libraries (where a Terraform module is, for my purposes here, equivalent to a library) we typically concern ourselves with the tension between giving good feedback to current users while still leaving room to grow in future as new requirements emerge.

One way to think about that is to consider what a library requires and what it provides. A backward-compatible change is one that either requires less or that provides more. If a new version requires more or provides less, it's a breaking change.

The Terraform language, in common with various other languages that are either explicitly dynamic or provide reflection capabilities, includes features that, if considered very strictly, make it very hard to make any change that isn't "technically" a breaking change for some hypothetical caller. For example, the language allows you to observe exactly which attributes are set in an object using functions like keys and using for expressions, and so by a strict interpretation we might say that a module requires exactly the set of input variables it was defined to take on day 1, and it provides exactly the set of output values it was defined to produce on day 1.

However, that is not a pragmatic definition of "breaking change" and so the type system and idiom are defined such that in the common case it's possible for future versions of a module to require less and provide more in ways that won't affect "reasonable use" (which, of course, has a subjective definition) of the module.

The behavior we're discussing in this issue is a key part of that tradeoff. When using module composition patterns, or similar designs where whole resource objects pass into modules, the two components should ideally be able to evolve independently of one another so that callers don't get into "dependency hell" situations:

module "a" {
  source = "../modules/a"
}

module "b" {
  source = "../modules/b"

  a = module.a
}

Module "A" should be able to provide more by introducing a new output value, but in order for that to not be a breaking change module B must be able to tolerate being passed more than it requires. The Terraform language design addresses this in the manner I described earlier, by having the language runtime automatically convert what the user provided to match the module's description of what it requires, and as a result module authors and provider authors don't consider it a breaking change to add a new attribute to an existing object.

Although less common than the case of module A providing more, it is also possible that a future version of module B could require less. The Terraform language design makes a slightly different tradeoff here as a result of practical experience in early versions of Terraform: the definitions of variables themselves are always static -- there's no way to make a dynamic decision about what attributes to define in that module "b" block, and so no way that adding a new attribute somewhere else could cause a new definition to implicitly appear there. Consequently, Terraform treats defining an undeclared variable as a fatal error. However, this tradeoff does apply to changing the type constraint of an existing input variable: those can be populated dynamically as in the example above, and so it ought to be possible for the author of module "b" to remove a required attribute from the type constraint of its variable "a" without breaking this data flow from module "a" into module "b".
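
To illustrate that last point with a hedged sketch (the version labels and attribute names are hypothetical):

# Module "b", version 1, requires two attributes:
variable "a" {
  type = object({
    id   = string
    name = string
  })
}

# Module "b", version 2, requires less. Callers passing module.a as
# before keep working, because the conversion simply retains fewer
# attributes from the same value:
variable "a" {
  type = object({
    id = string
  })
}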

As we've seen above, there are ways to subvert this if you really want to, and thus to build modules where future growth elsewhere is likely to appear as a breaking change. We rely on language design and idiom to implicitly encourage the growth-friendly design and, conversely, to discourage the growth-unfriendly design by making it less convenient.

None of this is meant to say that what Terraform currently does is a perfect reflection of these principles or that there aren't other ways to meet these goals, but I hope it's helpful to understand that the current language design in this area is founded in some practical tradeoffs rather than being arbitrary, and that any design proposals for changes in this area ought to take similar care to consider and discuss their interactions with these principles.

drewmullen commented 2 years ago

disable Terraform's normal constraint/conversion behavior by setting the constraint to just any, and then implementing whatever constraints you want using validation blocks.

This is incredibly helpful to know. I wasn't aware that any adjusts the behavior to that level. Thank you!

Nuru commented 8 months ago

@apparentlymart I don't know where optional object attributes were on the roadmap when you last commented on this, but now that they are a non-experimental part of the language, the suggestion to use type any is no longer acceptable. I have a var like this:

variable "config" {
  type = object({
    mode      = optional(string, "API")
    bootstrap = optional(bool, false)
  })
}

I don't want to use any for several reasons, but among them is that I don't get the desired optional behavior. I want to accept without warning or error

config = { bootstrap = true }

but I want to have a warning or error if I get

config = { moed = "CONFIG_MAP" }

The current behavior is that when the user supplies moed = "CONFIG_MAP" the module gets mode = "API" with no way to warn the user that their input is being ignored.

I see a few ways to enhance Terraform to address this without harming flexibility or compatibility:

  1. Add a new setting (along the lines of nullable = false) that causes a warning or error when attributes are present in the input value that are not declared in the variable type. Maybe undeclared_attributes = ignore | warn | error with a default of ignore matching the current behavior.
  2. Allow a precondition block that has access to the unconverted input type. This is a lot more difficult to use, but still would allow me to go through and check all the object keys and make sure there are no extras.
  3. Provide some sort of backdoor function that gives access to the attributes that were removed during type conversion that can be used in the validation block.

Option 1 seems to be both the best for ease of use and the least damaging to flexibility and compatibility. As with the introduction of nullable, it does not change any existing code, so all of your examples would continue to work as-is. Also, by limiting the check to undeclared attributes, it still allows the module author to add fields without causing problems with backward compatibility.
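
A sketch of what option 1 might look like (undeclared_attributes is the proposal here, not an existing Terraform argument):

variable "config" {
  type = object({
    mode      = optional(string, "API")
    bootstrap = optional(bool, false)
  })

  # Proposed, not implemented: reject input attributes that are not
  # declared in the type above instead of silently dropping them.
  undeclared_attributes = "error"
}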

I have been bitten by this problem several times already, where I got the name of the element wrong in some way, and got default behavior when I wanted to override it. Even knowing what I know, it is still hard to catch, find, and fix these kinds of errors. Please consider some way to support at least warning about them.

wyardley commented 1 month ago

The "intended" behavior of Terraform is very baffling to me here - one of the whole reasons I'd think you'd use an object vs type any is exactly to be able to more carefully control what can / can't be passed in. Having a flag to enforce strict type checking seems to me like it would be a reasonable compromise. Especially in a case where some or all values are optional (now supported), this can create some weird behavior.

For example, if you want to require one of 3 top-level keys, this validates fine with type = map(any), but with an object of the following form the behavior is not what I think many would expect, with or without the validation block:

variable "foo" {
  type = object({
    thing1 = optional(string)
    thing2 = optional(string)
    thing3 = optional(string)
  })
  default = {}
  validation {
    condition     = length(setsubtract(keys(var.foo), ["thing1", "thing2", "thing3"])) == 0
    error_message = "Invalid special group members value."
  }
}

Of course, in this simple example, it's easy enough to just relax the type and write custom validation, but it could easily be a much more complicated object.

One other nice thing about using an object is that the expected structure is somewhat more self-documenting than map(any), any, etc.