hashicorp / terraform

Terraform enables you to safely and predictably create, change, and improve infrastructure. It is a source-available tool that codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned.
https://www.terraform.io/

Warning on Duplicate keys in a map object. #28727

Open · techdragon opened this issue 3 years ago

techdragon commented 3 years ago

Current Terraform Version

Terraform v0.15.0
on darwin_amd64

Use-cases

Avoiding duplicate keys in maps.

Attempted Solutions

Linter scripts and third party tools.

Proposal

Duplicate keys in maps should raise some kind of warning. This would help prevent mistakes when working with larger and more complex maps, particularly dynamic and merged maps, where the keys may not be visible in the Terraform configuration as written and are only populated when Terraform runs.
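As a hypothetical sketch of the "merged maps" case (the local names here are invented for illustration), the same last-key-wins behaviour applies when maps are combined with merge(), where the duplicate key is not visible at any single call site:

```hcl
locals {
  # Hypothetical inputs: each team contributes its own tag map.
  team_a_tags = { owner = "team-a", cost_center = "123" }
  team_b_tags = { owner = "team-b" }

  # merge() is documented to let later arguments win, so the duplicate
  # "owner" key is silently resolved to "team-b" with no diagnostic,
  # just like a duplicate literal key in a single map.
  all_tags = merge(local.team_a_tags, local.team_b_tags)
  # all_tags = { owner = "team-b", cost_center = "123" }
}
```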

References

N.A.

myronmeier commented 3 years ago

Thanks for raising this, @techdragon, literally days before we ran into the same issue. Upvoted! I would even go so far as to propose it be an error and not just a warning.

Here is a simple demonstration.

terraform {
  required_version = "0.15.4"
}

locals {
  local_dbs = {
    db1 = { a = "1a" }
    db1 = { a = "2a" }    // notice the key here is the same as the previous (e.g. a mistake)
    db3 = { a = "3a" }
  }
}

output "locals_map_out" {
  value = local.local_dbs
}

The above produces these outputs

Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

Outputs:

locals_map_out = {
  "db1" = {
    "a" = "2a"
  }
  "db3" = {
    "a" = "3a"
  }
}

Notice the loss of the 1st db1 map entry. There was no error and no warning. It just silently replaced the 1st map entry with the 2nd.

The same can be demonstrated with input variables as well

variable "var_dbs" {
  type = map(object({
    a = string
  }))
}

output "variable_map_out" {
  value = var.var_dbs
}

With this terraform.tfvars

var_dbs = {
  db1 = { a = "1a" }
  db1 = { a = "2a" }
  db3 = { a = "3a" }
}

Produces

Outputs:

variable_map_out = tomap({
  "db1" = {
    "a" = "2a"
  }
  "db3" = {
    "a" = "3a"
  }
})

Adding more outputs to the locals example above, to see whether the 1st map entry is merely hidden somehow, indicates that it was likely lost while constructing the map itself. The similar, older issue #18573 (circa TF 0.11?) seems to confirm that, but it also predicts that duplicate map keys would become an error in a future version (TF 0.12?), which appears not to have come true.

output "locals_map_length" {
  value = length(local.local_dbs)
}

output "locals_map_keys" {
  value = keys(local.local_dbs)
}

output "locals_map_values" {
  value = values(local.local_dbs)
}

output "locals_map_json" {
  value = jsonencode(local.local_dbs)
}
Outputs:

locals_map_json = "{\"db1\":{\"a\":\"2a\"},\"db3\":{\"a\":\"3a\"}}"
locals_map_keys = [
  "db1",
  "db3",
]
locals_map_length = 2
locals_map_out = {
  "db1" = {
    "a" = "2a"
  }
  "db3" = {
    "a" = "3a"
  }
}
locals_map_values = [
  {
    "a" = "2a"
  },
  {
    "a" = "3a"
  },
]

Even reading from JSON silently drops the duplicate key, without error or warning:

locals {
  local_json = jsondecode( <<-EOT
    {
      "db1": { "a": "1a" },
      "db1": { "a": "2a" },
      "db3": { "a": "3a" }
    }
    EOT
  )
}

output "locals_json_out" {
  value = local.local_json
}
Outputs:

locals_json_out = {
  "db1" = {
    "a" = "2a"
  }
  "db3" = {
    "a" = "3a"
  }
}
myronmeier commented 3 years ago

@techdragon , I'm curious what linters you tried. I quickly tried https://github.com/terraform-linters/tflint but it didn't catch the duplicate keys in any of the examples above.

techdragon commented 3 years ago

@myronmeier Thanks for writing out the excellent examples! Yes, I had the same issue with tflint not catching it. Since the problem goes beyond maps written statically in code, I only mentioned the linters in passing as a workaround I had tried; I should have been clearer about what I meant.

nwsparks commented 3 years ago

@myronmeier I agree, this should be an error. A warning is not sufficient, as it's easily missed, and this can be very problematic.

From what I can tell this can't be caught by variable validation either.

nsparks@w10-vm:/mnt/c/test$ cat data.tf
variable "routing_rules" {
  default =   {
    "8000" = {
      host_headers = ["h1"]
    },
    "8000" = {
      host_headers = ["h2"]
    },
    "8001" = {
      host_headers = ["h3"]
    }
  }

  validation {
    condition = length(keys(var.routing_rules)) == 3
    error_message = "Length is not 3."
  }
}
nsparks@w10-vm:/mnt/c/test$ terraform console
╷
│ Warning: Due to the problems above, some expressions may produce unexpected results.
│
│
╵

╷
│ Error: Invalid value for variable
│
│   on data.tf line 1:
│    1: variable "routing_rules" {
│
│ Length is not 3.
│
│ This was checked by the validation rule at data.tf:14,3-13.
╵
techdragon commented 3 years ago

@nwsparks I didn't expect validation to fail like that... Also, I'm not 100% convinced that it should always be an error. I can imagine there are users out there relying on this behaviour to power various workarounds: providing default variable values inherited from older Terraform modules, or merging defaults into more complex map values passed in as configuration. While it might be good to make this an error in time, I'd rather have the warning added now, perhaps with some way to configure the warning as an error, than navigate the longer path of changing behaviour like this now that 1.0 has been released and the Terraform team has more to worry about with respect to long-term compatibility.

Edit: Just found another example of duplicated key names that may become a common pattern in the future and could conflict with making duplicate keys an error: https://github.com/hashicorp/terraform/pull/25088 has an example where the same key is defined multiple times to run different validation checks against its value.

philip-harvey commented 2 years ago

I struck this today, and it's very dangerous. In terraform.tfvars it will happily let you define a map with the same key multiple times, as per the example above, and doesn't throw an error; it simply discards all but one of the map objects with that key. This can lead to non-recoverable data loss. I tried to add a test to check for duplicate keys, but it seems to discard the data right when it reads it in, so there doesn't seem to be any way to test for this inside Terraform. I suggest that Terraform should error if you have multiple occurrences of the same key in a map in terraform.tfvars.
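To illustrate why an in-Terraform check cannot work, here is a sketch (variable and key names invented) of a validation that can never fire: by the time the condition is evaluated, the duplicate entry has already been collapsed away:

```hcl
variable "dbs" {
  type = map(object({ a = string }))

  validation {
    # Always true: the map this condition sees has already dropped any
    # duplicate entries, so its keys are unique by construction.
    condition     = length(keys(var.dbs)) == length(distinct(keys(var.dbs)))
    error_message = "Duplicate keys in var.dbs."
  }
}
```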

fernando-villalba commented 2 years ago

It's really bizarre that Terraform accepts duplicate keys for objects and maps, as it can be a considerable debugging headache and potentially catastrophic if you're not careful. Go rejects duplicate map keys at compile time and Java's Map.of rejects them at runtime, so TF silently accepting them is just strange.

What's even stranger is how few upvotes this has! I hope it gets addressed soon.

ams1 commented 1 year ago

For anyone interested in workarounds for this, I opened a SO question: https://stackoverflow.com/questions/74233973/terraform-map-alternative-when-duplicated-keys-should-raise-error/74234215#74234215

naitmare01 commented 1 year ago

I managed to work around it by first creating a list of items, each containing one key/value pair of my map, and then converting the list to a map. The conversion itself checks for duplicates and errors out if any exist.

locals {
  tuple_items = [
    { "AB" = { ID = "123" } },
    { "CD" = { ID = "456" } },
  ]

  # Transform the tuple to a map to be able to iterate over it;
  # this errors if not all keys are unique.
  tuple_to_map = { for item in local.tuple_items : keys(item)[0] => values(item)[0] }
}

scottpiper-wiz commented 1 year ago

This ticket was referenced recently in this blog post which identified a security issue that results from this functionality: https://securitylabs.datadoghq.com/articles/exploring-github-to-aws-keyless-authentication-flaws/

When you create an AWS IAM role in Terraform with a trust policy that contains two StringEquals elements, one of the elements disappears when the IAM role is created by Terraform. These StringEquals elements are required to restrict who can access the role. In the example provided, the role is being created for a GitHub Action, but without both restrictions the researcher found that a number of these roles can be assumed by any GitHub Action in any repository. The researcher found several organizations vulnerable to this issue, including a UK government org.

[screenshot from the linked blog post illustrating the vulnerable trust policy]

In AWS, it is not possible to have two StringEquals elements; they must instead be combined into a single element. Unfortunately, some guides online show this incorrect Terraform code (which I'm working to get changed). Because Terraform silently swallows one of the elements, along with an important restriction, users are unaware that they are deploying IAM roles which anyone can assume (thereby gaining access to their AWS environments).

It looks like there is more recent discussion of this problem in #33570, where @apparentlymart stated:

I don't see a path to get there within the Terraform v1.x Compatibility Promises

If that is the case, and this will not be fixed in terraform, then should a check for this be added to tflint instead or is there a better place? https://github.com/terraform-linters/tflint

I am additionally worried that the problem of silently swallowing security restrictions on resources has impact well beyond the GitHub Actions and AWS IAM role use case.

naitmare01 commented 1 year ago

I am additionally worried that the problem of silently swallowing security restrictions on resources has impact well beyond the GitHub Actions and AWS IAM role use case.

I can only agree with the last paragraph @scottpiper-wiz after reading both your comment and also the datadog article you linked.

I highly doubt that this is limited to GitHub action and AWS IAM Roles.

I don't think this belongs in a linter; it should be caught directly in Terraform, preventing anyone from actually deploying this kind of code, like my own workaround in locals does (see my previous comment in this thread). At the very least, Terraform should catch and report which key in the map it disregards.

scottpiper-wiz commented 1 year ago

In tracking down how this could be implemented, my understanding is jsonencode calls directly to https://github.com/zclconf/go-cty as shown here: https://github.com/hashicorp/terraform/blob/919e62089b4886c0f090da757ee28ed9b8f3a2f0/internal/lang/functions.go#L86

This then calls into json.Marshal as shown here: https://github.com/zclconf/go-cty/blob/7dcbae46a6f247e983efb1fa774d2bb68781a333/cty/function/stdlib/json.go#L59C15-L59C27

In encoding/json, a proposal for Decoder.DisallowDuplicateFields has been accepted here: https://github.com/golang/go/issues/48298#issuecomment-953178062

But that functionality has not yet been merged into Go.

Implementing this could therefore be done by:

apparentlymart commented 1 year ago

It's been a long time since I considered this issue and so I honestly don't really remember all of what I was considering when I responded before.

My interpretation of the original idea here is that it should be an error or warning if an object constructor expression (which we might also casually call an "object literal") can see that two of the key/value pairs have key expressions that evaluate to equal values. I have a faint memory of originally having implemented it that way in my own codebase "zcl" that later became HCL 2 after HashiCorp hired me, and then having to relax that rule in response to some compatibility problem relative to Terraform v0.11. I don't remember exactly what the deal was with that, though. (Object-constructing for expressions do still have this uniqueness restriction, leaving the language slightly inconsistent in this regard.)
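The uniqueness restriction on object-constructing for expressions can be seen directly. This sketch (with invented names) shows both the error case and the ellipsis grouping mode that opts out of it:

```hcl
locals {
  items = [
    { name = "db1", a = "1a" },
    { name = "db1", a = "2a" },
  ]

  # This would fail with "Duplicate object key": two different items
  # produce the key "db1".
  # by_name = { for i in local.items : i.name => i.a }

  # With the ... grouping mode, duplicates are collected into a list
  # instead of rejected:
  grouped = { for i in local.items : i.name => i.a... }
  # grouped = { db1 = ["1a", "2a"] }
}
```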

Ultimately the handling of object constructor expressions belongs to HCL rather than to Terraform, and so if we are going to change something here I expect it would need to be an HCL change rather than a Terraform change. That presents two challenges which were perhaps what I was reacting to in earlier discussion:

I don't think it's possible to solve this at the application layer inside Terraform -- including via Terraform's built-in functions -- because by the time Terraform is holding the evaluated object result the duplicate attributes have already been discarded.

With all of that said then, I think treating this as a "lint"-like check is probably the most viable path given the existing constraints. The third-party tflint could potentially do it, although we don't have direct influence over that tool. It could also potentially live as part of https://github.com/hashicorp/terraform/pull/29661, but that idea seems to have attracted very little interest so I don't expect I'll be given permission to work on it further for the foreseeable future.

At this point I don't think there is any clear path to resolving this issue. We can keep it open to track the concern, but I don't expect anything further to happen here unless someone has a concrete proposal for what to change, how to change it, and in particular how to do it without breaking compatibility promises or impacting other callers of HCL.

nwsparks commented 1 year ago

It's difficult to imagine that anyone has intentionally created a map with duplicate keys expecting that earlier entries will be discarded, but I guess anything is possible. This seems to me more like a bug or security issue than a feature which must be preserved.

The promises do have pragmatic exceptions (https://developer.hashicorp.com/terraform/language/v1-compatibility-promises#pragmatic-exceptions) that I could see this qualifying under.

If there is no way to resolve this in 1.x could it not just be targeted as a fix in 2.0, hopefully altering the behavior so that it returns an error?

stefansedich commented 11 months ago

Another +1 for this being an error, our team ran into this one today where a later definition overrode permissions from an earlier definition.

nhooey commented 7 months ago

@nwsparks said:

It's difficult to imagine that anyone has intentionally created a map with duplicate keys expecting that earlier entries will be discarded, but I guess anything is possible. This seems to me more like a bug or security issue than a feature which must be preserved.

I agree with the sentiment that this issue should be interpreted as a bug rather than a behaviour worth preserving.

This issue also emerges when converting JSON to HCL with the json2hcl tool (which uses HCL v1 rather than v2): the HCL library it uses appears to convert a JSON key whose value is a list of maps into duplicate HCL variable assignments, as shown in this example:

$ echo '{"Simple List": [1, 2, 3], "List of Maps": [{"k1": 1, "k2": 2}, {"k3": 3}]}' | json2hcl -

"Simple List" = [1, 2, 3]

"List of Maps" = {
  "k1" = 1
  "k2" = 2
}

"List of Maps" = {
  "k3" = 3
}

This can hardly be desirable behavior, given that all but one of the "List of Maps" assignments will be discarded (as discussed in previous comments), and especially since Terraform can represent lists with a perfect 1:1 mapping of JSON to HCL.

Specifically, Terraform can interpret these HCL lists without issue and encode them back to JSON:

"Simple List" = [1, 2, 3]

# Notably: an actual list instead of duplicate HCL variable assignments:
"List of Maps" = [
  {
    "k1" = 1
    "k2" = 2
  },
  {
  "k3" = 3
  },
]
rpuserh commented 7 months ago

Same story. I think this is a bug: no other language accepts this, including Terraform vars, yet for some reason locals in Terraform do. How do we escalate this issue? It appears to have been open since May 2021.

locals {
  azure_data_map = {
    development = {
      location = "West Europe"
      size     = "small"
    }
    production = {
      location = "Central US"
      size     = "medium"
    }
    production = {                      // Note duplicate key
      location = "Central US"
      size     = "medium1"
    }
  }
}

output "outtest" {
  value = local.azure_data_map
}
Outputs:

outtest = {
  "development" = {
    "location" = "West Europe"
    "size" = "small"
  }
  "production" = {
    "location" = "Central US"
    "size" = "medium1"
  }
}
Shocktrooper commented 2 months ago

Just ran into this where no warning or error was generated for duplicate map keys added by different developers, and one entry was hiding values from the true map entry. I am surprised that Terraform acts this way and hides this error, since map types almost universally, across languages and even mathematics, require keys within a single map to be unique.

yermulnik commented 4 weeks ago

TF 1.9.2: just ran into this too and was quite surprised TF doesn't throw an error.

ericrichtert commented 4 days ago

tflint-ruleset-terraform 0.9.0 and above has a rule for duplicate keys in a map. See https://github.com/terraform-linters/tflint-ruleset-terraform/pull/194.

But this one doesn't seem to handle .tfvars files
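For anyone wanting to try it, here is a minimal .tflint.hcl sketch. The plugin block follows tflint's documented configuration syntax; the rule name terraform_map_duplicate_keys is taken from the linked PR, and the pinned version is an assumption for illustration:

```hcl
plugin "terraform" {
  enabled = true
  version = "0.9.0"  # assumed version; the rule landed in 0.9.0 per the PR
  source  = "github.com/terraform-linters/tflint-ruleset-terraform"
}

rule "terraform_map_duplicate_keys" {
  enabled = true
}
```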

lra commented 4 days ago

tflint-ruleset-terraform 0.9.0 and above has a rule for duplicate keys in a map. See terraform-linters/tflint-ruleset-terraform#194.

But this one doesn't seem to handle .tfvars files

This seems to have been reverted in v0.9.1 https://github.com/terraform-linters/tflint-ruleset-terraform/releases/tag/v0.9.1

ericrichtert commented 4 days ago

As far as I can see, the revert does not handle tfvars files