hashicorp / terraform-provider-aws

The AWS Provider enables Terraform to manage AWS resources.
https://registry.terraform.io/providers/hashicorp/aws
Mozilla Public License 2.0

[Bug]: error waiting for KMS External Key valid_to propagation: timeout while waiting for state to become 'TRUE' (last state: 'FALSE', timeout: 5m0s) #27611

Open danushkaf opened 2 years ago

danushkaf commented 2 years ago

Terraform Core Version

1.3.1

AWS Provider Version

4.37.0

Affected Resource(s)

aws_kms_external_key

Expected Behavior

Key is created successfully with the correct valid_to value.

Actual Behavior

Key creation fails with a valid_to propagation timeout error.

Relevant Error/Panic Output Snippet

Error: error waiting for KMS External Key (xxx) valid_to propagation: timeout while waiting for state to become 'TRUE' (last state: 'FALSE', timeout: 5m0s)
│
│   with module.example_supporting_module.aws_kms_external_key.this[0],
│   on ..\module\kms.tf line 33, in resource "aws_kms_external_key" "this":
│   33: resource "aws_kms_external_key" "this" {

Terraform Configuration Files


module/kms.tf

data "aws_partition" "current" {}
data "aws_caller_identity" "current" {}

locals {
  aws_ext_keys = coalesce(try([for v in var.kms_key_configs : v if v.create_external], null), [])
}

resource "aws_kms_external_key" "this" {
  count = var.create ? length(local.aws_ext_keys) : 0

  bypass_policy_lockout_safety_check = try(local.aws_ext_keys[count.index].bypass_policy_lockout_safety_check, null)
  deletion_window_in_days            = try(local.aws_ext_keys[count.index].deletion_window_in_days, null)
  description                        = try(local.aws_ext_keys[count.index].description, null)
  enabled                            = try(local.aws_ext_keys[count.index].is_enabled, null)
  key_material_base64                = try(local.aws_ext_keys[count.index].key_material_base64, null)
  multi_region                       = try(local.aws_ext_keys[count.index].multi_region, null)
  policy                             = try(local.aws_ext_keys[count.index].policy, null)
  valid_to                           = try(local.aws_ext_keys[count.index].valid_to, null)

  tags = merge(
    local.kms_tags,
    local.aws_ext_keys[count.index].tags
  )
}

module/variables.tf

variable "tags" {
  type        = map(string)
  description = "(Optional) Map of tags to assign to the resources"
  default     = {}
}

variable "create" {
  description = "Controls if EKS resources should be created (affects nearly all resources)"
  type        = bool
  default     = true
}

variable "use_name_prefix" {
  description = "Determines whether to use `name` as is or create a unique name beginning with the `name` as the prefix"
  type        = bool
  default     = false
}

variable "prefix_separator" {
  description = "The separator to use between the prefix and the generated timestamp for resource names"
  type        = string
  default     = "-"
}

variable "kms_key_configs" {
  description = "Configurations to create KMS keys"
  type        = any
  default     = null
}

example/main.tf

provider "aws" {}

module "example_supporting_module" {
  source = "../module/"

  kms_key_configs = [
    {
      description         = "External key example"
      create_external     = true
      key_material_base64 = "Wblj06fduthWggmsT0cLVoIMOkeLbc2kVfMud77i/JY="
      valid_to            = "2023-04-12T23:20:50.52Z"

      tags = {
        Terraform = "true"
      }
    }
  ]
}

Steps to Reproduce

Execute terraform apply

Debug Output

No response

Panic Output

No response

Important Factoids

I have found a few issues with different properties of KMS resources. Here I am facing the issue with `valid_to` on the external key.

References

https://github.com/hashicorp/terraform-provider-aws/issues/23592
https://github.com/hashicorp/terraform-provider-aws/issues/23136

Would you like to implement a fix?

No response


adinemer commented 2 years ago

👍

ghost commented 1 year ago

Any update on this one?

breisig commented 1 year ago

Having the same error, but with the KMS key policy. When viewing the KMS key policy in the AWS Web Console and refreshing the page a bunch of times, the order of some of the items in the policy randomly changes position (for example, when I have an IAM policy). I don't think AWS is consistent with the output when checking for changes.

rbrooks-hubble commented 1 year ago

I'm getting this now as well, when updating a policy on a KMS key. It had never happened before, but it has now happened several times in a row. The policy does seem to get applied, it just never "finishes".

The error is

│ Error: error waiting for KMS Key (XXXX) policy propagation: timeout while waiting for state to become 'TRUE' (last state: 'FALSE', timeout: 5m0s)

resource "aws_kms_key" "kms" {
  description             = "${var.stack} KMS Key"
  deletion_window_in_days = 30

  policy = jsonencode({
    Version   = "2012-10-17"
    Statement = [
      # Do not remove!  This allows users to administer the key.
      {
        Sid       = "Enable IAM policies"
        Effect    = "Allow"
        Principal = {
          AWS = "arn:aws:iam::${var.aws_account_id}:root"
        },
        Action   = "kms:*",
        Resource = "*"
      },

      {
        Sid       = "Allow connector-router to send to kms-encrypted sqs queue"
        Effect    = "Allow"
        Principal = {
          "AWS" = "arn:aws:sts::${var.aws_account_id}/ROLENAMEHERE"
        }
        Action = [
          "kms:GenerateDataKey",
          "kms:Decrypt"
        ],
        Resource = "*"
      }
    ]
  })
}

rbrooks-hubble commented 1 year ago

Any suggestions how to get myself out of this and into a place where I can actually apply terraform again? Could I remove the kms key from my state and import it? Anything else to try?

rustybrooks commented 1 year ago

> Any suggestions how to get myself out of this and into a place where I can actually apply terraform again? Could I remove the kms key from my state and import it? Anything else to try?

BTW I did end up doing this, and it did work.
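
For reference, a minimal sketch of that workaround (hypothetical resource address and key ID; the import block needs Terraform 1.5+, older versions can use the terraform import CLI instead):

# Hypothetical sketch: first drop the stuck key from state
#   terraform state rm aws_kms_key.kms
# then re-adopt the existing key with an import block and run plan/apply.
import {
  to = aws_kms_key.kms
  id = "1234abcd-12ab-34cd-56ef-1234567890ab" # hypothetical KMS key ID
}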

lacevedo2 commented 1 year ago

1) I had the same issue when I read the policy from a variable:

resource "aws_kms_key" "kms_keys" {
  policy = var.policy
}

2) I had the same issue when I hardcoded the policy as a string: "policy": "{ \"Statement\": [ { \"Action\": \"kms:*\", \"Effect\": \"Allow\", ...

3) Solution: create the policy element in Terraform using a `data "aws_iam_policy_document"` block:

a) Alter var.policy to something like this (vault variable):

 "policy": {
        "statement-1": {
          "actions": [
            "kms:*"
          ],
          "identifiers": [
            "arn:aws:iam::{youraccountid}:root"
          ],
          "resources": [
            "*"
          ],
          "type": "AWS"
        },
        "statement-2": {
          "actions": [
            "kms:GenerateDataKey*",
            "kms:Decrypt"
          ],
          "identifiers": [
            "events.amazonaws.com"
          ],
          "resources": [
            "*"
          ],
          "type": "Service"
        }
      }

b) Terraform code:

data "aws_iam_policy_document" "kms_key_policy" {
  dynamic "statement" {
    for_each = var.policy
    content {
      effect  = "Allow"
      actions = statement.value["actions"]
      principals {
        type        = statement.value["type"]
        identifiers = statement.value["identifiers"]
      }
      resources = statement.value["resources"]
    }
  }
}

c) Use the Terraform-created policy instead of a variable:

resource "aws_kms_key" "kms_keys" {
  description             = var.description
  deletion_window_in_days = 30
  policy                  = data.aws_iam_policy_document.kms_key_policy.json
  enable_key_rotation     = true
}

carlosjgp commented 1 year ago

Still happening

terraform version
Terraform v1.4.5
on linux_amd64
provider "registry.terraform.io/hashicorp/aws" {
  version     = "4.67.0"
  constraints = ">= 3.55.0, ~> 4.0"
  hashes = [
    "h1:dCRc4GqsyfqHEMjgtlM1EympBcgTmcTkWaJmtd91+KA=",
    "zh:0843017ecc24385f2b45f2c5fce79dc25b258e50d516877b3affee3bef34f060",
    "zh:19876066cfa60de91834ec569a6448dab8c2518b8a71b5ca870b2444febddac6",
    "zh:24995686b2ad88c1ffaa242e36eee791fc6070e6144f418048c4ce24d0ba5183",
    "zh:4a002990b9f4d6d225d82cb2fb8805789ffef791999ee5d9cb1fef579aeff8f1",
    "zh:559a2b5ace06b878c6de3ecf19b94fbae3512562f7a51e930674b16c2f606e29",
    "zh:6a07da13b86b9753b95d4d8218f6dae874cf34699bca1470d6effbb4dee7f4b7",
    "zh:768b3bfd126c3b77dc975c7c0e5db3207e4f9997cf41aa3385c63206242ba043",
    "zh:7be5177e698d4b547083cc738b977742d70ed68487ce6f49ecd0c94dbf9d1362",
    "zh:8b562a818915fb0d85959257095251a05c76f3467caa3ba95c583ba5fe043f9b",
    "zh:9b12af85486a96aedd8d7984b0ff811a4b42e3d88dad1a3fb4c0b580d04fa425",
    "zh:9c385d03a958b54e2afd5279cd8c7cbdd2d6ca5c7d6a333e61092331f38af7cf",
    "zh:b3ca45f2821a89af417787df8289cb4314b273d29555ad3b2a5ab98bb4816b3b",
    "zh:da3c317f1db2469615ab40aa6baba63b5643bae7110ff855277a1fb9d8eb4f2c",
    "zh:dc6430622a8dc5cdab359a8704aec81d3825ea1d305bbb3bbd032b1c6adfae0c",
    "zh:fac0d2ddeadf9ec53da87922f666e1e73a603a611c57bcbc4b86ac2821619b1d",
  ]
}

poflynn commented 1 year ago

Still happening with v5.17.0 but the timeout is now 10 minutes thanks to #27422

daryl-mcmillan commented 1 year ago

I get this error message when I use role ARNs that get canonicalized by KMS.

edit: this may seem a little off-topic here, but a more related issue was closed as a duplicate of this one - https://github.com/hashicorp/terraform-provider-aws/issues/27641#issuecomment-1307833317

daryl-mcmillan commented 1 year ago

The valid_to timeouts appear to be a similar canonicalization problem, where the stored data doesn't quite match the input data, so Terraform waits forever for the change to be reflected.

This times out (with fractional seconds):

resource "aws_kms_external_key" "test" {
  description = "dmcmillan-deleteme-test"
  key_material_base64 = "Wblj06fduthWggmsT0cLVoIMOkeLbc2kVfMud77i/JY="
  deletion_window_in_days = 7
  enabled = false
  valid_to = "2023-12-31T10:00:00.52Z"
}

and this applies successfully (no fractional seconds):

resource "aws_kms_external_key" "test" {
  description = "dmcmillan-deleteme-test"
  key_material_base64 = "Wblj06fduthWggmsT0cLVoIMOkeLbc2kVfMud77i/JY="
  deletion_window_in_days = 7
  enabled = false
  valid_to = "2023-12-31T10:00:01Z"
}
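
Based on that observation, a minimal sketch of a workaround (hypothetical local names; it simply drops the fractional seconds so the configured value matches what KMS reports back):

locals {
  valid_to_raw = "2023-12-31T10:00:00.52Z" # fractional seconds trigger the timeout
  # KMS appears to store the timestamp at whole-second precision, so trim the
  # fractional part before handing the value to the provider.
  valid_to = replace(local.valid_to_raw, "/\\.\\d+Z$/", "Z") # "2023-12-31T10:00:00Z"
}

resource "aws_kms_external_key" "test" {
  description             = "dmcmillan-deleteme-test"
  key_material_base64     = "Wblj06fduthWggmsT0cLVoIMOkeLbc2kVfMud77i/JY="
  deletion_window_in_days = 7
  enabled                 = false
  valid_to                = local.valid_to
}
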
JAnderson800 commented 4 months ago

We have run into the same/similar issue with the aws_kms_key_policy resource.

The apply succeeds, despite the error message: timeout while waiting for state to become 'TRUE' (last state: 'FALSE', timeout: 10m0s)

Importing via the import { } TF block was successful, but the next apply results in the plan showing changes. Applying the changes results in the same error. Looking at the change plan:

  ~ Principal = {
      ~ AWS = [
          - "arn:aws:iam::999999999999:role/aws-reserved/sso.amazonaws.com/us-east-2/AWSReservedSSO_identity1_8a8a8a8a8a8a8a8a",
          - "arn:aws:iam::999999999999:role/RoleID2",
          + "arn:aws:iam::999999999999:role/RoleID3",
            "arn:aws:iam::999999999999:role/aws-reserved/sso.amazonaws.com/us-east-2/AWSReservedSSO_identity2_8a8a8a8a8a8a8a8a",
          - "arn:aws:iam::999999999999:role/RoleID7",
          + "arn:aws:iam::999999999999:role/aws-reserved/sso.amazonaws.com/us-east-2/AWSReservedSSO_identity3_8a8a8a8a8a8a8a8a",
            "arn:aws:iam::999999999999:role/aws-reserved/sso.amazonaws.com/us-east-2/AWSReservedSSO_identity4_8a8a8a8a8a8a8a8a",
          - "arn:aws:iam::999999999999:role/RoleID3",
          + "arn:aws:iam::999999999999:role/aws-reserved/sso.amazonaws.com/us-east-2/AWSReservedSSO_identity5_8a8a8a8a8a8a8a8a",
            "arn:aws:iam::999999999999:role/aws-reserved/sso.amazonaws.com/us-east-2/AWSReservedSSO_identity6_8a8a8a8a8a8a8a8a",
          - "arn:aws:iam::999999999999:role/aws-reserved/sso.amazonaws.com/us-east-2/AWSReservedSSO_identity3_8a8a8a8a8a8a8a8a",
          + "arn:aws:iam::999999999999:role/aws-reserved/sso.amazonaws.com/us-east-2/AWSReservedSSO_identity1_8a8a8a8a8a8a8a8a",
          + "arn:aws:iam::999999999999:role/RoleID3",
            "arn:aws:iam::999999999999:role/system/RoleID4",
          - "arn:aws:iam::999999999999:role/RoleID5",
          - "arn:aws:iam::999999999999:role/aws-reserved/sso.amazonaws.com/us-east-2/AWSReservedSSO_identity5_8a8a8a8a8a8a8a8a",
            "arn:aws:iam::999999999999:role/RoleID6",
          + "arn:aws:iam::999999999999:role/RoleID5",
          + "arn:aws:iam::999999999999:role/RoleID7",
          + "arn:aws:iam::999999999999:role/RoleID2",
        ]
    }

No changes are shown for any other part of the policy, only the principal list.

The plan indicates TF is trying to remove and re-add some of the principals. As a previous poster mentioned, if you refresh the policy in the AWS management portal, the order of the principals changes with each page refresh. This seems to point to the underlying TF relying on the order returned, while AWS never returns it in the same order. If TF expects these to be returned in exactly the same order as in the plan, I think the chances of the error occurring grow as the number of principals grows.

There are 4 statements in our policy, much like the policy shown in https://docs.aws.amazon.com/kms/latest/developerguide/key-policy-overview.html. The statement with the single Principal never shows a change in the TF plan:

  ~ Statement = [
        {
            Action    = "kms:*"
            Effect    = "Allow"
            Principal = {
                AWS = "arn:aws:iam::999999999999:root"
            }
            Resource  = "*"
            Sid       = "Enable IAM User Permissions"
        },

For the statements with the "changing" list of ARNs:

jmpelaschier commented 4 months ago

@JAnderson800 I had the same issue as you and finally figured out what the problem was. If you look at your plan, there are 8 +s and 7 -s. The role RoleID3 is being added twice. This is very tricky to see since most of the roles are just getting re-ordered in the plan. The problem is, when this policy gets submitted to AWS, it removes the duplicate roles. Terraform however is constantly checking to see if the policy has actually been properly applied. Since the policy in the plan does not match what is being returned from AWS (a policy with just one RoleID3 role instead of 2), terraform waits for 10 minutes and then times out even though everything shows up in the AWS console.

This is very tricky to catch given the error message from Terraform but if you remove the duplicate role, you will no longer face this issue.

JAnderson800 commented 3 months ago

@jmpelaschier - thank you for taking the time to read and suggest. Adding the Terraform distinct() function around the offending code has resolved the issue for now. The root cause of the duplicates was essentially data blocks that pulled overlapping ARNs, so distinct() was the fix.
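
For illustration, a minimal sketch of that kind of fix (hypothetical data source, variable, and policy names, not the original module): de-duplicate the principal ARNs gathered from overlapping sources before rendering the key policy, so the document Terraform submits matches the one KMS stores.

variable "extra_admin_role_arns" {
  type    = list(string)
  default = []
}

# Hypothetical source of SSO role ARNs; any data sources whose results overlap
# with the variable above can introduce duplicates into the combined list.
data "aws_iam_roles" "sso_admins" {
  name_regex = "AWSReservedSSO_.*"
}

locals {
  # distinct() removes the duplicates so the submitted policy matches the policy
  # KMS returns, which is what the propagation wait compares against.
  key_admin_arns = distinct(concat(
    tolist(data.aws_iam_roles.sso_admins.arns),
    var.extra_admin_role_arns
  ))
}

data "aws_iam_policy_document" "key_policy" {
  statement {
    sid     = "AllowKeyAdministration"
    effect  = "Allow"
    actions = ["kms:*"]
    principals {
      type        = "AWS"
      identifiers = local.key_admin_arns
    }
    resources = ["*"]
  }
}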