MSK cluster TLS config getting updated despite no changes

nomeelnoj commented 2 years ago

Terraform CLI and Terraform AWS Provider Version

Terraform v1.1.9
on darwin_arm64
+ provider registry.terraform.io/hashicorp/aws v4.15.0

Affected Resource(s)

aws_msk_cluster

Terraform Configuration Files

resource "aws_msk_cluster" "default" {
  # ...  removed for brevity
  client_authentication {
    tls {
      certificate_authority_arns = var.certificate_authority_arns
    }
  }
}

Expected Behavior

Since the client auth did not change, the plan output should have been a no-op

Actual Behavior

The plan rolled the entire cluster (40 min), updating the TLS cert to the same cert that was already configured

Steps to Reproduce

Create an MSK cluster using a version of the AWS provider prior to 4.12.1
Upgrade the provider to a version higher than 4.13
See that the plan output wants to change the ACM PCA even though the values are not different:

Plan output:


Terraform will perform the following actions:

  # module.msk.aws_msk_cluster.msk_cluster will be updated in-place
  ~ resource "aws_msk_cluster" "msk_cluster" {
        id                           = "my cluster arn"
        # (10 unchanged attributes hidden)
      ~ client_authentication {
            # (1 unchanged attribute hidden)

          ~ tls {
              + certificate_authority_arns = [
                  + "my acm pca cert arn",
                ]
            }
        }
        # (5 unchanged blocks hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.

References

#21005

madengr00 commented 2 years ago

Seeing a similar issue on our side.
Terraform v1.1.5 on darwin_arm64

provider registry.terraform.io/hashicorp/aws v4.22.0

The plan indicates a change on the aws_msk_cluster related to the tls block inside the client_authentication configuration. A previous provider version had an issue with null handling (since fixed), but now removing the optional tls block, which didn't contain any attributes, plans a change that then errors on the TFE apply.

The error is similar to this:

Error: updating MSK Cluster (arn:aws:kafka:us-east-2::cluster/domain-msk-dev-1/) security: BadRequestException: The request does not include any updates to the security setting of the cluster. Verify the request, then try again.
{
  RespMetadata: {
    StatusCode: 400,
    RequestID: "someid"
  },
  Message_: "The request does not include any updates to the security setting of the cluster. Verify the request, then try again."
}

  with aws_msk_cluster.domain-msk-cluster
  on msk_cluster.tf line 5, in resource "aws_msk_cluster" "stc-ds-msk-cluster":
  resource "aws_msk_cluster" "stc-ds-msk-cluster" {...

The Terraform block being referenced (before changes):

resource "aws_msk_cluster" "domain-msk-cluster" {
  broker_node_group_info {
    az_distribution = "DEFAULT"
    client_subnets  = var.vpc_subnets[var.envr]
    storage_info {
      ebs_storage_info {
        volume_size = 1000
      }
    }
    instance_type   = "kafka.m5.large"
    security_groups = var.vpc_security_groups[var.envr]
  }

  client_authentication {
    sasl {
      iam   = "true"
      scram = "true"
    }
    tls {
      certificate_authority_arns = null
    }
  }
  # ...
}

(Notice the tls block with the null certificate_authority_arns)

Changes made where the apply error still occurs:

resource "aws_msk_cluster" "domain-msk-cluster" {
  broker_node_group_info {
    az_distribution = "DEFAULT"
    client_subnets  = var.vpc_subnets[var.envr]
    storage_info {
      ebs_storage_info {
        volume_size = 1000
      }
    }
    instance_type   = "kafka.m5.large"
    security_groups = var.vpc_security_groups[var.envr]
  }

  client_authentication {
    sasl {
      iam   = "true"
      scram = "true"
    }
  }
  # ...
}

michalschott commented 2 years ago

I'm also affected:

➜  terragrunt version
Terraform v1.1.9
on darwin_arm64
+ provider registry.terraform.io/hashicorp/aws v4.22.0

Code snippet:

resource "aws_msk_cluster" "this" {
...
  client_authentication {
    unauthenticated = var.client_authentication_unauthenticated

    sasl {
      iam = true
    }
  }
...

I made some changes to the cluster's auth settings in the AWS console, and the plan now shows:

 # module.msk.aws_msk_cluster.this[0] will be updated in-place
  ~ resource "aws_msk_cluster" "this" {
        id                           = "XXX"
        tags                         = {}
        # (12 unchanged attributes hidden)
      ~ client_authentication {
            # (1 unchanged attribute hidden)
          - tls {}
            # (1 unchanged block hidden)
        }
        # (6 unchanged blocks hidden)
    }

Applying:

│ Error: updating MSK Cluster (XXX) security: BadRequestException: The request does not include any updates to the security setting of the cluster. Verify the request, then try again.
│ {
│   RespMetadata: {
│     StatusCode: 400,
│     RequestID: "8942deea-e20e-4826-aa3d-2218b7fee4ac"
│   },
│   Message_: "The request does not include any updates to the security setting of the cluster. Verify the request, then try again."
│ }

mattthaber commented 2 years ago

Also affecting us ...

terraform definition


resource "aws_msk_cluster" "msk" {
...
  client_authentication {
    sasl {
      iam   = false
      scram = false
    }

    tls {
      certificate_authority_arns = []
    }
...
}

terraform plan

  ~ resource "aws_msk_cluster" "msk" {
        tags                         = {
            "Name" = "stg-hogwarts-msk"
        }
        # (11 unchanged attributes hidden)

      ~ client_authentication {
            # (1 unchanged attribute hidden)

          + tls {}
            # (1 unchanged block hidden)
        }

        # (5 unchanged blocks hidden)
    }

Terraform apply

│ Error: updating MSK Cluster (arn:aws:kafka:us-west-2:***:cluster/stg-hogwarts-msk/786d7a2b-21d9-4e09-8b02-eee8a1d8ea24-2) security: BadRequestException: The request does not include any updates to the security setting of the cluster. Verify the request, then try again.
│ {
│   RespMetadata: {
│     StatusCode: 400,
│     RequestID: "ccb49f64-a837-4e9d-9e6f-5c10c6ec630a"
│   },
│   Message_: "The request does not include any updates to the security setting of the cluster. Verify the request, then try again."
│ }
│ 
│   with module.msk.aws_msk_cluster.msk,
│   on terraform-modules/msk/main.tf line 50, in resource "aws_msk_cluster" "msk":
│   50: resource "aws_msk_cluster" "msk" {
│ 
╵

mattthaber commented 2 years ago

Not a real solution, but we unblocked this by adding an ignore_changes entry for client_authentication. It was the only way to continue with our deploys...


resource "aws_msk_cluster" "msk" {

  lifecycle {
    ignore_changes = [
      client_authentication,
    ]
  }
}

AryaCherryLiu commented 1 year ago

Also facing this issue. I didn't make any changes to MSK, but terraform plan still shows output for it.

Terraform plan:

# aws_msk_cluster.cluster will be updated in-place
  ~ resource "aws_msk_cluster" "cluster" {
        id                           = "arn:aws:kafka:us-west-2:******:*****"
        tags                         = {}
      ~ client_authentication {
          - tls {}
        }
    }

error output:

panda commented 1 year ago

Also experiencing this same behavior

pavleprica commented 1 year ago

After a decent amount of debugging it turned out that the error was on our side.

This doesn't mean the same goes for everyone, but I suspect it's quite possible.

AWS MSK offers several types of auth (IAM, TLS, unauthenticated, etc.), so you have to be careful when initially setting the auth configuration. I can't remember exactly what error we made, but I think it was along the lines of setting

client_sasl_iam_enabled = true

without setting

client_allow_unauthenticated = false

or using those two combined. What usually happens is that you quite easily end up providing two different auth configs. Even if the initial creation succeeds, the next plan wants to update the cluster with a different auth config, which causes it to crash.

(I said I can't remember exactly because the debugging session was more than a month ago 😅 )
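
For readers using the resource directly: the two names above look like input variables of a wrapper module (an assumption, since the module is not named here). A minimal sketch of what they would presumably correspond to on aws_msk_cluster, using the unauthenticated and sasl.iam arguments that appear in other comments in this thread, with a hypothetical resource name:

```hcl
resource "aws_msk_cluster" "example" {
  # ... other required arguments omitted

  client_authentication {
    # Assumed counterpart of client_allow_unauthenticated = false
    unauthenticated = false

    sasl {
      # Assumed counterpart of client_sasl_iam_enabled = true
      iam = true
    }
  }
}
```

This only illustrates the mapping. Other reports in this thread suggest that combining unauthenticated and sasl can still hit the same error, so it is not a confirmed fix.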

renyu-capsule commented 1 year ago

Having the same issue. It happened after we tried to enable SCRAM and then turned it back off:

security: BadRequestException: The request does not include any updates to the security setting of the cluster. Verify the request, then try again.

knightsg commented 1 year ago

I've just run into this same issue myself. If you have both "unauthenticated" and "sasl" options in the client_authentication block and there are no changes to either, terraform will end with a "The request does not include any updates to the security setting of the cluster. Verify the request, then try again." error, which is pretty annoying.

Rooks103 commented 1 year ago

I'm basically seeing the same thing as @michalschott.

Here's what I did: I had a bunch of MSK clusters with just scram = true. As an experiment, I enabled IAM on one of them via the console, played around with it, then removed IAM from the cluster via the console.

I then updated my client_authentication block to include iam = true in my MSK module. All of the clusters then updated fine with the exception of the one where I did my testing. This cluster is now showing that it wants to make this change:

!       client_authentication {
            # (1 unchanged attribute hidden)

-           tls {}

            # (1 unchanged block hidden)

but reports a "The request does not include any updates to the security setting of the cluster." error when I try to apply. I even compared the state files of a good and a bad cluster to see if the TLS settings differed, but nothing stood out to me.

So it would seem to have something to do with making changes via the Console at least in my instance. This is with TF version 1.3.9 and AWS provider version 4.55.0.

quercusilvam commented 1 year ago

I've done some testing and have some interesting findings. This error only occurs if you add some authentication configuration (manually via the console or using Terraform) and later remove it.

I created a cluster with the following authentication settings.

Terraform code:

client_authentication {
    tls {
      certificate_authority_arns = local.pca_arn
    }
    unauthenticated = false
  }

That works fine. In my state file I can see this (note the empty sasl).

State file:

"client_authentication": [
  {
    "sasl": [],
    "tls": [
      {
        "certificate_authority_arns": [
          "arn:aws:acm-pca:eu-west-1:XXXXXX:certificate-authority/123456789"
        ]
      }
    ],
    "unauthenticated": false
  }
],

Next I added a sasl entry.

Terraform code:

client_authentication {
    tls {
      certificate_authority_arns = local.pca_arn
    }
    sasl {
      iam = true
    }
    unauthenticated = false
  }

This works fine. My state now looks like this.

State file:

"client_authentication": [
  {
   "sasl": [
      {
        "iam": true,
        "scram": false
      }
    ],
    "tls": [
      {
        "certificate_authority_arns": [
          "arn:aws:acm-pca:eu-west-1:XXXXXX:certificate-authority/123456789"
        ]
      }
    ],
    "unauthenticated": false
  }
],

I reverted the change and removed the sasl/iam config from the cluster. After that my state file looks like this.

State file:

"client_authentication": [
  {
    "sasl": [
      {
        "iam": false,
        "scram": false
      }
    ],
    "tls": [
      {
        "certificate_authority_arns": [
          "arn:aws:acm-pca:eu-west-1:XXXXXX:certificate-authority/123456789"
        ]
      }
    ],
    "unauthenticated": false
  }
],

Note that there are now explicit false values for iam and scram. They were added because I changed one of these settings (it doesn't matter whether via the AWS console or Terraform code). But there is no sasl block in my Terraform code, so the provider expects null values, while AWS returns false instead. Because of that, the next plan/apply tries to 'fix AWS':

~ client_authentication {
    # (1 unchanged attribute hidden)

  - sasl {
      - iam   = false -> null
      - scram = false -> null
    }

    # (1 unchanged block hidden)
}

This cannot be done, so the result is a never-ending update.
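
A possible stopgap for this particular drift, offered as an untested sketch rather than a fix: ignore only the sasl sub-block, using the same index syntax as the client_authentication[0].tls workaround posted elsewhere in this thread. The resource name is hypothetical, and whether this suppresses the diff on every provider version is an assumption.

```hcl
resource "aws_msk_cluster" "example" {
  # ... other required arguments omitted

  client_authentication {
    unauthenticated = false

    tls {
      certificate_authority_arns = local.pca_arn
    }
  }

  lifecycle {
    ignore_changes = [
      # Assumption: skips the "iam = false -> null" / "scram = false -> null"
      # diff that appears once a sasl setting has been enabled and removed.
      client_authentication[0].sasl,
    ]
  }
}
```

The trade-off is that genuine sasl changes made in Terraform code would be ignored as well, so the entry would have to be removed before changing those settings on purpose.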

org-ci-cd commented 1 year ago

Similar issue when using EBS provisioned throughput:

│ Error: updating MSK Cluster (<arn>) broker storage: BadRequestException: The request does not include any updates to the EBS volumes of the cluster. Verify the request, then try again.
│ {
│   RespMetadata: {
│     StatusCode: 400,
│     RequestID: "<abc>"
│   },
│   Message_: "The request does not include any updates to the EBS volumes of the cluster. Verify the request, then try again."
│ }

vishwa-trulioo commented 1 year ago

I get the feeling that the only way to work this out is to rewrite the resource and break it into smaller resources, just like they did with S3 some time ago. :-(

ghost commented 1 year ago

We still wanted to be able to change the sasl settings, so setting ignore_changes on the whole of client_authentication didn't work for us. Instead we used this syntax to ignore just the tls block:

  lifecycle {
    ignore_changes = [
      # this is the tls setting
      # https://github.com/hashicorp/terraform-provider-aws/issues/24914
      client_authentication[0].tls
    ]
  }

jasonstitt commented 1 year ago

This issue with provisioned_throughput was entirely blocking for me until ignoring the section:

  lifecycle {
    ignore_changes = [
      broker_node_group_info[0].storage_info[0].ebs_storage_info[0].provisioned_throughput
    ]
  }
