hashicorp / terraform-provider-aws

The AWS Provider enables Terraform to manage AWS resources.
https://registry.terraform.io/providers/hashicorp/aws
Mozilla Public License 2.0
9.77k stars 9.12k forks source link

[Bug]: kms_key_id for encrypt_at_rest is trying to force a new resource when enabling encryption at rest for the first time #31748

Open rymancl opened 1 year ago

rymancl commented 1 year ago

Terraform Core Version

v1.4.6

AWS Provider Version

v5.1.0

Affected Resource(s)

Expected Behavior

Enabling encryption at rest for the first time on a domain should not require recreating it

Actual Behavior

Enabling encryption at rest for the first time on a domain is causing Terraform to recreate the resource

Relevant Error/Panic Output Snippet

No response

Terraform Configuration Files

resource "aws_opensearch_domain" "this" {
  domain_name    = "this"
  engine_version = "Elasticsearch_7.10"

  # most config omitted

  encrypt_at_rest {
    enabled = true
    kms_key_id  =  data.aws_kms_key.rds.arn
  }
}

data "aws_kms_key" "es" {
  key_id = "alias/aws/es"
}

Steps to Reproduce

  1. Set encrypt_at_rest.enabled to true for the first time
  2. init, plan

Debug Output

image

Panic Output

No response

Important Factoids

Also tested on

AWS provider version 4.67.0
Terraform core version 1.3.7

This happens for both aws_opensearch_domain & aws_elasticsearch_domain.

References

Similar to #28321

Would you like to implement a fix?

No

github-actions[bot] commented 1 year ago

Community Note

Voting for Prioritization

Volunteering to Work on This Issue

justinretzolk commented 1 year ago

Thanks for taking the time to split this off @rymancl 🎉 I wanted to leave a couple of notes based on the research I did on #28321.

The older issue was more specifically that toggling encrypt_at_rest.enabled from false to true would trigger replacement under certain circumstances (mainly in the aws_elasticsearch_domain resource), which is why we broke this issue out; this replacement is being caused by aws_opensearch_domain.foo.encrypt_at_rest.kms_key_id.

The kms_key_id argument is currently ForceNew, however, there seems to be some inconsistency that we need to look at to determine if that's appropriate. My findings below:

The opensearchservice.UpdateDomainConfigInput type has a EncryptionAtRestOptions field, which seems to indicate that this can be enabled without recreation. This is backed up by AWS' Encryption of data at rest for Amazon OpenSearch Service document, which has a reference API call to enable encryption at rest:

You can also enable encryption through the configuration API. The following request enables encryption of data at rest on an existing domain:

{
   "ClusterConfig":{
      "EncryptionAtRestOptions":{
         "Enabled": true,
         "KmsKeyId":"arn:aws:kms:us-east-1:123456789012:alias/my-key"
      }
   }
}

However, while the API documentation for UpdateDomainConfig also lists the EncryptionAtRestOptions object, it's documentation states:

Specifies whether the domain should encrypt data at rest, and if so, the Key Management Service (KMS) key to use. Can be used only to create a new domain, not update an existing one.

It's unclear to me whether this should be ForceNew or can be updated in place; I'll leave this open and marked as a bug until we're able to investigate further.

rymancl commented 1 year ago

In order to help test this a bit, I created two new domains that match my existing one.

  1. test-encryption in which I'll try to enable encryption at rest [for the first time] via the console
  2. test-encryption-cli in which I'll try to enable encryption at rest [for the first time] via the AWS CLI

Via console, it let me add encryption at rest without issue. image

Via CLI, it let me add encryption at rest without issue.

aws opensearch update-domain-config --domain-name test-encryption-cli --encryption-at-rest-options Enabled=true,KmsKeyId=REDACTED --profile dev

image

Interestingly, both of these do show a step for "Creating a new environment". image

However, the cluster health always stays green. I suspect this is doing a sort of in-place migration to a new cluster/nodes, whereas what the TF provider is trying to do is actually destroy the domain and make a new one. That would have downtime but the AWS ways do not.

I'm also not sure if any of this logic differs with the Go SDK.

But I suspect that ForceNew is only needed when encrypt_at_rest.enabled is already set as true in state. That implies the domain data is already encrypted and you're re-encrypting with a new key; this jives with the messages presented by AWS when enabling encryption-at-rest for the first time:

Encryption will be enabled Once node-to-node encryption and/or encryption of data at rest are enabled, you will no longer be able to disable or modify them. Enabling encryption may have an impact on performance.

Let me know if I can test any other scenarios. Thanks!

EDIT

And just to confirm, if I don't set kms_key_id it does not try to recreate the resource. image