hashicorp / terraform-provider-aws

The AWS Provider enables Terraform to manage AWS resources.
https://registry.terraform.io/providers/hashicorp/aws
Mozilla Public License 2.0

[Bug]: enabling encrypt_at_rest is destroying the existing domain and recreating new one with same name which is causing data loss #28321

Closed Shilpashree-BA closed 1 year ago

Shilpashree-BA commented 1 year ago

Terraform Core Version

1.3.6

AWS Provider Version

4.46.0

Affected Resource(s)

opensearch

Expected Behavior

data loss should not happen

Actual Behavior

Data loss is happening

Relevant Error/Panic Output Snippet

domain is recreating which is causing data loss.

Terraform Configuration Files

provider "aws" {
  access_key = "xxxx"
  secret_key = "xxxx"
  region     = "us-east-1"
}

resource "aws_elasticsearch_domain" "es" {
  domain_name           = "opensearch-test"
  elasticsearch_version = "OpenSearch_1.3"

  cluster_config {
    instance_count           = 2
    instance_type            = "t3.medium.elasticsearch"
    dedicated_master_enabled = false
    dedicated_master_type    = "t3.medium.elasticsearch"
    dedicated_master_count   = 0
    zone_awareness_enabled   = true
  }

  snapshot_options {
    automated_snapshot_start_hour = 23
  }

  ebs_options {
    ebs_enabled = "true"
    volume_size = 100
    volume_type = "gp2"
  }

  encrypt_at_rest {
    enabled = "true"
  }

  cognito_options {
    enabled          = "true"
    identity_pool_id = "us-east-1:34ed7bfb-5dd0-4006-9e7a-xxx"
    user_pool_id     = "us-east-xxx"
    role_arn         = "arn:aws:iam::xxxxx:role/service-role/CognitoAccessForAmazonOpenSearch"
  }

  advanced_options = {
    "rest.action.multi.allow_explicit_index" = "true"
  }

  access_policies = <<CONFIG
 {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": "es:*",
            "Principal": {
                "AWS": [
                      "arn:aws:iam::xxxx:root",
                      "arn:aws:iam::xxxx:role/service-role/CognitoAccessForAmazonOpenSearch",
                      "arn:aws:iam::xxxxx:role/Cognito_pool2Auth_Role"
                       ]
            },
            "Effect": "Allow",
            "Resource": "arn:aws:es:us-east-1:xxxxx:domain/opensearch-test/*"
        }
    ]
 }
CONFIG

  tags = {
    Domain = "TestDomain"
  }

}

Steps to Reproduce

  1. Create a domain without encrypt_at_rest.

  2. Add the following snippet to enable encryption at rest: encrypt_at_rest { enabled = "true" }

  3. Terraform destroys the domain and creates a new one, which causes data loss.

Debug Output

image

Panic Output

No response

Important Factoids

No response

References

According to the documentation, data loss should not happen on any version of OpenSearch, or on Elasticsearch 6.7 or later: https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/opensearch_domain

Would you like to implement a fix?

None

justinretzolk commented 1 year ago

Hey @Shilpashree-BA 👋 Thank you for taking the time to raise this! It looks like you linked to the aws_opensearch_domain documentation, but in your example you're using the similar, but distinct, aws_elasticsearch_domain resource.

The aws_elasticsearch_domain resource only calls out being able to enable encrypt_at_rest without recreation for Elasticsearch versions 6.7 or later, but doesn't seem to indicate the same for OpenSearch versions.

I suspect you'll want to switch over to using the aws_opensearch_domain resource, which should prevent this from happening.

Shilpashree-BA commented 1 year ago

Hi @justinretzolk, thanks for the response. Even for the aws_elasticsearch_domain resource, the documentation says there won't be data loss for Elasticsearch 6.7 or later. In my case I set the version to OpenSearch_1.3, which is equivalent to Elasticsearch 7.1, and I'm still seeing data loss. Can you please advise? https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/elasticsearch_domain#encrypt_at_rest

image

ZhouLihua commented 1 year ago

Hello Guys, I have the same issue too.

My situation is as follows.

Terraform Core Version

1.3.2

AWS Provider Version

4.46.0

Affected Resource(s)

opensearch

Expected Behavior

data loss should not happen

Actual Behavior

Data loss is happening

Steps:

  1. Create a CMK with additional permissions.

  2. Create an OpenSearch domain (version 1.3) with encrypt_at_rest { enabled = "true", kms_key_id = "ID of the key created in step 1" }.

  3. Update the additional permissions on the KMS key from step 1.

Result: the OpenSearch domain is deleted and recreated.

Expected result: the OpenSearch domain is unchanged.

justinretzolk commented 1 year ago

Hey @Shilpashree-BA and @ZhouLihua 👋 When using OpenSearch, the aws_opensearch_domain resource is preferred over the older aws_elasticsearch_domain resource. While the resources are similar and can technically both be used to manage OpenSearch resources, there are some subtle differences between the two that can cause unexpected behavior such as this.

In this particular instance, encrypt_at_rest can only be enabled in a non-destructive way on Elasticsearch version 6.7 and later, so the aws_elasticsearch_domain resource compares the elasticsearch_version parameter to see whether it is equal to or greater than 6.7. It does not determine equivalency between Elasticsearch and OpenSearch versions, so whether OpenSearch 1.3 is the same as Elasticsearch 7.1 is not taken into account. On the other hand, the aws_opensearch_domain resource uses a slightly different method to determine whether encrypt_at_rest can be enabled without resource recreation, one that takes the OpenSearch version into account.

Can you try moving over to the aws_opensearch_domain resource over the aws_elasticsearch_domain resource to verify that that resolves the issue?
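
For readers following along, here is a minimal sketch of how the reporter's domain might look when expressed with aws_opensearch_domain, trimmed to the arguments relevant to this issue. It is not taken from the thread. It assumes that engine_version takes the place of elasticsearch_version and that instance types use the .search suffix instead of .elasticsearch, so check both against the provider documentation for your version before applying.

```hcl
# Sketch only: the reporter's domain expressed with the newer resource type.
resource "aws_opensearch_domain" "es" {
  domain_name    = "opensearch-test"
  engine_version = "OpenSearch_1.3" # takes the place of elasticsearch_version

  cluster_config {
    instance_count         = 2
    instance_type          = "t3.medium.search" # assumed rename of t3.medium.elasticsearch
    zone_awareness_enabled = true
  }

  ebs_options {
    ebs_enabled = true
    volume_size = 100
    volume_type = "gp2"
  }

  # The block whose addition triggered replacement in the original report.
  encrypt_at_rest {
    enabled = true
  }
}
```

Run terraform plan before applying and confirm that the domain is listed as an in-place update, not a replacement.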

carlvitzthum commented 1 year ago

@justinretzolk Could you please provide some guidance on the following?

Can you try moving over to the aws_opensearch_domain resource over the aws_elasticsearch_domain resource ...

I am attempting to migrate my older aws_elasticsearch_domain resources to the new aws_opensearch_domain but can't figure how to do this without tearing down my old domain, which is in production use. Is there some trick using terraform import?

The question was also asked (but never answered) on the Hashicorp forum: https://discuss.hashicorp.com/t/guidance-for-migrating-from-aws-elasticsearch-domain-to-aws-opensearch-domain/42832

justinretzolk commented 1 year ago

Hey @carlvitzthum 👋 To migrate over, you'd need to modify your configuration to replace the aws_elasticsearch_domain resource with an aws_opensearch_domain resource that matches the configuration. Once you've done that, you can use terraform state rm to remove the aws_elasticsearch_domain resource from your state and then use the appropriate terraform import command to import the aws_opensearch_domain resource into your state.

I'm not aware of any "gotchas" with migrating between these particular resources, but always suggest that you make a backup of your state prior to doing any sort of state modification like this, just in case things don't go as planned (particularly since you mentioned this is in production).
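
On newer Terraform releases, the same state move can be sketched declaratively instead of with the CLI commands described above. This is an assumption-laden sketch, not something from the thread: import blocks need Terraform 1.5 or later and removed blocks need 1.7 or later, so it does not apply to the 1.3.x and 1.4.x versions mentioned here. The resource name es and the domain name opensearch-test are borrowed from the original report, and the import ID is assumed to be the domain name.

```hcl
# Forget the old resource in state WITHOUT destroying the real domain.
# Requires Terraform 1.7+; the aws_elasticsearch_domain block itself
# must already be deleted from the configuration.
removed {
  from = aws_elasticsearch_domain.es

  lifecycle {
    destroy = false
  }
}

# Adopt the same, still-running domain under the new resource type.
# Requires Terraform 1.5+; assumes a matching
# resource "aws_opensearch_domain" "es" block exists in the configuration.
import {
  to = aws_opensearch_domain.es
  id = "opensearch-test" # assumed import ID: the domain name
}
```

The advice above about backing up state first still applies. The plan should show one resource to forget and one to import, with nothing to destroy.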

rymancl commented 1 year ago

@justinretzolk - I'm adding another data point here since the OP hasn't replied.

Migrating from aws_elasticsearch_domain to aws_opensearch_domain did NOT resolve the issue for me.

AWS provider version 4.67.0
Terraform core version 1.3.7

I removed the aws_elasticsearch_domain from state and made the few adjustments to aws_opensearch_domain and imported into that. No issues there, worked perfectly.

I'm using:

engine_version = "Elasticsearch_7.10"

Plan shows the following: image

This is the same as I saw prior to migrating from aws_elasticsearch_domain to aws_opensearch_domain.

I'm happy to provide any more info that could be useful, just let me know. Thanks!

EDIT

I've since upgraded to latest versions

AWS provider version 5.1.0
Terraform core version 1.4.6

and I'm still seeing the same issue.

justinretzolk commented 1 year ago

Hey @rymancl 👋 It looks like you're experiencing a slightly different situation. In the OP's case, encrypt_at_rest.enabled was triggering replacement, as that argument was a ForceNew operation for aws_elasticsearch_domain. That argument is not ForceNew for aws_opensearch_domain, so the migration to the newer resource was intended to help get around that.

In your case, the argument that is changing is encrypt_at_rest.kms_key_id, which is a ForceNew operation, so the resource is behaving as I'd expect. In order to get around the resource being recreated, you'll need to make sure that the KMS key ARN in your configuration matches reality.
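
One way to keep the configured key in step with the key the domain already uses is to look it up, not hard-code it. The sketch below is illustrative only: the alias alias/opensearch-domain and the resource names are hypothetical, and it only avoids a diff when the domain is already encrypted with that exact key.

```hcl
# Look up the existing key; "alias/opensearch-domain" is a hypothetical alias.
data "aws_kms_key" "domain" {
  key_id = "alias/opensearch-domain"
}

resource "aws_opensearch_domain" "es" {
  domain_name    = "opensearch-test"    # hypothetical domain name
  engine_version = "Elasticsearch_7.10" # version quoted earlier in the thread

  encrypt_at_rest {
    enabled = true
    # Use the full key ARN so the configured value matches what the domain reports.
    kms_key_id = data.aws_kms_key.domain.arn
  }
}
```

If terraform plan still shows kms_key_id forcing replacement, compare the value in the plan output with the key ARN shown for the domain in the AWS console.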

rymancl commented 1 year ago

I'm blind, very sorry about that!

I'm enabling encryption at rest for the first time on this domain. I wasn't expecting it to ForceNew since it seems you can enable encryption with a KMS key via the console without destruction. If I were re-encrypting using a new KMS key, yes definitely recreate the domain.

justinretzolk commented 1 year ago

@rymancl Happens to the best of us 🙂. I think that based off of what you've mentioned so far, this might warrant a new issue so that we can capture all of the relevant details. On a quick poke around, I suspect there might be more to look into, and I don't want it to get lost in the thread of this particular issue since we've determined they're distinct. Would you be up for opening a new issue with the relevant details that the bug template requests?

justinretzolk commented 1 year ago

With the other issue broken out, and the screenshots provided by @rymancl indicating that the enabled argument does not force new, we'll close this issue out and focus on the kms_key_id part in the new issue.

github-actions[bot] commented 1 year ago

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.