Terraform seems to ignore "skip_final_snapshot" for rds cluster

hashibot commented 6 years ago

This issue was originally opened by @andrei-dascalu as hashicorp/terraform#16861. It was migrated here as a result of the provider split. The original body of the issue is below.

Hello,

I'm not sure if the following is a terraform or aws provider issue but it sure seems like it's an issue as in my case the request behaves as if "skip_final_snapshot" isn't set or it's always true and it's expecting a "final_snapshot_identifier" (though even when provided the error claims it's not there). I've linked a similar issue as well.

Thanks!

Terraform Version

Terraform v0.11.1
+ provider.aws v1.5.0

Terraform Configuration Files

variable "db_password" {
  description = "Database password"
}

variable "db_name" {
  description = "Database name"
}

resource "aws_rds_cluster_instance" "example" {
  count              = 2
  engine             = "aurora"
  identifier         = "aurora-cluster-demo-${count.index}"
  cluster_identifier = "${aws_rds_cluster.example.id}"
  instance_class     = "db.t2.small"
}

resource "aws_rds_cluster" "example" {
  cluster_identifier = "aurora-cluster-demo"
  availability_zones = ["eu-west-1a", "eu-west-1b"]
  database_name      = "${var.db_name}"
  master_username    = "admin"
  master_password    = "${var.db_password}"
  skip_final_snapshot = true
  final_snapshot_identifier = "whatever"
}

Debug Output

https://gist.github.com/andrei-dascalu/9eaaa3491b5ad77f7e4f8c4f11dc960a

Expected Behavior

RDS Cluster with 2 instances created

Actual Behavior

Error: 1 error(s) occurred:

aws_rds_cluster.example (destroy): 1 error(s) occurred:
aws_rds_cluster.example: RDS Cluster FinalSnapshotIdentifier is required when a final snapshot is required

Steps to Reproduce

run: terraform apply

References

hashicorp/terraform#5417

loivis commented 6 years ago

From debug output, skip_final_snapshot was set to false in the beginning and you wanted to change to true and add new instances to the cluster at the same time?

2017/12/06 20:29:29 [TRACE] DiffTransformer: Module: DESTROY/CREATE: aws_rds_cluster.example
  apply_immediately:               "" => "<computed>"
  availability_zones.#:            "3" => "2" (forces new resource)
  availability_zones.1924028850:   "eu-west-1b" => "eu-west-1b"
  availability_zones.3953592328:   "eu-west-1a" => "eu-west-1a"
  availability_zones.94988580:     "eu-west-1c" => "" (forces new resource)
  backup_retention_period:         "1" => "1"
  cluster_identifier:              "aurora-cluster-demo" => "aurora-cluster-demo"
  cluster_identifier_prefix:       "" => "<computed>"
  cluster_members.#:               "0" => "<computed>"
  cluster_resource_id:             "cluster-CEPPSOIGTTLSDYOFWR33MK4NAM" => "<computed>"
  database_name:                   "mydb" => "test" (forces new resource)
  db_cluster_parameter_group_name: "default.aurora5.6" => "<computed>"
  db_subnet_group_name:            "default" => "<computed>"
  endpoint:                        "aurora-cluster-demo.cluster-czvdbs8joxz1.eu-west-1.rds.amazonaws.com" => "<computed>"
  engine:                          "aurora" => "aurora"
  engine_version:                  "5.6.10a" => "<computed>"
  final_snapshot_identifier:       "" => "whatever"
  kms_key_id:                      "" => "<computed>"
  master_password:                 "<sensitive>" => "<sensitive>" (attribute changed)
  master_username:                 "admin" => "admin"
  port:                            "3306" => "<computed>"
  preferred_backup_window:         "23:03-23:33" => "<computed>"
  preferred_maintenance_window:    "mon:04:40-mon:05:10" => "<computed>"
  reader_endpoint:                 "aurora-cluster-demo.cluster-ro-czvdbs8joxz1.eu-west-1.rds.amazonaws.com" => "<computed>"
  skip_final_snapshot:             "false" => "true"
  storage_encrypted:               "false" => "false"
  vpc_security_group_ids.#:        "1" => "<computed>"

kojiromike commented 6 years ago

@loivis that actually seems to capture the problem. Even though the .tf code shows skip_final_snapshot = false, attempts to run (plan/apply) show that something inside terraform wants to change it to true despite the user's specification.

caiges commented 6 years ago

I just took a quick look at the code and the only place I could find where that variable was being set to true was here.

csdev commented 6 years ago

I have also encountered this bug with RDS read replicas, for which final snapshots do not even apply.

vlad2 commented 6 years ago

This bug still exists (trying to destroy a Postgres RDS instance, Terraform v0.11.7).

It seems to manifest if 'skip_final_snapshot' parameter was missing when the instance got created, and it's added later. If the 'skip_final_snapshot' was specified from the beginning, everything works correctly.

rtizzy commented 6 years ago

Alright, so on a normal RDS instance I've found a way to easily replicate this bug and a proposed fix.

Replication:

Spin up an RDS instance with Terraform (with skip_final_snapshot = false and no name set for snapshot)
Either set or change the identifier on the database.
Attempt to apply.
It will complain that it cannot do this since the instance requires a final snapshot and there is no name set.
Change the identifier back to the original value.
Set skip_final_snapshot = true.
Apply. This will set that variable in the state file.
Change the identifier.
Apply. It will now properly destroy and recreate the instance.

Suggested fix:

If possible, it seems like a good idea to check if skip_final_snapshot = true via the config and NOT just the state. This would prevent the song and dance seen above.

Right now, it is going soley by the state so if you end up in a state where skip_final_snapshot is already false, then it's only possible to fix this by ensuring that is set to true (Without modifying the state file manually, which is bad practice)

yruss972 commented 6 years ago

Hit this also, please fix :)

rupsray commented 6 years ago

Even though I have mentioned skip_final_snapshot = "true", I am getting the below error:

RDS Cluster FinalSnapshotIdentifier is required when a final snapshot is required

Terraform v0.11.3

provider.aws v1.13.0

rtizzy commented 6 years ago

@rupsray See my comment here

https://github.com/terraform-providers/terraform-provider-aws/issues/2588#issuecomment-396299734

This should allow you to workaround the issue.

If you examine your state file, you'll likely notice that it's not set to true and that it will attempt to destroy it before first changing skip_final_snapshot = "true" in the state file.

krestivo-kdinfotech commented 5 years ago

This appears to still be broken over a year later.

rpatrick00 commented 5 years ago

Not only is it still broken, there is another issue with the using skip_final_snapshot = false.

If I set the final_snapshot_identifier to a constant name (e.g., mydb-final-snapshot), terraform destroy fails if the snapshot with that name already exists. While I understand why, there doesn't seem to be a good way to get around this limitation other than to manually delete the previous final snapshot or change the name of the final snapshot every time you run terraform destroy. Sigh...

bostrowski13 commented 5 years ago

I hate to say this, but I keep running issues that have been for for 2, 3, 4, or 5 years that aren't fixed.

They really need to do something about prioritizing issues based on how long they've been open.

krestivo-kdinfotech commented 5 years ago

Is it they or us? Do they accept patches? Anyone willing to take these on?

camservo commented 5 years ago

Being as this only affects RDS instances that were created without final snapshot policy set to enforced, I don't know if this is a bug so much as a good place to put some notifications that you can't change this option with TF. I would think if someone wanted it protected from the start, they wouldn't want some accidental change to some TF configurations to make it unprotected.

caiges commented 5 years ago

I just tried to reproduce the original issue with the latest AWS provider and couldn't. @rpatrick00 I'm going to take a look at your issue tonight.

caiges commented 5 years ago

Okay, I can reproduce and see what's going on here. For starters, skip_final_snapshot defaults to False which should also require final_snapshot_identifier to be set but it's not so what happens is the create/update is applied, state updated where skip_final_snapshot is False but final_snapshot_identifier is null. This causes the destroy operation to fail it's verification stage.

This can be fixed but I don't really have a great story for those who already have prexisting state. One possibility would be that a delete operation ignores skip_final_shopshot if the identifier is null. Another might be to default final_snapshot_identifier to something random if skip_final_snapshot is set to or defaulted to False. I think for data safety reasons, ignoring skip_final_snapshot if final_snapshot_identifier is null is a bad idea and it'd be better to just randomize an identifier.

krestivo-kdinfotech commented 5 years ago

Thanks very much for taking the time to look into this.

A randomized ID seems safest to me, if it is a UUID or base32'ed UUID or something guaranteed to be random.

kris1610 commented 4 years ago

is this issue resolved or not ? I am using Terraform v0.12.23 and facing this issue. Tried with some workaround but its not working

caiges commented 4 years ago

I opened a PR last year and it looks like there's now conflicting changes. I'll update the PR but I can only be hopeful this gets more notice.

valeriozhang commented 4 years ago

still an issue? lol

reiz commented 4 years ago

I'm using:

Terraform v0.12.24
+ provider.aws v2.57.0

And this is still an issue! Any plans to fix this bug?

dinvlad commented 4 years ago

Same here:

Terraform v0.12.24
+ provider.aws v2.58.0

alex-harvey-z3q commented 4 years ago

I believe I have reproduced the same issue using Terraform 0.13-beta2:

Terraform v0.13.0-beta2
+ provider registry.terraform.io/hashicorp/aws v2.67.0

After adding skip_final_snapshot = true, I still had skip_final_snapshot = false in the state file. I edited that to be also true, and I could delete the DB instance.

tracycarley commented 4 years ago

I ran into the same issue here with Terraform 0.12.26. Any solutions for this?

rotemjac commented 4 years ago

In my case I encountered this issue when trying to destroy my RDS resources. Below are the steps I took for solving this issue:

Change skip_final_snapshot to true and remove final_snapshot_identifier if exists.
Remove backup_window (Under AWS Aurora its probably called preferred_backup_window).
Change backup_retention_period to 0.
Make sure that apply_immediately is set to true.

(*) A more detailed answer is given here.

(**) Running under: Terraform v0.12.28, provider.aws v2.70.0

ameyaagashe commented 4 years ago

Terraform Version 0.13.5 and still RDS instance deletion is not as clean and graceful as it should be.

jan-swiecki commented 3 years ago

Still a problem with terraform v0.14.4. Can anyone point me in the right direction, I can try to fix this.

caiges commented 3 years ago

I got around to updating this PR with the latest API and upstream changes.

thiagolsfortunato commented 3 years ago

Any news?

I've reproduced the same error using Terraform 0.14:

Terraform v0.14.0
+ provider registry.terraform.io/hashicorp/aws v3.44.0

Error:

Error: RDS Cluster FinalSnapshotIdentifier is required when a final snapshot is required

everdrone commented 2 years ago

Experiencing this problem with

Terraform v1.1.5
+ provider registry.terraform.io/hashicorp/aws v4.0.0

mrocheleau commented 2 years ago

FWIW on Terraform v1.1.6 I encountered this bug as I had previously NOT set the "skip_final_snapshot" to anything, so the state file had it as "False". Updating my code to set this to "true" failed as it's looking at the state file first.

So I updated my state file manually (terraform state pull > somefile.tfstate) then edited that switch to "true" from false, incremented the "serial" by one and terraform state push somefile.tfstate to get it back up.

Now I can apply properly without that final snapshot trying to happen and failing.

promenadeviki commented 1 year ago

FWIW on Terraform v1.1.6 I encountered this bug as I had previously NOT set the "skip_final_snapshot" to anything, so the state file had it as "False". Updating my code to set this to "true" failed as it's looking at the state file first.

So I updated my state file manually (terraform state pull > somefile.tfstate) then edited that switch to "true" from false, incremented the "serial" by one and terraform state push somefile.tfstate to get it back up.

Now I can apply properly without that final snapshot trying to happen and failing.

For me, I was able to add skip_final_snapshot, re-apply, and then I was able to destroy without issues.

The issue is indeed the state is holding the default value of false, because there was not one specified.

CErikLjung commented 1 year ago

Excuse my ignorance, but why exactly do we need skip_final_snapshot?

Shouldn't it be possible to deduce the expected value of skip_final_snapshot from final_snapshot_identifier?

Something along the lines of: skip_final_snapshot = is_empty(final_snapshot_identifier)

johnsonaj commented 1 year ago

Hi all, just providing a quick update on this issue. To solve this on the resource level we decided to utilize the Terraform Plugin Framework, which has better support for warnings during the destroy step. During the upgrade, we discovered roadblocks which could potentially cause unexpected diffs, ultimately disrupting how the currently deployed resources work.

This upgrade has been paused, for now, until we have a better understanding of the impact that these changes would have on existing resources.

bo-yoon commented 1 year ago

Has this issue been resolved? It still happens to me.

terraform -v Terraform v1.5.4

sjackson0109 commented 7 months ago

This is still an issue.

│ Error: RDS Cluster final_snapshot_identifier is required when skip_final_snapshot is false

Code:

# RDS Cluster
resource "aws_rds_cluster" "rds" {
  for_each                         = try(var.rds_cluster, {})
  allow_major_version_upgrade      = each.value.allow_major_version_upgrade
  apply_immediately                = each.value.apply_immediately
  backup_retention_period          = each.value.backup_retention_period
  cluster_identifier               = each.value.name
  copy_tags_to_snapshot            = each.value.copy_tags_to_snapshot
  database_name                    = each.value.database_name
  db_cluster_parameter_group_name  = aws_rds_cluster_parameter_group.rds[each.value.cluster_parameter_group_key].name
  db_instance_parameter_group_name = aws_db_parameter_group.rds[each.value.db_parameter_group_key].name
  db_subnet_group_name             = aws_db_subnet_group.rds[each.value.db_subnet_group_key].name
  deletion_protection              = try(each.value.deletion_protection, false)
  engine                           = try(each.value.engine, "aurora-mysql")
  engine_mode                      = try(each.value.engine_mode,"serverless")
  engine_version                   = try(each.value.engine_version, "aurora-mysql8.0")
  storage_encrypted                = try(each.value.storage_encrypted, false)
  preferred_backup_window          = try(each.value.preferred_backup_window,"01:00-03:00")
  preferred_maintenance_window     = try(each.value.preferred_maintenance_window,"Sat:01:00-Sat:03:00")
  master_username                  = random_password.generated[each.value.password_key].id
  master_password                  = random_password.generated[each.value.password_key].result
  skip_final_snapshot              = try(each.value.skip_final_snapshot, true)  # DEFAULTS TO TRUE
  vpc_security_group_ids           = [aws_security_group.rds.id]

  }
}

Variable:

rds_cluster = {
      0 = {
        name = "mycluster"
        database_name = "product_name"
        backup_retention_period = 31
        apply_immediately = true
        allow_major_version_upgrade = true
        copy_tags_to_snapshot = true
        deletion_protection = false
        engine = "aurora-mysql" 
        engine_mode = "serverless" #serverless or provisioned
        engine_version = "8.0"
        instance_class = "db.serverless"
        cluster_parameter_group_key = 0
        db_parameter_group_key = 0
        db_subnet_group_key = 0
        username_key = 0
        password_key = 0
        storage_encrypted = true
        skip_final_snapshot = true   # CLEARLY SET TO TRUE
        monitoring_interval = 60

        preferred_backup_window = "23:00-00:00"
        preferred_maintenance_window = "Thu:01:00-Thu:03:00"
      }
    }

Versions:

Terraform v1.6.6
on windows_amd64
+ provider registry.terraform.io/hashicorp/aws v4.67.0

sjackson0109 commented 7 months ago

I even get the same error with these attribute combinations set:

  skip_final_snapshot                 = null
  final_snapshot_identifier           = "DELETE-ME"

or

  skip_final_snapshot                 = false
  final_snapshot_identifier           = "DELETE-ME"

or

  skip_final_snapshot                 = true
  final_snapshot_identifier           = "DELETE-ME"

Is there another attribute that forces skip_final_snapshot to be false?

drewlearns commented 6 months ago

This is still not resolved with aws provider 5.31.0 and tf version 1.6.5

drewlearns commented 6 months ago

Was able to delete my cluster and redeploy with skip_final_snapshot = true and that appears to have stuck. Weird and frustrating for sure.

susmitha48 commented 4 months ago

We got the same issue

JuanMongeLL commented 1 month ago

Still happening in 2024. 7 years now.

hamza-louis commented 1 week ago

Still happening in Oct 2024. 7 years now. Why ?!

hashicorp / terraform-provider-aws