Cannot destroy an Aurora RDS cluster when it was built with a `replication_source_identifier` value

silviabotros commented 5 years ago

Terraform Version

Terraform v0.11.10
+ provider.aws v1.50.0

Affected Resource(s)

* provider.aws v1.50.0

Terraform Configuration Files

module "rds-cluster-vpc-1" {
  source = "../../modules/rds_cluster"
  name   = "${var.db_name}-rds-cluster-${var.user}"
  user   = "${var.user}"

  availability_zones = [
    "${data.aws_availability_zones.vpc-1-azs.names[0]}",
    "${data.aws_availability_zones.vpc-1-azs.names[1]}",
    "${data.aws_availability_zones.vpc-1-azs.names[2]}",
  ]

  rds_final_snapshot_id   = "${var.db_name}-final-snapshot-${var.user}"
  skip_final_rds_snapshot = true
  vpc_id                  = "${module.vpc-1.vpc_id}"
  aws_subnet_ids          = ["${module.vpc-1.database_subnets}"]
  rds_access_sg           = ["${module.vpc-1-jump.security_group_id}"]

  providers = {
    "aws" = "aws.us-east-1"
  }

  db_name            = "${var.db_name}"
  rds_admin_user     = "${var.rds_admin_user}"
  rds_admin_password = "${var.rds_admin_password}"
  port               = "${var.port}"
  tags               = "${local.tags}"
  sox_compliant      = "${var.sox_compliant}"
}

module "rds-cluster-vpc-2" {
  source = "../../modules/rds_cluster"
  name   = "${var.db_name}-rds-cluster-${var.user}"
  user   = "${var.user}"

  availability_zones = [
    "${data.aws_availability_zones.vpc-2-azs.names[0]}",
    "${data.aws_availability_zones.vpc-2-azs.names[1]}",
    "${data.aws_availability_zones.vpc-2-azs.names[2]}",
  ]

  replication_source_identifier = "${module.rds-cluster-vpc-1.rds_cluster_arn}"
  rds_final_snapshot_id         = "${var.db_name}-final-snapshot-${var.user}"
  skip_final_rds_snapshot       = true
  vpc_id                        = "${module.vpc-2.vpc_id}"
  aws_subnet_ids                = ["${module.vpc-2.database_subnets}"]
  rds_access_sg                 = ["${module.vpc-2-jump.security_group_id}"]

  providers = {
    "aws" = "aws.us-west-2"
  }

  db_name            = "${var.db_name}"
  rds_admin_user     = "${var.rds_admin_user}"
  rds_admin_password = "${var.rds_admin_password}"
  port               = "${var.port}"
  tags               = "${local.tags}"
  sox_compliant      = "${var.sox_compliant}"
}

Expected Behavior

Running terraform destroy should destroy everything, including both RDS clusters and their VPCs.

Actual Behavior

Destroy works on the primary cluster but fails on the secondary cluster

Do you really want to destroy all resources?
  Terraform will destroy all your managed infrastructure, as shown above.
  There is no undo. Only 'yes' will be accepted to confirm.

  Enter a value: yes

module.rds-cluster-vpc-2.aws_rds_cluster_instance.rds_cluster_instance[2]: Destroying... (ID: rdstest-2-dev)
Releasing state lock. This may take a few moments...

Error: Error applying plan:

3 error(s) occurred:

* module.vpc-1-jump.output.public_ip: element: element() may not be used with an empty list in:

${var.create == false ? "" : element(aws_instance.jumpbox.*.public_ip,0)}
* module.vpc-2-jump.output.public_ip: element: element() may not be used with an empty list in:

${var.create == false ? "" : element(aws_instance.jumpbox.*.public_ip,0)}
* module.rds-cluster-vpc-2.aws_rds_cluster_instance.rds_cluster_instance[2] (destroy): 1 error(s) occurred:

* aws_rds_cluster_instance.rds_cluster_instance.2: InvalidDBClusterStateFault: Cannot delete the last instance of the read replica DB cluster. Promote the DB cluster to a standalone DB cluster in order to delete it.
        status code: 400, request id: 456e5bf2-656a-4e22-84d7-49565db5976c

Steps to Reproduce

terraform apply
terraform destroy

References

See https://github.com/terraform-providers/terraform-provider-aws/issues/6672 for a related issue about creating cross-region Aurora replica clusters with Terraform.

silviabotros commented 5 years ago

The output of terraform destroy clearly shows that it destroys the primary cluster and its VPC first, whereas it should start with the secondary VPC and cluster to avoid this orphaned-instance situation. Output is in this gist.

silviabotros commented 5 years ago

In fact, even if I force a destroy order using phased targets, the secondary cluster still doesn't go away cleanly and fails with the same error.
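For readers unfamiliar with phased targets, the ordering described above could look roughly like the sketch below. The module address comes from the configuration in the original report; the `run` helper and the `DRY_RUN` switch are hypothetical additions so the commands can be previewed without touching any infrastructure. As noted in the comment above, this ordering alone did not get past the InvalidDBClusterStateFault error.

```shell
#!/bin/sh
# Sketch only: destroy the replica module first, then everything else.
set -eu

# Hypothetical helper: print the command when DRY_RUN=1 (the default here),
# otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

phased_destroy() {
  # Phase 1: the secondary (replica) cluster module in us-west-2.
  run terraform destroy -target=module.rds-cluster-vpc-2
  # Phase 2: whatever is left, including the primary cluster and both VPCs.
  run terraform destroy
}
```

Calling phased_destroy with DRY_RUN left at its default only prints the two terraform commands in order; setting DRY_RUN=0 would actually run them.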

SaravanRaman commented 5 years ago

This is expected behavior, if I am not wrong. It is a deliberate safeguard that AWS put in place in the upstream API to prevent accidental deletion, and it would not seem appropriate for Terraform to override it. https://aws.amazon.com/premiumsupport/knowledge-center/rds-error-delete-aurora-cluster/

silviabotros commented 5 years ago

If I am asking Terraform to destroy, it should destroy. There are already guardrails in Terraform that list what will be destroyed and ask for approval.

SaravanRaman commented 5 years ago

Ah. This is at the database level. If you promote the read replica to a standalone cluster, then your destroy should go through.

pbeaumontQc commented 5 years ago

But how do we promote the cluster to standalone?

resource "aws_rds_cluster" "replica" {
  cluster_identifier              = "db1-replica"
  database_name                   = "..."
  master_username                 = "..."
  master_password                 = "..."
  final_snapshot_identifier       = "..."
  skip_final_snapshot             = "true"
  backup_retention_period         = "7"
  preferred_backup_window         = "..."
  preferred_maintenance_window    = "..."
  port                            = "3306"
  vpc_security_group_ids          = ["..."]
  storage_encrypted               = "false"
  kms_key_id                      = ""
  apply_immediately               = "true"
  db_subnet_group_name            = "..."
  db_cluster_parameter_group_name = "..."
  replication_source_identifier   = "arn:aws:rds:..."
  engine                          = "aurora-mysql"
  engine_version                  = "5.7.mysql_aurora.2.04.5"
  source_region                   = "<cross region...>"

  lifecycle {
    prevent_destroy = false
  }
}

resource "aws_rds_cluster_instance" "replica" {
  count                        = "1"
  identifier                   = "db1-replica-0"
  cluster_identifier           = "${aws_rds_cluster.replica.id}"
  instance_class               = "db.t3.small"
  db_subnet_group_name         = "..."
  preferred_maintenance_window = "..."
  apply_immediately            = "true"
  db_parameter_group_name      = "..."
  auto_minor_version_upgrade   = "true"
  monitoring_interval          = "0"
  monitoring_role_arn          = ""
  engine                       = "aurora-mysql"
  engine_version               = "5.7.mysql_aurora.2.04.5"

  lifecycle {
    prevent_destroy = false
  }
}

This is the definition of our cluster. If I blank the value (replication_source_identifier = ""), the apply passes, but it does nothing to the actual read replica, which stays as it is.

aws_rds_cluster.replica: Modifying... (ID: db1-replica)
  replication_source_identifier: "arn:aws:rds:..." => ""
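One way to confirm that the apply changed nothing on the AWS side is to ask RDS directly whether the cluster still reports a replication source. The sketch below is an illustration, not part of the original report: the function names are hypothetical, and it assumes the AWS CLI is configured for the replica's region. It relies on the ReplicationSourceIdentifier field returned by describe-db-clusters, which the CLI prints as "None" with --output text when the field is absent.

```shell
#!/bin/sh
# Sketch only: check whether an Aurora cluster is still a read replica.
set -eu

# Hypothetical helper: succeed if the given value names a replication source.
# An empty string or the literal "None" means the cluster is standalone.
is_replica() {
  case "$1" in
    ""|None) return 1 ;;
    *) return 0 ;;
  esac
}

# Hypothetical helper: fetch the cluster's current replication source
# from RDS. This performs a real (read-only) AWS API call.
replication_source() {
  aws rds describe-db-clusters \
    --db-cluster-identifier "$1" \
    --query 'DBClusters[0].ReplicationSourceIdentifier' \
    --output text
}
```

For the cluster defined above, passing the output of replication_source db1-replica to is_replica should keep succeeding after the apply if the replica was indeed left untouched.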

roobert commented 5 years ago

@SaravanRaman - as @pbeaumontQc mentioned - is it possible to promote the cluster to standalone using terraform?

For anyone else coming across this, promotion can be done using aws(1):

aws rds promote-read-replica-db-cluster --db-cluster-identifier <identifier>
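Putting the promotion and the destroy together, a workaround script might look like the sketch below. Several parts are assumptions rather than anything confirmed in this thread: the `run` helper, the `DRY_RUN` switch, and the idea of polling the cluster status until it reads "available" before destroying (promotion is asynchronous, and the status may not be a precise signal that it has finished).

```shell
#!/bin/sh
# Sketch only: promote the replica cluster out of band, then destroy.
set -eu

# Hypothetical helper: print the command when DRY_RUN=1 (the default here),
# otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Assumption: poll every 30 seconds until the cluster status is "available".
wait_until_available() {
  while :; do
    status="$(aws rds describe-db-clusters \
      --db-cluster-identifier "$1" \
      --query 'DBClusters[0].Status' --output text)"
    if [ "$status" = "available" ]; then
      break
    fi
    sleep 30
  done
}

promote_then_destroy() {
  cluster_id="$1"
  # Detach the replica from its source so RDS allows deleting its last instance.
  run aws rds promote-read-replica-db-cluster --db-cluster-identifier "$cluster_id"
  run wait_until_available "$cluster_id"
  run terraform destroy
}
```

With DRY_RUN left at its default, promote_then_destroy db1-replica only prints the three steps. The AWS CLI must point at the replica's region for the promote call to find the cluster.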

grealish commented 4 years ago

Pinging this issue so it stays alive. Unfortunately, I landed on this while building an HA and disaster recovery setup. Here's what we had to script for single-instance DB clusters:

aws rds promote-read-replica --db-instance-identifier mysql-xxxx-ro --profile <profile> --region <region>

(Instead of --profile, you can use an IAM instance profile.) It takes a few minutes, but everything stays up: you can still work on the DB, and within a few minutes you are able to write to it with the same credentials. CLI details: https://docs.aws.amazon.com/cli/latest/reference/rds/promote-read-replica.html

github-actions[bot] commented 2 years ago

Marking this issue as stale due to inactivity. This helps our maintainers find and focus on the active issues. If this issue receives no comments in the next 30 days it will automatically be closed. Maintainers can also remove the stale label.

If this issue was automatically closed and you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thank you!

techdragon commented 2 years ago

This is still an outstanding issue and should remain open. It's rather galling that this has sat for over 2 years without any updates from HashiCorp. The integrity of customer data is obviously of paramount importance, so any concern Terraform users have about incorrect behaviour involving AWS RDS services is a major red flag and needs addressing.
