hashicorp / terraform-provider-aws

The AWS Provider enables Terraform to manage AWS resources.
https://registry.terraform.io/providers/hashicorp/aws
Mozilla Public License 2.0

[Bug]: RDS replica instance is destroyed and recreated on every apply even when there are no changes to RDS #31325

Open yanhuiyi opened 1 year ago

yanhuiyi commented 1 year ago

Terraform Core Version

>= 1.2.0

AWS Provider Version

~> 4.16

Affected Resource(s)

aws_db_instance

Expected Behavior

RDS replica instance shouldn't recreate every time.

Actual Behavior

Summary of the terraform apply output:

# aws_db_instance.main_replica must be replaced
-/+ resource "aws_db_instance" "main_replica" {
      ~ address                               = "xxx-replica-dev.clfntrkokco7.ap-northeast-2.rds.amazonaws.com" -> (known after apply)
      ~ allocated_storage                     = 80 -> (known after apply)
      ~ arn                                   = "arn:aws:rds:ap-northeast-2:092492597114:db:xxx-replica-dev" -> (known after apply)
      ~ availability_zone                     = "ap-northeast-2a" -> (known after apply)
      ~ backup_retention_period               = 0 -> (known after apply)
      ~ backup_window                         = "17:27-17:57" -> (known after apply)
      ~ ca_cert_identifier                    = "rds-ca-2019" -> (known after apply)
      + character_set_name                    = (known after apply)
      - customer_owned_ip_enabled             = false -> null
      ~ db_name                               = "xxx" -> (known after apply)
      ~ db_subnet_group_name                  = "terraform-20230510021049001800000001" -> (known after apply)
      - deletion_protection                   = false -> null
      - enabled_cloudwatch_logs_exports       = [] -> null
      ~ endpoint                              = "xxx-replica-dev.clfntrkokco7.ap-northeast-2.rds.amazonaws.com:5432" -> (known after apply)
      ~ engine                                = "postgres" -> (known after apply)
      ~ engine_version                        = "13.10" -> (known after apply)
      ~ engine_version_actual                 = "13.10" -> (known after apply)
      ~ hosted_zone_id                        = "ZLA2NUCOLGUUR" -> (known after apply)
      - iam_database_authentication_enabled   = false -> null
      ~ id                                    = "xxx-replica-dev" -> (known after apply)
      + identifier_prefix                     = (known after apply)
      ~ iops                                  = 3000 -> (known after apply)
      ~ kms_key_id                            = "arn:aws:kms:ap-northeast-2:092492597114:key/62dfb2b1-38b2-4ab8-93d3-c3caaf12daaf" -> (known after apply)
      + latest_restorable_time                = (known after apply)
      ~ license_model                         = "postgresql-license" -> (known after apply)
      ~ listener_endpoint                     = [] -> (known after apply)
      ~ maintenance_window                    = "thu:14:01-thu:14:31" -> (known after apply)
      ~ master_user_secret                    = [] -> (known after apply)
      + master_user_secret_kms_key_id         = (known after apply)
      - max_allocated_storage                 = 0 -> null
      + monitoring_role_arn                   = (known after apply)
      ~ multi_az                              = false -> (known after apply)
      ~ name                                  = "xxx" -> (known after apply)
      + nchar_character_set_name              = (known after apply)
      ~ network_type                          = "IPV4" -> (known after apply)
      ~ option_group_name                     = "default:postgres-13" -> (known after apply)
      + performance_insights_kms_key_id       = (known after apply)
      ~ performance_insights_retention_period = 0 -> (known after apply)
      ~ port                                  = 5432 -> (known after apply)
      + replica_mode                          = (known after apply)
      ~ replicas                              = [] -> (known after apply)
      ~ resource_id                           = "db-4RXU2QM4CEORASUHIPMX32IEP4" -> (known after apply)
      - security_group_names                  = [] -> null
      + snapshot_identifier                   = (known after apply)
      ~ status                                = "available" -> (known after apply)
      - storage_encrypted                     = true -> null # forces replacement
      ~ storage_throughput                    = 125 -> (known after apply)
      ~ storage_type                          = "gp3" -> (known after apply)
      - tags                                  = {} -> null
      + timezone                              = (known after apply)
      ~ username                              = "xxx" -> (known after apply)
        # (14 unchanged attributes hidden)
    }

Relevant Error/Panic Output Snippet

Partial output during the apply:

aws_db_instance.main_replica: Still destroying... [id=xxx-replica-dev, 4m0s elapsed]
aws_db_instance.main_replica: Still destroying... [id=xxx-replica-dev, 4m10s elapsed]
aws_db_instance.main_replica: Still destroying... [id=xxx-replica-dev, 4m20s elapsed]
aws_db_instance.main_replica: Still destroying... [id=xxx-replica-dev, 4m30s elapsed]
aws_db_instance.main_replica: Destruction complete after 4m30s
aws_db_instance.main_replica: Creating...
aws_db_instance.main_replica: Still creating... [10s elapsed]
aws_db_instance.main_replica: Still creating... [20s elapsed]
aws_db_instance.main_replica: Still creating... [30s elapsed]
aws_db_instance.main_replica: Still creating... [40s elapsed]
aws_db_instance.main_replica: Still creating... [50s elapsed]
...
aws_db_instance.main_replica: Still creating... [12m11s elapsed]
aws_db_instance.main_replica: Creation complete after 12m14s [id=xxx-replica-dev]

Terraform Configuration Files

resource "aws_db_instance" "main" {
  db_name                 = var.db_name
  identifier              = join("-", [var.db_name, lower(var.environment)])
  allocated_storage       = var.db_storage_device.size     # gigabytes
  backup_retention_period = var.db_backup_retention_period # in days
  apply_immediately       = true
  db_subnet_group_name    = aws_db_subnet_group.main.name
  availability_zone       = aws_subnet.private1.availability_zone
  engine                  = "postgres"
  engine_version          = var.db_engine_version
  instance_class          = var.db_instance_class
  multi_az                = false
  parameter_group_name    = aws_db_parameter_group.main.name
  password                = local.db_creds.password
  port                    = 5432
  publicly_accessible     = true
  storage_encrypted       = true # you should always do this
  storage_type            = var.db_storage_device.type
  username                = local.db_creds.username
  vpc_security_group_ids  = [aws_security_group.allow-postgresql.id]
  skip_final_snapshot     = true
}

resource "aws_db_instance" "main_replica" {
  identifier             = join("-", [var.db_name, "replica", lower(var.environment)])
  replicate_source_db    = aws_db_instance.main.identifier
  instance_class         = var.db_instance_class
  apply_immediately      = true
  skip_final_snapshot    = true
  vpc_security_group_ids = [aws_security_group.allow-postgresql.id]
  parameter_group_name   = aws_db_parameter_group.main.name
}

Steps to Reproduce

  1. Change the configuration of any resource other than the RDS resources
  2. Run terraform apply

Debug Output

No response

Panic Output

No response

Important Factoids

No response

References

No response

Would you like to implement a fix?

None

timothyclarke commented 1 year ago

Does the replica source have storage encryption enabled? When I've set up DB replication, many of the replica's properties were inherited from the source database. The replica was created fine, but on the next run Terraform tried to 'correct' the config drift. In my cases, 'correcting the config drift' meant updating the Terraform config to match the real instance rather than applying the .tf file as written.

In your case, storage_encrypted (true -> null, marked "forces replacement" in the plan) looks to be what is causing the replacement. You can add a lifecycle rule to ignore that attribute once the DB replica is created.
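The 'update the Terraform config' route might look like the sketch below. It assumes the replica inherits encryption from aws_db_instance.main, and that declaring the same value on the replica makes the plan match the real instance. Neither is confirmed in this thread, so check the result with terraform plan before applying.

```hcl
resource "aws_db_instance" "main_replica" {
  identifier             = join("-", [var.db_name, "replica", lower(var.environment)])
  replicate_source_db    = aws_db_instance.main.identifier
  instance_class         = var.db_instance_class
  apply_immediately      = true
  skip_final_snapshot    = true
  vpc_security_group_ids = [aws_security_group.allow-postgresql.id]
  parameter_group_name   = aws_db_parameter_group.main.name

  # Assumption: mirror the source's encryption setting so the configuration
  # no longer reads as "true -> null" against the state.
  storage_encrypted = aws_db_instance.main.storage_encrypted
}
```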

yanhuiyi commented 1 year ago

Thank you @timothyclarke! Adding the property to ignore_changes has been working fine so far: lifecycle { ignore_changes = [storage_encrypted] }
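For anyone landing here, this is roughly where the lifecycle block sits in the replica resource from the original report. It is a sketch of the workaround, not a fix in the provider, and it means Terraform will also ignore any intentional change to storage_encrypted on this resource.

```hcl
resource "aws_db_instance" "main_replica" {
  identifier             = join("-", [var.db_name, "replica", lower(var.environment)])
  replicate_source_db    = aws_db_instance.main.identifier
  instance_class         = var.db_instance_class
  apply_immediately      = true
  skip_final_snapshot    = true
  vpc_security_group_ids = [aws_security_group.allow-postgresql.id]
  parameter_group_name   = aws_db_parameter_group.main.name

  lifecycle {
    # Do not plan a replacement when the stored value (true) differs
    # from the unset value in this configuration.
    ignore_changes = [storage_encrypted]
  }
}
```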

garbelini commented 1 year ago

I just experienced something similar, but in my case it seems to have been caused by an AWS API error triggered by overlapping backup and maintenance window settings. For me, customer_owned_ip_enabled, tags and enabled_cloudwatch_logs_exports were the attributes triggering a resource replacement.

Why AWS doesn't validate this before proceeding with a very time-consuming and expensive operation is beyond me.

Edit: Validating for overlap on those windows in the provider would be nice!
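Until something like that exists, one option is to pin both windows explicitly so they cannot collide. The fragment below is only a sketch: the window values are illustrative (taken from the plan output earlier in this issue) and the resource name is reused from the original report, not from my setup.

```hcl
resource "aws_db_instance" "main" {
  # ... other arguments as in the original configuration ...

  # Daily backup window, UTC, format hh24:mi-hh24:mi.
  backup_window = "17:27-17:57"

  # Weekly maintenance window, UTC, format ddd:hh24:mi-ddd:hh24:mi.
  # Chosen so its time of day does not overlap the backup window.
  maintenance_window = "thu:14:01-thu:14:31"
}
```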