terraform-aws-modules / terraform-aws-rds-aurora

Terraform module to create AWS RDS Aurora resources 🇺🇦
https://registry.terraform.io/modules/terraform-aws-modules/rds-aurora/aws
Apache License 2.0

Terraform wants to recreate cluster on every apply #8

Closed. asaghri closed this issue 4 years ago.

asaghri commented 5 years ago

Hello,

Thanks for the great module. However, with the availability_zones variable set, Terraform wants to recreate the cluster on every apply, as described in this issue: https://github.com/hashicorp/terraform/issues/16724.

I guess a workaround would be to drop the availability_zones variable from the cluster.

Tx!

max-rocket-internet commented 5 years ago

Hey @asaghri, interesting. Could you paste your code?

I am using this without any problem:

availability_zones              = ["${data.aws_availability_zones.available.names}"]
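For reference, an expression like that assumes an aws_availability_zones data source is declared somewhere in the configuration. A minimal sketch in the Terraform 0.11 syntax used throughout this thread (the name `available` is an assumption matching the examples above):

```hcl
# Sketch only: the data source backing data.aws_availability_zones.available.
# It returns the AZ names usable by this account in the current region.
data "aws_availability_zones" "available" {}
```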
asaghri commented 5 years ago

Hey @max-rocket-internet,

That was super fast! Thanks for the advice, I'll try that right away.

Before, I had this: availability_zones = ["eu-west-1a", "eu-west-1b", "eu-west-1c"]

max-rocket-internet commented 5 years ago

availability_zones = ["eu-west-1a", "eu-west-1b", "eu-west-1c"]

That's functionally the same as my example. So no worries there.

If you could paste your code and the output from Terraform when it wants to destroy and recreate the cluster, that would be good 🙂

asaghri commented 5 years ago

OK, so here is the code:

module "aurora" {
  source                          = "github.com/terraform-aws-modules/terraform-aws-rds-aurora"
  name                            = "${local.name_prefix}-db"
  engine                          = "aurora-postgresql"
  engine_version                  = "10.4"
  subnets                         = ["${module.vpc.database_subnets}"]
  availability_zones              = ["${data.aws_availability_zones.available.names}"]
  vpc_id                          = "${module.vpc.vpc_id}"
  replica_count                   = "${var.aurora_replica_count}"
  username                        = "${var.aurora_master_username}"
  password                        = "${var.aurora_master_password}"
  instance_type                   = "${var.aurora_instance_type}"
  snapshot_identifier             = "${var.snapshot_identifier}"
  apply_immediately               = true
  skip_final_snapshot             = true
  db_parameter_group_name         = "${aws_db_parameter_group.aurora_db_postgres10_parameter_group.id}"
  db_cluster_parameter_group_name = "${aws_rds_cluster_parameter_group.aurora_cluster_postgres10_parameter_group.id}"
}

And the output:

-/+ module.cleo.module.aurora.aws_rds_cluster.this (new resource required)
      id:                               "staging-cleo-bfmtv-db" => <computed> (forces new resource)
      apply_immediately:                "true" => "true"
      arn:                              "arn:aws:rds:eu-west-1:834179885026:cluster:staging-cleo-bfmtv-db" => <computed>
      availability_zones.#:             "3" => "3"
      availability_zones.1924028850:    "eu-west-1b" => "eu-west-1b"
      availability_zones.3953592328:    "eu-west-1a" => "eu-west-1a"
      availability_zones.94988580:      "eu-west-1c" => "eu-west-1c"
      backup_retention_period:          "7" => "7"
      cluster_identifier:               "staging-cleo-bfmtv-db" => "staging-cleo-bfmtv-db"
      cluster_identifier_prefix:        "" => <computed>
      cluster_members.#:                "1" => <computed>
      cluster_resource_id:              "cluster-SBQNHFRE2DT7SUUDDBPAEMFZQY" => <computed>
      database_name:                    "" => <computed>
      db_cluster_parameter_group_name:  "staging-cleo-bfmtv-aurora-postgres10-cluster-parameter-group" => "staging-cleo-bfmtv-aurora-postgres10-cluster-parameter-group"
      db_subnet_group_name:             "staging-cleo-bfmtv-db" => "staging-cleo-bfmtv-db"
      endpoint:                         "staging-cleo-bfmtv-db.cluster-cw3emfe46duo.eu-west-1.rds.amazonaws.com" => <computed>
      engine:                           "aurora-postgresql" => "aurora-postgresql"
      engine_mode:                      "provisioned" => "provisioned"
      engine_version:                   "10.4" => "10.4"
      final_snapshot_identifier:        "final-staging-cleo-bfmtv-db-557bdeaf" => "final-staging-cleo-bfmtv-db-557bdeaf"
      hosted_zone_id:                   "Z29XKXDKYMONMX" => <computed>
      kms_key_id:                       "" => <computed>
      master_password:                  <sensitive> => <sensitive> (attribute changed)
      master_username:                  "cleorecette" => "cleorecette"
      port:                             "5432" => "5432"
      preferred_backup_window:          "02:00-03:00" => "02:00-03:00"
      preferred_maintenance_window:     "sun:05:00-sun:06:00" => "sun:05:00-sun:06:00"
      reader_endpoint:                  "staging-cleo-bfmtv-db.cluster-ro-cw3emfe46duo.eu-west-1.rds.amazonaws.com" => <computed>
      skip_final_snapshot:              "true" => "true"
      snapshot_identifier:              "cleo-aurora" => "cleo-aurora"
      storage_encrypted:                "false" => "true" (forces new resource)
      vpc_security_group_ids.#:         "1" => "1"
      vpc_security_group_ids.560316842: "sg-002c06942ec79f3f6" => "sg-002c06942ec79f3f6"

-/+ module.cleo.module.aurora.aws_rds_cluster_instance.this (new resource required)
      id:                               "staging-cleo-bfmtv-db-1" => <computed> (forces new resource)
      apply_immediately:                "true" => "true"
      arn:                              "arn:aws:rds:eu-west-1:834179885026:db:staging-cleo-bfmtv-db-1" => <computed>
      auto_minor_version_upgrade:       "true" => "true"
      availability_zone:                "eu-west-1c" => <computed>
      cluster_identifier:               "staging-cleo-bfmtv-db" => "${aws_rds_cluster.this.id}" (forces new resource)
      db_parameter_group_name:          "staging-cleo-bfmtv-aurora-db-postgres10-parameter-group" => "staging-cleo-bfmtv-aurora-db-postgres10-parameter-group"
      db_subnet_group_name:             "staging-cleo-bfmtv-db" => "staging-cleo-bfmtv-db"
      dbi_resource_id:                  "db-RVT2U4M3CN7DQTQIMDOK6QQVVU" => <computed>
      endpoint:                         "staging-cleo-bfmtv-db-1.cw3emfe46duo.eu-west-1.rds.amazonaws.com" => <computed>
      engine:                           "aurora-postgresql" => "aurora-postgresql"
      engine_version:                   "10.4" => "10.4"
      identifier:                       "staging-cleo-bfmtv-db-1" => "staging-cleo-bfmtv-db-1"
      identifier_prefix:                "" => <computed>
      instance_class:                   "db.r4.large" => "db.r4.large"
      kms_key_id:                       "" => <computed>
      monitoring_interval:              "0" => "0"
      monitoring_role_arn:              "" => <computed>
      performance_insights_enabled:     "false" => "false"
      performance_insights_kms_key_id:  "" => <computed>
      port:                             "5432" => <computed>
      preferred_backup_window:          "02:00-03:00" => <computed>
      preferred_maintenance_window:     "sun:05:00-sun:06:00" => "sun:05:00-sun:06:00"
      promotion_tier:                   "1" => "1"
      publicly_accessible:              "false" => "false"
      storage_encrypted:                "false" => <computed>
      writer:                           "true" => <computed>

Plan: 2 to add, 0 to change, 2 to destroy.
asaghri commented 5 years ago

OK, sorry, the problem came from the storage encryption. The snapshot I used wasn't encrypted, so on every apply Terraform tried to change the storage encryption setting, but it seems it's not possible to change storage encryption on an existing cluster.

The plan shows the cluster being recreated with storage encryption set to true, but it doesn't actually change it.

max-rocket-internet commented 5 years ago

Ah I see! Well mystery solved then.

gannino commented 5 years ago

Hi everyone, I'm new to Terraform (only 2 months) and I've been hit by this odd behaviour too. To be honest, my belief is that the Aurora cluster tries to use three AZs when you specify availability_zones: even if you set your AZs to be only a and b, when Terraform refreshes the state it finds a third zone, which causes the destroy-and-recreate behaviour. As noted earlier in this issue, the problem disappears once you comment out availability_zones and specify only the db_subnet_group_name for your cluster.

Hope this helps to find the root cause; an article explaining this behaviour might save someone else a day of research. Thanks everyone! Giovanni

nergdron commented 5 years ago

I'm seeing this right now with the latest code, using only the subnets option; no AZs or anything else specified. For me, it can't seem to correctly read the AZs and id state for the existing cluster, so it always thinks those have changed:

      id:                                "my-db-id" => <computed> (forces new resource)
      availability_zones.#:              "3" => "0" (forces new resource)
      availability_zones.3551460226:     "us-east-1e" => "" (forces new resource)
      availability_zones.3569565595:     "us-east-1a" => "" (forces new resource)
      availability_zones.986537655:      "us-east-1c" => "" (forces new resource)
asaghri commented 5 years ago

Could you share the code? I use data.aws_availability_zones.available.names to indicate the AZs and it works fine. Do you use a snapshot ID?

nergdron commented 5 years ago

Yes, I'm not specifying the AZs; all I'm supplying is the subnets, and it's computing the AZs from that. I'm not using a snapshot at all, this is a fresh install.

nergdron commented 5 years ago
module "db" {
  source = "../../modules/aws-rds-aurora"

  name                    = "something-${var.aws_env}"
  identifier_prefix       = "something-${var.aws_env}"
  vpc_id                  = "${data.aws_vpc.info.id}"
  subnets                 = "${var.aws_private_subnet_ids}"
  allowed_security_groups = ["${aws_security_group.mysg.id}"]

  engine                          = "aurora-postgresql"
  engine_version                  = "10.4"
  storage_encrypted               = "true"
  preferred_maintenance_window    = "Sun:03:00-Sun:03:30"
  preferred_backup_window         = "04:00-04:30"
  replica_count                   = 1
  instance_type                   = "db.${var.aws_instance_type}"
  skip_final_snapshot             = true                          # not useful on initial create
  db_parameter_group_name         = "default.aurora-postgresql10"
  db_cluster_parameter_group_name = "default.aurora-postgresql10"

  username = "something"
  password = "somethingelse" # default, must be changed after setup

  tags = {
    Name        = "something-${var.aws_env}"
    environment = "${var.aws_env}"
    terraform   = "true"
  }
}
asaghri commented 5 years ago

OK, then you should try adding this to use all the available zones and see if it works:

availability_zones = "${data.aws_availability_zones.available.names}"

nergdron commented 5 years ago

Looks like if I do that, it'll want to kill and recreate my instance too, since we're using specific AZs intentionally, given that us-east-1 is a bit of a mess and not all instance types and configurations are available in all AZs.

-/+ module.db.aws_rds_cluster.this (new resource required)
      id:                                "chatapi-metrics-dev" => <computed> (forces new resource)
      apply_immediately:                 "false" => "false"
      arn:                               "arn:aws:rds:us-east-1:759579518471:cluster:chatapi-metrics-dev" => <computed>
      availability_zones.#:              "3" => "6" (forces new resource)
      availability_zones.1252502072:     "" => "us-east-1f" (forces new resource)
      availability_zones.1305112097:     "" => "us-east-1b" (forces new resource)
      availability_zones.2762590996:     "" => "us-east-1d" (forces new resource)
      availability_zones.3551460226:     "us-east-1e" => "us-east-1e"
      availability_zones.3569565595:     "us-east-1a" => "us-east-1a"
      availability_zones.986537655:      "us-east-1c" => "us-east-1c"

Strangely, it sees the existing AZs when I supply AZs as an arg, but not when I let it compute them, so I feel like this may be something wrong in the upstream Terraform module. Note that even if I fix the AZ problem, it still wants to recreate the cluster every time because of the id as well, so I'm not sure what the solution here is.

max-rocket-internet commented 5 years ago

I think the problem is already solved in this PR: https://github.com/terraform-aws-modules/terraform-aws-rds-aurora/pull/10.

i.e. just pass subnets to the module and don't use the availability_zones argument at all. It's not really clear from the documentation how these two arguments interact when they don't match.
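A minimal sketch of that recommendation, reusing the module inputs shown earlier in this thread (Terraform 0.11 syntax; the values are placeholders, and other required inputs such as username, password, and instance_type are omitted for brevity):

```hcl
module "aurora" {
  source         = "github.com/terraform-aws-modules/terraform-aws-rds-aurora"
  name           = "example-db"                       # placeholder
  engine         = "aurora-postgresql"
  engine_version = "10.4"
  vpc_id         = "${module.vpc.vpc_id}"
  subnets        = ["${module.vpc.database_subnets}"] # AZs are derived from these
  # availability_zones deliberately omitted
}
```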

given that us-east-1 is a bit of a mess and not all instance types and configurations are available in all AZs.

Are you sure?? I've never heard of different AZs being inconsistent in this way.

nergdron commented 5 years ago

Oh yeah, PR #10 does seem to be what I'm encountering. Thanks!

As for the AZ issues, we've definitely run into this in the past, where certain instance types were only available in certain AZs for extended periods of time, but only in us-east-1. us-west-2, for instance, never seems to have this issue. We've always put it down to it being the oldest and crustiest region, and AWS not exactly keeping things consistent much of anywhere in their codebases.

MarkAsbrey commented 5 years ago

Ok sorry the problem came from the storage encryption. The snapshot I used wasn't encrypted, so every time I applied it tried to change the storage type but it seems like it's not possible to change the storage encryption.

It recreates the cluster with the storage encryption set to true, but actually it doesn't change it.

Having what looks to be the same issue, did you manage to get round it without needing to destroy the cluster?

wayneworkman commented 5 years ago

I'm having the same problems with DocumentDB. If I try to pass a subnet group to a cluster, it wants to rebuild it every time. If I pass the availability zones, it wants to rebuild it every time. If I comment out those two lines, it doesn't rebuild it every time.

I have to be able to pass a list of subnets at a minimum. We only use two availability zones of the three available in the region.
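One hedged workaround sketch, not confirmed in this thread: keep the subnet group and tell Terraform to ignore the computed AZ list on the cluster via a lifecycle block. The resource type and attributes follow the standard aws_docdb_cluster resource; all identifiers and values here are placeholders:

```hcl
resource "aws_docdb_cluster" "example" {
  cluster_identifier   = "example-docdb"    # placeholder
  master_username      = "exampleuser"      # placeholder
  master_password      = "examplepassword"  # placeholder
  db_subnet_group_name = "${aws_docdb_subnet_group.example.name}"

  lifecycle {
    # AWS reports the full AZ set on refresh; ignoring the diff
    # stops the destroy-and-recreate on every apply.
    ignore_changes = ["availability_zones"]
  }
}
```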

sanoop19 commented 4 years ago

My requirement was only two AZs with Aurora. Don't use a DB subnet group in the RDS instance; not sure of the logic, but this resolved my issue. You also need a lifecycle rule in place:

  apply_immediately = "true"

  lifecycle {
    ignore_changes = [
      "availability_zones",
    ]
  }
}

Before the issue:

resource "aws_rds_cluster_instance" "test" {
  count                   = var.count1
  instance_class          = var.instance_type
  identifier              = "${var.rds_identifier}-${count.index + 01}"
  cluster_identifier      = aws_rds_cluster.testmigcluster.id
  db_subnet_group_name    = aws_db_subnet_group.subnet_group.name
  ca_cert_identifier      = "rds-ca-2019"
  promotion_tier          = "1"
  db_parameter_group_name = aws_db_parameter_group.rds-parameter.name
  engine                  = var.engine
  engine_version          = var.engine_version
}

After solving:

resource "aws_rds_cluster_instance" "test" {
  count                   = var.count1
  instance_class          = var.instance_type
  identifier              = "${var.rds_identifier}-${count.index + 01}"
  cluster_identifier      = aws_rds_cluster.testmigcluster.id
  db_subnet_group_name    = aws_db_subnet_group.subnet_group.name
  ca_cert_identifier      = "rds-ca-2019"
  promotion_tier          = "1"
  engine                  = var.engine
  engine_version          = var.engine_version
}

Also, in availability_zones I specified eu-west-1a and eu-west-1b, and in the subnet group for the RDS cluster I used the eu-west-1a and eu-west-1b subnets.

github-actions[bot] commented 1 year ago

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.