hashicorp / terraform-provider-aws

The AWS Provider enables Terraform to manage AWS resources.
https://registry.terraform.io/providers/hashicorp/aws
Mozilla Public License 2.0

[Bug]: aws_elasticache_replication_group redis AZ list must match node number on create but not update #38493

Open jeremy-balmos opened 1 month ago

jeremy-balmos commented 1 month ago

Terraform Core Version

1.7.5

AWS Provider Version

5.44.0

Affected Resource(s)

aws_elasticache_replication_group

Expected Behavior

When the number of entries in preferred_cache_cluster_azs differs from num_cache_clusters, all nodes should still be created in the listed availability zones.

Actual Behavior

On create, an error is thrown when the number of AZs does not match the number of cache clusters: Error: creating ElastiCache Replication Group (staging-elasticache-replication-group): InvalidParameterValue: When specifying preferred availability zones, the number of cache clusters must be specified and must match the number of preferred availability zones.

However, if you first create the replication group with matching counts and then update the number of nodes, Terraform does not throw the error, and you can create additional nodes without providing additional AZs. The same is true when removing nodes.

Relevant Error/Panic Output Snippet

│ Error: creating ElastiCache Replication Group (staging-elasticache-replication-group): InvalidParameterValue: When specifying preferred availability zones, the number of cache clusters must be specified and must match the number of preferred availability zones.
│   status code: 400, request id: 94795e92-28e5-47a9-ac6b-d8c6d00a22af
│ 
│   with aws_elasticache_replication_group.multi_az,
│   on elasticache.tf line 18, in resource "aws_elasticache_replication_group" "multi_az":
│   18: resource "aws_elasticache_replication_group" "multi_az" {
│

Terraform Configuration Files

resource "aws_elasticache_replication_group" "multi_az" {
  multi_az_enabled           = true
  automatic_failover_enabled = true
  # This seems to be directly tied to node count
  preferred_cache_cluster_azs = ["us-east-1a", "us-east-1b", "us-east-1c"]
  replication_group_id        = "${local.environment}-elasticache-replication-group"
  description                 = "${local.environment} Replication Group"
  node_type                   = "cache.r5.large"
  engine                      = "redis"
  engine_version              = "7.1"
  subnet_group_name           = aws_elasticache_subnet_group.main.name
  security_group_ids          = data.aws_security_groups.elasticache.ids
  num_cache_clusters          = 4
  parameter_group_name        = "default.redis7"
  port                        = 6379
  at_rest_encryption_enabled  = true
  transit_encryption_enabled  = true
}

Steps to Reproduce

  1. Create a config for a replication group
  2. Set redis as the engine
  3. Set preferred_cache_cluster_azs to a list whose length differs from num_cache_clusters
  4. terraform apply (fails with the InvalidParameterValue error above)
  5. Change the config so that the number of cache clusters and the number of AZs match
  6. terraform apply (succeeds)
  7. Increase num_cache_clusters by one without adding an AZ
  8. terraform apply (succeeds, with no error about the mismatch)
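
Until create and update behave consistently, one possible stopgap is to enforce the match in the configuration itself. The following is a minimal sketch, not a provider fix: the local names cache_azs and cache_cluster_count, the resource label guarded, and the replication group ID are hypothetical, and it assumes Terraform 1.2 or later, where lifecycle preconditions are available.

```hcl
locals {
  # Hypothetical locals introduced for illustration only.
  cache_azs           = ["us-east-1a", "us-east-1b", "us-east-1c"]
  cache_cluster_count = 3
}

resource "aws_elasticache_replication_group" "guarded" {
  replication_group_id        = "example-replication-group" # hypothetical ID
  description                 = "Example Replication Group"
  engine                      = "redis"
  node_type                   = "cache.r5.large"
  multi_az_enabled            = true
  automatic_failover_enabled  = true
  preferred_cache_cluster_azs = local.cache_azs
  num_cache_clusters          = local.cache_cluster_count

  lifecycle {
    # Evaluated on every plan, so the check also applies to updates,
    # where the mismatch currently goes unreported.
    precondition {
      condition     = length(local.cache_azs) == local.cache_cluster_count
      error_message = "The number of preferred AZs must match num_cache_clusters."
    }
  }
}
```

With this guard in place, raising cache_cluster_count to 4 without adding a fourth AZ should fail at plan time instead of applying silently.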

Debug Output

No response

Panic Output

No response

Important Factoids

No response

References

https://github.com/hashicorp/terraform-provider-aws/issues/207 - First observed with memcached
https://github.com/hashicorp/terraform-provider-aws/issues/14070 - Believed to have been fixed in 4.0.0

Would you like to implement a fix?

None

justinretzolk commented 1 month ago

Hey @jeremy-balmos 👋 Thank you for taking the time to raise this! Out of curiosity, in your testing, did the starting/ending number of AZs / cache nodes have any impact? I've seen some resources where AWS automatically adds additional AZs if fewer than 3 are specified, so I'm curious whether that's at play here at all.

Additionally, are you able to supply debug logs (redacted as needed)? That may help whoever ultimately picks this up to look further into it.