aws_elasticache_replication_group unable to set `automatic_failover_enabled` to false

camlow325 commented 3 years ago

Community Note

Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
Please do not leave "+1" or other comments that do not add relevant new information or questions; they generate extra noise for issue followers and do not help prioritize the request
If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform CLI and Terraform AWS Provider Version

› terraform -v
Terraform v0.13.5
+ provider registry.terraform.io/hashicorp/aws v3.36.0

Affected Resource(s)

aws_elasticache_replication_group

Terraform Configuration Files

Initial config:

resource "aws_elasticache_replication_group" "this" {
  replication_group_id          = "Test"
  replication_group_description = "Test Redis"

  engine                     = "redis"
  node_type                  = "cache.t3.micro"
  number_cache_clusters      = 2
  parameter_group_name       = "default.redis5.0"
  engine_version             = "5.0.3"
  port                       = 6379
  automatic_failover_enabled = true
}

Config for second apply, attempting to change number_cache_clusters from 2 to 1 and automatic_failover_enabled from true to false:

resource "aws_elasticache_replication_group" "this" {
  replication_group_id          = "Test"
  replication_group_description = "Test Redis"

  engine                     = "redis"
  node_type                  = "cache.t3.micro"
  number_cache_clusters      = 1
  parameter_group_name       = "default.redis5.0"
  engine_version             = "5.0.3"
  port                       = 6379
  automatic_failover_enabled = false
}

Expected Behavior

Apply should be successful, with the replication group now having 1 cache cluster and automatic_failover_enabled being false.

Actual Behavior

Terraform plan appears to show that the apply would change both attributes:

  # aws_elasticache_replication_group.this will be updated in-place
  ~ resource "aws_elasticache_replication_group" "this" {
      ...
      ~ automatic_failover_enabled    = true -> false
      ...
      ~ number_cache_clusters         = 2 -> 1
  ...

Plan: 0 to add, 1 to change, 0 to destroy.

Terraform apply fails, however, with the following error:

...
aws_elasticache_replication_group.this: Modifying... [id=test]

Error: error modifying ElastiCache Replication Group (test) clusters: error removing ElastiCache Replication Group (test) replicas: InvalidParameterValue: Must have at least 1 replica when cluster mode is disabled with auto failover enabled.
...

To try applying the changes one at a time instead, I set the number_cache_clusters attribute back to 2 and left the automatic_failover_enabled attribute set to false.

Terraform plan now shows that the apply would change only automatic_failover_enabled:

  # aws_elasticache_replication_group.this will be updated in-place
  ~ resource "aws_elasticache_replication_group" "this" {
      ...
      ~ automatic_failover_enabled    = true -> false
      ...
        number_cache_clusters         = 2
  ...

Plan: 0 to add, 1 to change, 0 to destroy.

Terraform apply appears to be successful:

...
aws_elasticache_replication_group.this: Modifications complete after 33s [id=test]

Apply complete! Resources: 0 added, 1 changed, 0 destroyed.

The AWS Console, however, shows that the failover setting has not changed:

Auto-failover: enabled

A repeated plan also shows that the current value for automatic_failover_enabled is still true (unchanged from the previous apply).

From the AWS Console, I am able to change Auto-failover from enabled to disabled. I then tried to apply the "config for second apply" from above again. The plan shows that both attributes would change:

  # aws_elasticache_replication_group.this will be updated in-place
  ~ resource "aws_elasticache_replication_group" "this" {
      ...
      ~ automatic_failover_enabled    = true -> false
      ...
      ~ number_cache_clusters         = 2 -> 1
  ...

Plan: 0 to add, 1 to change, 0 to destroy.

The subsequent apply is successful.

...
aws_elasticache_replication_group.this: Modifications complete after 5m17s [id=test]

Apply complete! Resources: 0 added, 1 changed, 0 destroyed.

In this case, the AWS console shows that Number of nodes has been reduced from 2 to 1. Ultimately, then, Terraform can be used to reduce the number of cache clusters from 2 to 1, but this involves first disabling automatic failover outside of Terraform.
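
The console step above might be scriptable. The following is a minimal sketch, assuming the ElastiCache ModifyReplicationGroup API honors the change the same way the console did (not verified in this issue). The helper name `disable_automatic_failover` is hypothetical; `modify_replication_group` and the `replication_group_available` waiter are boto3 ElastiCache client calls.

```python
def disable_automatic_failover(client, replication_group_id):
    """Turn off automatic failover, then block until the group is available again.

    `client` is expected to be a boto3 ElastiCache client (or a compatible stub).
    Assumption: the API accepts this change like the console did in this issue.
    """
    client.modify_replication_group(
        ReplicationGroupId=replication_group_id,
        AutomaticFailoverEnabled=False,
        ApplyImmediately=True,
    )
    # The modification is asynchronous; wait before running `terraform apply`.
    waiter = client.get_waiter("replication_group_available")
    waiter.wait(ReplicationGroupId=replication_group_id)


# Usage (requires AWS credentials and the boto3 package):
#   import boto3
#   disable_automatic_failover(boto3.client("elasticache"), "test")
```

After the helper returns, applying the "config for second apply" should correspond to the last step described above, where only the node count remained to be reduced.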

Steps to Reproduce

Apply the "initial config" above.
Apply the "config for second apply" above.

srinivasmanthena commented 3 years ago

Hello, is there an estimated timeline for fixing this bug? I am trying to provision a multi-region ElastiCache cluster and am currently blocked by this. Are any workarounds available? Thank you.

chris-peterson commented 3 years ago

Also running into this. With TF_LOG_PROVIDER=TRACE enabled, I can see details about the failing API request:

2021-10-11T16:50:13.435Z [INFO]  plugin.terraform-provider-aws_v3.62.0_x5: 2021/10/11 16:50:13 [DEBUG] [aws-sdk-go] DEBUG: Response elasticache/DecreaseReplicaCount Details:
---[ RESPONSE ]--------------------------------------
HTTP/1.1 400 Bad Request
Connection: close
Content-Length: 346
Content-Type: text/xml
Date: Mon, 11 Oct 2021 16:50:13 GMT
X-Amzn-Requestid: 1cd55712-c258-4aec-bf68-e4cdf8fffcd8
-----------------------------------------------------: timestamp=2021-10-11T16:50:13.435Z
2021-10-11T16:50:13.435Z [INFO]  plugin.terraform-provider-aws_v3.62.0_x5: 2021/10/11 16:50:13 [DEBUG] [aws-sdk-go] <ErrorResponse xmlns="http://elasticache.amazonaws.com/doc/2015-02-02/">
  <Error>
    <Type>Sender</Type>
    <Code>InvalidParameterValue</Code>
    <Message>Must have at least 1 replica when cluster mode is disabled with auto failover enabled.</Message>
  </Error>
  <RequestId>1cd55712-c258-4aec-bf68-e4cdf8fffcd8</RequestId>
</ErrorResponse>: timestamp=2021-10-11T16:50:13.435Z

Based on @camlow325's observations, my hunch is that the order of operations between elasticache/DecreaseReplicaCount and elasticache/ModifyReplicationGroup needs to be swapped.

I believe the change would be localized to this method
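
To illustrate the ordering hunch, here is a toy simulation, not the provider's code. `FakeReplicationGroup` and both `apply_*` functions are hypothetical names; the only behavior modeled is the InvalidParameterValue rule from the trace above, with number_cache_clusters 2 -> 1 represented as replicas 1 -> 0.

```python
class FakeReplicationGroup:
    """Toy stand-in for a cluster-mode-disabled replication group (not the real API)."""

    def __init__(self, replicas, automatic_failover):
        self.replicas = replicas
        self.automatic_failover = automatic_failover

    def decrease_replica_count(self, new_count):
        # Mirrors the InvalidParameterValue error quoted in the trace.
        if self.automatic_failover and new_count < 1:
            raise ValueError(
                "Must have at least 1 replica when cluster mode is disabled "
                "with auto failover enabled."
            )
        self.replicas = new_count

    def modify_replication_group(self, automatic_failover):
        self.automatic_failover = automatic_failover


def apply_replicas_first(group, replicas, automatic_failover):
    """Order seen in the failing apply: shrink first, then change failover."""
    group.decrease_replica_count(replicas)
    group.modify_replication_group(automatic_failover)


def apply_failover_first(group, replicas, automatic_failover):
    """Swapped order: change failover first, then shrink."""
    group.modify_replication_group(automatic_failover)
    group.decrease_replica_count(replicas)
```

With a group that starts at one replica and failover enabled, `apply_replicas_first(group, 0, False)` raises the same message as the trace, while `apply_failover_first(group, 0, False)` completes. Enabling failover while adding a replica would presumably need the reverse order, so a real fix may have to pick the order based on the direction of the change.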

magenx commented 2 years ago

When I tested the upgrade scenario (1 -> 2), Multi-AZ is enabled, but failover is greyed out in the console and not enabled.

richardgavel-ordinaryexperts commented 1 year ago

@magenx I've seen this behavior in the console as well. If both are false, you have to select failover first, then Multi-AZ. I'm surprised that it appears to be a single modification operation in the console, yet the same action through the API doesn't produce the same result.

ryanpcmcquen commented 1 year ago

I'm hitting this bug in the opposite way (trying to set automatic_failover_enabled to true).

aristideubertas commented 3 months ago

Any updates on this?

hashicorp / terraform-provider-aws