hashicorp / terraform-provider-aws

The AWS Provider enables Terraform to manage AWS resources.
https://registry.terraform.io/providers/hashicorp/aws
Mozilla Public License 2.0

MSK EBS issue on enabling storage autoscaling #20327

Open devops-asService opened 3 years ago

devops-asService commented 3 years ago

Hi, I have enabled EBS autoscaling for a Kafka cluster:

  resource "aws_msk_cluster" "kafka" {
    .....................
    broker_node_group_info {
      instance_type   = <dummy>
      ebs_volume_size = 1
      ...............
      security_groups = [aws_security_group.kafka_private_sg.id]
    }
    ................
  }

The initial EBS volume size was 1 GB. Later, the volume was expanded because autoscaling is enabled. The next time I apply any change (after the resize has happened), Terraform tries to reduce the EBS volume back to the 1 GB that was set initially and fails with the following error:

  Error: error updating MSK Cluster (arn:aws:kafka:us-east-2:798036251187:cluster/Edhperf-msk-cluster/f33753ee-8a35-48dd-abb1-b0262001800a-4) broker storage: BadRequestException: To update storage, you must increase it by at least 10 GiB.

Can somebody help me with this?
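For context, the post does not show how autoscaling was enabled. A minimal sketch of one common way to attach storage autoscaling to an MSK cluster through Application Auto Scaling is below. The resource names, capacity bounds, and target value are assumptions for illustration, not taken from the original configuration.

```hcl
# Sketch only: names and numbers are hypothetical.
# Registers the cluster's broker storage as a scalable target.
resource "aws_appautoscaling_target" "kafka_storage" {
  service_namespace  = "kafka"
  scalable_dimension = "kafka:broker-storage:VolumeSize"
  resource_id        = aws_msk_cluster.kafka.arn
  min_capacity       = 1   # GiB per broker, matching the initial size in the post
  max_capacity       = 100 # assumed upper bound
}

# Grows broker storage when utilization exceeds the target value.
resource "aws_appautoscaling_policy" "kafka_storage" {
  name               = "kafka-broker-storage-scaling"
  policy_type        = "TargetTrackingScaling"
  service_namespace  = aws_appautoscaling_target.kafka_storage.service_namespace
  scalable_dimension = aws_appautoscaling_target.kafka_storage.scalable_dimension
  resource_id        = aws_appautoscaling_target.kafka_storage.resource_id

  target_tracking_scaling_policy_configuration {
    # Assumption: scale-in is disabled because broker storage is not shrunk.
    disable_scale_in = true
    target_value     = 60 # assumed utilization percentage

    predefined_metric_specification {
      predefined_metric_type = "KafkaBrokerStorageUtilization"
    }
  }
}
```

Once a policy like this grows the volumes, the size recorded in AWS no longer matches the value in the cluster resource, which is the drift described above.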

HackerTheMonkey commented 3 years ago

I've encountered the same issue. Try updating your TF config so that the initial EBS volume size matches the current size in AWS. In my case, TF was trying to reduce it back to the initial size, which no longer applied after a couple of autoscaling events.

Note that the above is just a workaround to keep things moving. IMO this should not be necessary: the initial size should be treated as what it really is, i.e. initial.

HackerTheMonkey commented 3 years ago

The following worked well for us:

  broker_node_group_info {
    ebs_volume_size = local.ebs_volume_init_size[var.environment]   
  }

ignore_changes = [
      broker_node_group_info.0.ebs_volume_size
    ]

The syntax is a bit unexpected, but it solved the issue: TF apply no longer attempts to reset the size after it has changed through either automatic or manual scaling!
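The snippet above references `local.ebs_volume_init_size[var.environment]` without showing its definition. A minimal sketch of what such a lookup might look like follows; the environment names and sizes are hypothetical.

```hcl
# Sketch only: environment keys and sizes are assumptions.
variable "environment" {
  type        = string
  description = "Deployment environment, used as the lookup key below."
}

locals {
  # Initial broker volume size in GiB, per environment.
  ebs_volume_init_size = {
    dev  = 10
    prod = 100
  }
}
```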

VladMasarik commented 3 years ago

@HackerTheMonkey's hack worked for me, although I had to use a different syntax as well:

  broker_node_group_info {
    ebs_volume_size = local.ebs_volume_init_size[var.environment]   
  }

  lifecycle {
    ignore_changes = [
      broker_node_group_info.0.ebs_volume_size
    ]
  }

llnformer commented 2 years ago

Is there any timeline for adding support for EBS autoscaling to the aws_msk_cluster resource?

pascal-hofmann commented 2 years ago

With the latest provider version the correct syntax is:

  broker_node_group_info {
    storage_info {
      ebs_storage_info {
        volume_size = …
      }
    }
  }

  lifecycle {
    ignore_changes = [
      broker_node_group_info[0].storage_info[0].ebs_storage_info[0].volume_size
    ]
  }
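
Put together, a complete resource using the newer block layout might look like the sketch below. The cluster name, Kafka version, instance type, and variable names are placeholders chosen for illustration; only the `storage_info` nesting and the `ignore_changes` path come from the comment above.

```hcl
# Sketch only: all values and variable names are hypothetical placeholders.
variable "private_subnet_ids" {
  type = list(string)
}

variable "security_group_ids" {
  type = list(string)
}

variable "initial_volume_size_gib" {
  type    = number
  default = 100
}

resource "aws_msk_cluster" "kafka" {
  cluster_name           = "example"
  kafka_version          = "3.5.1"
  number_of_broker_nodes = 3

  broker_node_group_info {
    instance_type   = "kafka.m5.large"
    client_subnets  = var.private_subnet_ids
    security_groups = var.security_group_ids

    storage_info {
      ebs_storage_info {
        # Only the size at creation time; autoscaling may grow it later.
        volume_size = var.initial_volume_size_gib
      }
    }
  }

  lifecycle {
    # Keeps Terraform from trying to shrink storage back to the initial size.
    ignore_changes = [
      broker_node_group_info[0].storage_info[0].ebs_storage_info[0].volume_size,
    ]
  }
}
```

One trade-off of this approach is that Terraform also ignores intentional edits to `volume_size` while the `ignore_changes` entry is present, so deliberate resizes have to be made another way.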

yermulnik commented 1 year ago

This is not directly related, but it concerns ebs_volume_size and storage_info.ebs_storage_info.volume_size. I'm trying to figure out whether this is a bug in the provider or something local to me: since I switched from ebs_volume_size to storage_info.ebs_storage_info.volume_size, Terraform no longer picks up changes to the value of storage_info.ebs_storage_info.volume_size, so to increase cluster storage I now have to do it manually. Is anyone else experiencing a similar issue? Is this expected behavior, or do I need to file an issue for the AWS Provider? Thanks.

jhovell commented 1 year ago

@yermulnik do you have autoscaling enabled and the

lifecycle {
    ignore_changes = [
      broker_node_group_info[0].storage_info[0].ebs_storage_info[0].volume_size
    ]
  }

... included in your template? If so, that sounds like "expected behavior", given this issue is still open.

yermulnik commented 1 year ago

@jhovell Yeah, I forgot to update this thread, sorry. It was indeed the lifecycle block ignoring changes to the volume size 🤦🏻

patrickherrera commented 7 months ago

@pascal-hofmann's comment above fixed it for me and successfully excluded external changes from the plan. However, as I did not specify anything for provisioned_throughput within the ebs_storage_info block, Terraform tried to pass nulls:

          ~ storage_info {
              ~ ebs_storage_info {
                    # (1 unchanged attribute hidden)
                  - provisioned_throughput {
                      - enabled           = false -> null
                      - volume_throughput = 0 -> null
                    }
                }
            }

Which AWS didn't like:

│ Error: updating MSK Cluster (arn:aws:kafka:ap-southeast-2:123456:cluster/tf-module-msk-kafka-cluster/fca403e4-d8b4-4a30-980d-c9d0a4dad3e8-2) broker storage: operation error Kafka: UpdateBrokerStorage, https response error StatusCode: 400, RequestID: 0b9fa85f-96e6-4405-a2fc-45471b619a9c, BadRequestException: The request does not include any updates to the EBS volumes of the cluster. Verify the request, then try again.

Explicitly setting enabled to false (the default) meant that it matched the real deployment and no changes were made:

      ebs_storage_info {
        volume_size = var.initial_volume_size_gib

        provisioned_throughput {
          enabled = false
        }
      }

Ignoring changes to the entire ebs_storage_info block can also work if you haven't changed anything else:

  lifecycle {
    ignore_changes = [
      broker_node_group_info[0].storage_info[0].ebs_storage_info
    ]
  }

EDIT: Forget it. I thought I tested this thoroughly but I'm still having issues. Getting the exact same error as this: https://github.com/hashicorp/terraform-provider-aws/issues/26031#issuecomment-1607851788, although my plan now thinks that it needs to make a change whereas what I tested above produced no plan change at all. Now I get:

      ~ broker_node_group_info {
            # (4 unchanged attributes hidden)
          ~ storage_info {
              ~ ebs_storage_info {
                    # (1 unchanged attribute hidden)
                  + provisioned_throughput {
                      + enabled = false
                    }
                }
            }
            # (1 unchanged block hidden)
        }

and the same error about no change being made. I'm not sure why that should be an issue anyway: the request should be idempotent and simply take no action if nothing needs to change.