terraform-aws-modules / terraform-aws-opensearch

Terraform module to create AWS OpenSearch resources 🇺🇦
https://registry.terraform.io/modules/terraform-aws-modules/opensearch/aws
Apache License 2.0

EBS volume size change times out when throughput is not valid #25

Closed lplazas closed 1 week ago

lplazas commented 1 week ago

Description

Operation: We made a simple EBS volume size change in our domain. See the plan:

 # module.opensearch.aws_opensearch_domain.opensearch will be updated in-place
  ~ resource "aws_opensearch_domain" "opensearch" {
        id                 = "arn:aws:es:us-east-1:*REMOVED*:domain/*REMOVED*"
        tags               = *REMOVED*
        # (10 unchanged attributes hidden)

      ~ ebs_options {
          ~ volume_size = 100 -> 256
            # (4 unchanged attributes hidden)
        }

        # (15 unchanged blocks hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.

Issue: We ran into a timeout that, in our case, bubbled up as an ExpiredTokenException, because our assumed-role session expired after waiting for 15m. We increased the session timeout to 1h and the apply still failed for the same reason after 1h.

module.opensearch.aws_opensearch_domain.opensearch: Still modifying... [id=arn:aws:es:us-east-1:*REMOVED*:domain/*REMOVED*, 16m0s elapsed]
module.opensearch.aws_opensearch_domain.opensearch: Still modifying... [id=arn:aws:es:us-east-1:*REMOVED*:domain/*REMOVED*, 16m10s elapsed]

Error: updating OpenSearch Domain (arn:aws:es:us-east-1:*REMOVED*:domain/*REMOVED*): operation error OpenSearch: UpdateDomainConfig, https response error StatusCode: 403, RequestID: 1b912dcd-c49e-411e-b2b7-a6ecfb2640be, api error ExpiredTokenException: The security token included in the request is expired

  with module.opensearch.aws_opensearch_domain.opensearch,
  on .terraform/modules/opensearch/main.tf line 66, in resource "aws_opensearch_domain" "opensearch":
  66: resource "aws_opensearch_domain" "opensearch" {

Solution: While debugging the tf output, we saw the actual failure, which the provider doesn't handle:

2024-09-04T09:50:39.548Z [DEBUG] provider.terraform-provider-aws_v5.65.0_x5: HTTP Response Received: tf_req_id=e178677f-8b41-7728-0a87-4ce343aa70cd http.duration=213 http.response.body="{"message":"Throughput must be between 250 and 593"}
" http.response.header.content_type=application/json http.response_content_length=52 http.status_code=409 rpc.method=UpdateDomainConfig tf_provider_addr=registry.terraform.io/hashicorp/aws tf_rpc=ApplyResourceChange @caller=github.com/hashicorp/aws-sdk-go-base/v2@v2.0.0-beta.55/logging/tf_logger.go:45 @module=aws rpc.service=OpenSearch tf_resource_type=aws_opensearch_domain aws.region=us-east-1 http.response.header.x_amzn_errortype=LimitExceededException tf_aws.sdk=aws-sdk-go-v2 tf_mux_provider=*schema.GRPCProviderServer http.response.header.date="Wed, 04 Sep 2024 09:50:38 GMT" http.response.header.x_amzn_requestid=4440b109-771d-4ade-a4cb-91e9795702c5 rpc.system=aws-api tf_aws.signing_region= timestamp=2024-09-04T09:50:39.548Z

Versions

Expected behavior

It should fail fast, either through a tf validation or by catching the API error, instead of timing out.
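
Until that happens, a caller-side guard can approximate the fail-fast behavior at plan time. The following is a minimal sketch, not part of the module or the provider: the variable name `ebs_throughput` is hypothetical, the 250 and 593 bounds are copied from the API error above and probably apply only to this particular domain configuration, and it assumes the module accepts an `ebs_options` input as a map.

```hcl
# Hypothetical wrapper variable; the name is an assumption, not a module input.
variable "ebs_throughput" {
  description = "gp3 throughput (MiB/s) for the OpenSearch domain EBS volumes"
  type        = number
  default     = 250

  validation {
    # Bounds taken from the API error in this issue; they are likely specific
    # to this volume size and instance type, so adjust them for your domain.
    condition     = var.ebs_throughput >= 250 && var.ebs_throughput <= 593
    error_message = "Throughput must be between 250 and 593 for this domain configuration."
  }
}

module "opensearch" {
  source = "terraform-aws-modules/opensearch/aws"

  # ... other inputs omitted

  # Assumes the module exposes ebs_options as a map input.
  ebs_options = {
    ebs_enabled = true
    volume_type = "gp3"
    volume_size = 256
    throughput  = var.ebs_throughput
  }
}
```

With a guard like this, an out-of-range value is rejected during `terraform plan` rather than surfacing as a 409 response that is only visible in the debug log.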

lplazas commented 1 week ago

My bad, I just realized this is the module and not the provider. I opened an issue on the provider: https://github.com/hashicorp/terraform-provider-aws/issues/39136. Feel free to close this one.