hashicorp / terraform-provider-aws

The AWS Provider enables Terraform to manage AWS resources.
https://registry.terraform.io/providers/hashicorp/aws
Mozilla Public License 2.0

[Bug]: AWS ECS API throttling causes abort #28852

Open xsnrg opened 1 year ago

xsnrg commented 1 year ago

Terraform Core Version

1.3.6

AWS Provider Version

4.47.0

Affected Resource(s)

In our case, the AWS ECS `CreateService` call occasionally gets throttled. This was not a problem before this year, but it now appears that AWS has changed the `errorCode` from `ThrottlingException` to `ClientException`, and the `errorMessage` (in CloudWatch terms) from `An unknown error occurred` to `Received throttling error when describing target group arn...`.

The `ClientException` is not retried: the CloudWatch logs show only one error when Terraform aborts.

The desired outcome is that the new error is still identified as throttling and retried.
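To make the desired outcome concrete, here is a minimal sketch of a classifier that treats both error shapes as throttling. It is not the provider's or the SDK's actual logic; the function name `is_throttling_error` and the set `KNOWN_THROTTLE_CODES` are hypothetical, and only the error code and message strings come from this report.

```python
# Illustrative sketch only. The names below are hypothetical and are not
# taken from terraform-provider-aws or the AWS SDKs.

# Error code that, per this report, used to be returned when throttled.
KNOWN_THROTTLE_CODES = {"ThrottlingException"}


def is_throttling_error(error_code: str, error_message: str) -> bool:
    """Return True if the error looks like throttling in either shape."""
    # Old shape: the error code itself names the throttling condition.
    if error_code in KNOWN_THROTTLE_CODES:
        return True
    # New shape described in this issue: a generic ClientException whose
    # message mentions throttling. Matching on message text is an assumption
    # and may be brittle if the wording changes again.
    return error_code == "ClientException" and "throttling" in error_message.lower()
```

With a check like this, the `ClientException` from the error output in this report would be classified as retryable, while an unrelated `ClientException` would not.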

I will attach an image showing the different throttling `errorCode` values as seen in CloudWatch, if the issue form accepts one.

I suspect the cause is the change in the errors returned by AWS.

I also tried Terraform 1.3.7 with AWS provider 4.49.0, with no difference.

Expected Behavior

Terraform identifies the throttling error and retries the request.

Actual Behavior

The throttling error causes the run to abort.

Relevant Error/Panic Output Snippet

The error from terraform is: `Error: error creating ECS service (servicename redacted): ClientException: Received throttling error when describing target group arn:...`

Terraform Configuration Files

The configuration would need to be redacted before I can share it, if it is needed. The resource block is:

resource "aws_ecs_service" "service" {}

Steps to Reproduce

Run `terraform apply` with about 25 ECS services to trigger throttling. This does not happen on every run, but when it does, the run is aborted.

Debug Output

No response

Panic Output

No response

Important Factoids

No response

References

No response

Would you like to implement a fix?

None

xsnrg commented 1 year ago

[Image: the different throttling errorCode values as seen in CloudWatch]

xsnrg commented 1 year ago

This bug is still biting us pretty hard, so I decided to do some more digging.

It appears that AWS changed the `errorCode`, as the CloudWatch image above shows, but did not update aws-sdk-go or aws-sdk-go-v2 to match. In the Go SDK, the file `aws/request/retryer_test.go` has not been updated since 2020. In the v2 SDK, `DefaultThrottleErrorCodes` is defined in `aws/retry/standard.go`, which has not been updated since 2020 either.
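To illustrate why a list keyed only on error codes misses the new error, here is a rough sketch of a retry loop with exponential backoff. It is written in Python for brevity and does not reproduce the Go SDK's implementation; `ApiError`, `retry_on_throttle`, and the two predicates are hypothetical names, and the single-entry code set is a placeholder rather than the SDK's real list.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ApiError(Exception):
    """Hypothetical stand-in for an AWS API error with a code and a message."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def retry_on_throttle(
    call: Callable[[], T],
    is_retryable: Callable[[ApiError], bool],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying with exponential backoff while is_retryable(err)."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except ApiError as err:
            # Give up on the last attempt or on a non-retryable error.
            if attempt == max_attempts or not is_retryable(err):
                raise
            # Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


# Placeholder code set; the real SDK list is longer and is not reproduced here.
THROTTLE_CODES = {"ThrottlingException"}


def code_only(err: ApiError) -> bool:
    """Code-list lookup: does not recognize the new ClientException shape."""
    return err.code in THROTTLE_CODES


def code_or_message(err: ApiError) -> bool:
    """Assumed extension: also match a ClientException mentioning throttling."""
    return code_only(err) or (
        err.code == "ClientException" and "throttling" in err.message.lower()
    )
```

With `code_only`, a `ClientException` carrying the throttling message is raised on the first attempt, which matches the single error seen in the logs. With `code_or_message`, the same call would be retried.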

I will leave this ticket open and create one against the aws-sdk-go-v2 project that references this one.