gravitational / teleport


S3 uploader: switch to aws-sdk-go-v2 #39248

Closed: zmb3 closed this issue 1 month ago

zmb3 commented 6 months ago

We have observed 503 errors in the auth uploader's CompleteMultipartUpload calls.

As per https://repost.aws/knowledge-center/http-5xx-errors-s3:

The error code 500 Internal Error indicates that Amazon S3 can't handle the request at that time. The error code 503 Slow Down typically indicates that the number of requests to your S3 bucket is high.

This same post says that:

All AWS SDKs have a built-in retry mechanism with an algorithm that uses exponential backoff.

However, it looks like there's a long-standing issue in aws-sdk-go where the SDK does not throttle S3 requests properly: https://github.com/aws/aws-sdk-go/issues/3977

Edit: aws-sdk-go-v2 correctly implements throttled retries in this case. We should use it here.
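
For context, here is a rough sketch of what "retry with exponential backoff" means for a throttled call such as CompleteMultipartUpload. It uses only the standard library and is not the code from either SDK. The names `errSlowDown`, `backoffCeiling`, and `retryThrottled` are hypothetical, and the sentinel error merely stands in for an S3 503 Slow Down response. The delay ceiling doubles on each retry up to a cap, and the actual sleep is a random value below that ceiling ("full jitter").

```go
package main

import (
	"errors"
	"math/rand"
	"time"
)

// errSlowDown is a hypothetical sentinel standing in for an S3
// "503 Slow Down" response. Real code would inspect the SDK's error type.
var errSlowDown = errors.New("503 Slow Down")

// backoffCeiling returns the upper bound on the delay before retry number
// `attempt` (0-based): base * 2^attempt, capped at max.
func backoffCeiling(attempt int, base, max time.Duration) time.Duration {
	d := base
	for i := 0; i < attempt; i++ {
		d *= 2
		if d >= max {
			return max
		}
	}
	if d > max {
		return max
	}
	return d
}

// retryThrottled calls op up to maxAttempts times. Only errSlowDown is
// treated as retryable; any other result is returned immediately.
// Between attempts it sleeps for a random duration in [0, ceiling].
// It returns the number of calls made and the last error.
func retryThrottled(maxAttempts int, base, max time.Duration, op func() error) (int, error) {
	var err error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		err = op()
		if err == nil || !errors.Is(err, errSlowDown) {
			return attempt + 1, err
		}
		if attempt == maxAttempts-1 {
			break // out of attempts, do not sleep again
		}
		ceiling := backoffCeiling(attempt, base, max)
		time.Sleep(time.Duration(rand.Int63n(int64(ceiling) + 1)))
	}
	return maxAttempts, err
}
```

A caller would wrap the upload completion in `op`, for example `retryThrottled(5, 100*time.Millisecond, 20*time.Second, completeUpload)`, where `completeUpload` is a placeholder. A production version would also take a `context.Context` so the wait can be cancelled.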

rosstimothy commented 6 months ago

Does aws-sdk-go-v2 suffer from the same problem?

zmb3 commented 6 months ago

Good thought, @rosstimothy.

No, it looks like aws-sdk-go-v2 does implement retries and throttling for this case: https://github.com/aws/aws-sdk-go-v2/blob/49b368e9d7a38a2373c833be135270e5390c2b41/aws/retry/standard.go#L72

V1 for reference: https://github.com/aws/aws-sdk-go/blob/53e4759915361d72a33be783d5e878a63d85f807/aws/request/retryer.go#L84-L96

zmb3 commented 1 month ago

Closed by #44728