pulumi / pulumi-terraform

A resource package that allows Pulumi programs to use Terraform state
Apache License 2.0

Terraform remote state with S3 fails intermittently due to AWS rate limiting #677

Open piotr-bzdyl-vertexinc opened 2 years ago

piotr-bzdyl-vertexinc commented 2 years ago

What happened?

We use RemoteStateReference with the S3 backend and AssumeRoleWithWebIdentity for AWS API authentication (credentials are provided through environment variables or through profiles in $HOME/.aws/credentials).

Because AWS applies rate limiting, our Pulumi previews and updates frequently fail with:

error: Preview failed: error in backend configuration: error configuring S3 Backend: no valid credential sources for S3 Backend found.
Please see https://www.terraform.io/docs/language/settings/backends/s3.html
for more information about providing credentials.
Error: WebIdentityErr: failed to retrieve credentials
caused by: InvalidIdentityToken: Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements
    status code: 400, request id: **************

The error message is misleading: when the same call is retried elsewhere (e.g. by the Pulumi AWS provider) with identical credentials, AssumeRoleWithWebIdentity succeeds.

We had the same issue with the Pulumi AWS provider and fixed it by setting the maxRetries parameter on the provider. It would be good to have similar behaviour for the S3 backend of the Terraform remote state resource.

Steps to reproduce

See the previous section.

Expected Behavior

The RemoteStateReference resource retries AWS API authentication requests when it gets an error.
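
To make the requested behaviour concrete, here is a minimal sketch of a bounded retry with exponential backoff and jitter. It only illustrates the idea and is not how pulumi-terraform or the Terraform S3 backend is implemented. The names `call_with_retries`, `max_retries`, `base_delay`, `max_delay` and `flaky_auth` are hypothetical, and retrying on every exception is a simplification; a real implementation would presumably retry only on throttling or transient identity-provider errors.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    operation: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`; on failure, retry up to `max_retries` more times.

    Hypothetical helper: retries on any exception, which is an assumption
    made to keep the sketch short.
    """
    attempt = 0
    while True:
        try:
            return operation()
        except Exception:
            if attempt >= max_retries:
                # Retries exhausted: surface the last error to the caller.
                raise
            # Exponential backoff capped at max_delay, with full jitter so
            # that concurrent clients do not retry in lockstep.
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, delay))
            attempt += 1


if __name__ == "__main__":
    # Stand-in for an authentication call that fails twice, then succeeds.
    state = {"calls": 0}

    def flaky_auth() -> str:
        state["calls"] += 1
        if state["calls"] < 3:
            raise RuntimeError("InvalidIdentityToken (simulated)")
        return "credentials"

    result = call_with_retries(flaky_auth, base_delay=0.01)
    print(result, "after", state["calls"], "calls")
```

The `sleep` parameter is injectable only so the backoff can be skipped in tests; the default is `time.sleep`.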

Actual Behavior

RemoteStateReference fails immediately without retrying.

Versions used

pulumi 3.30.0

Python packages: pulumi-3.34.0 pulumi_terraform-5.6.0

Additional context

No response

Contributing

Vote on this issue by adding a 👍 reaction. To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).

jkisk commented 2 years ago

Thank you for opening this issue. I agree that implementing maxRetries or similar would improve the experience here.