hashicorp / terraform-provider-aws

The AWS Provider enables Terraform to manage AWS resources.
https://registry.terraform.io/providers/hashicorp/aws
Mozilla Public License 2.0

[Bug]: Multiple provider configurations fails when using AssumeRoleWithWebIdentity (probably because of rate errors) #27071

Open arielbeckjit opened 2 years ago

arielbeckjit commented 2 years ago

Terraform Core Version

3.75.2

AWS Provider Version

latest (hashicorp/setup-terraform@v1)

Affected Resource(s)

When assuming a role with web identity, terraform plan / apply fails about 99% of the time with:

Error: WebIdentityErr: failed to retrieve credentials
caused by: InvalidIdentityToken: Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements

We run the code through GitHub Actions, so we set up "token.actions.githubusercontent.com" as the identity provider. The GitHub role (which has full admin access) has the following trust policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Federated": "arn:aws:iam::xxxxx:oidc-provider/token.actions.githubusercontent.com"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringLike": {
                    "token.actions.githubusercontent.com:sub": "repo:*"
                }
            }
        }
    ]
}

To configure AWS, I'm doing this:

          export AWS_ROLE_ARN=RELEVANT_ROLE
          export AWS_WEB_IDENTITY_TOKEN_FILE=/tmp/awscreds
          export AWS_DEFAULT_REGION=us-east-1
          echo AWS_WEB_IDENTITY_TOKEN_FILE=$AWS_WEB_IDENTITY_TOKEN_FILE >> $GITHUB_ENV
          echo AWS_ROLE_ARN=$AWS_ROLE_ARN >> $GITHUB_ENV
          echo AWS_DEFAULT_REGION=$AWS_DEFAULT_REGION >> $GITHUB_ENV

          curl -H "Authorization: bearer $ACTIONS_ID_TOKEN_REQUEST_TOKEN" "$ACTIONS_ID_TOKEN_REQUEST_URL" | jq -r '.value' > $AWS_WEB_IDENTITY_TOKEN_FILE

Then my actual Terraform code is as follows (it covers around 14 regions):

provider "aws" {

  assume_role {
    role_arn = RELEVANT_ROLE
  }
  region = "us-east-1"
}

provider "aws" {

  region = "us-east-1"
  alias  = "us-east-1"
}
provider "aws" {

  region = "ap-northeast-2"
  alias  = "ap-northeast-2"
}

I then pass these providers to the module in main.tf:

module "base-multi-region-resources" {
  source                           = "../../modules/base-multi-region-resources"
  providers = {
    aws.ap-northeast-2 = aws.ap-northeast-2
    aws.ap-southeast-1 = aws.ap-southeast-1
    aws.ap-southeast-2 = aws.ap-southeast-2
    aws.eu-central-1 = aws.eu-central-1
    aws.eu-west-3 = aws.eu-west-3
    aws.ap-northeast-3 = aws.ap-northeast-3
    aws.ap-south-1 = aws.ap-south-1
    aws.eu-north-1 = aws.eu-north-1
    aws.eu-west-2 = aws.eu-west-2
    aws.sa-east-1 = aws.sa-east-1
    aws.us-east-1 = aws.us-east-1
    aws.us-west-1 = aws.us-west-1
    aws.ap-northeast-1 = aws.ap-northeast-1
    aws.ca-central-1 = aws.ca-central-1
    aws.eu-west-1 = aws.eu-west-1
    aws.us-east-2 = aws.us-east-2
    aws.us-west-2 = aws.us-west-2
  }
}

In the module, I have the actual resources I'm creating (SNS topics with some configuration):

module "multi-region-sns-ap-northeast-2" {

  source            = "../multi-region-sns-topic"
  providers         = {
    aws = aws.ap-northeast-2
  }
}
module "multi-region-sns-ap-southeast-1" {

  source            = "../multi-region-sns-topic"
  providers         = {
    aws = aws.ap-southeast-1
  }
}
module "multi-region-sns-ap-southeast-2" {

  source            = "../multi-region-sns-topic"
  providers         = {
    aws = aws.ap-southeast-2
  }
}

This is the module's versions.tf:


terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 3.0"
      configuration_aliases = [aws.ap-northeast-2,aws.ap-southeast-1,aws.ap-southeast-2,aws.eu-central-1,aws.eu-west-3,aws.ap-northeast-3,aws.ap-south-1,aws.eu-north-1,aws.eu-west-2,aws.sa-east-1,aws.us-east-1,aws.us-west-1,aws.ap-northeast-1,aws.ca-central-1,aws.eu-west-1,aws.us-east-2,aws.us-west-2]
    }
  }
}

Expected Behavior

Terraform should create those resources in all regions, but it sometimes fails at plan and sometimes at apply; it passed only once.

Actual Behavior

It looks like no retries are attempted when assuming the role fails, so the provider appears to hit a rate limit and fail.

I cannot get past plan (it succeeded only once). With regular admin credentials on my machine it works every time (I guess because it doesn't assume the role each time?).

Twice I saw it fail at init.
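
If the provider does not retry the failed credential exchange, one CI-level stopgap is to rerun the whole Terraform command with a growing delay between attempts. The sketch below assumes bash; `retry` is a hypothetical helper (not part of Terraform or the AWS provider), and the attempt count and initial delay in the usage comment are arbitrary values.

```shell
#!/usr/bin/env bash
# retry MAX_ATTEMPTS INITIAL_DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: reruns COMMAND until it succeeds or the attempts
# run out, doubling the sleep between attempts.
retry() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "attempt ${attempt} failed; sleeping ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
    delay=$((delay * 2))
  done
}

# Example usage in a workflow step (not executed here):
#   retry 5 2 terraform plan -input=false
```

This only masks the symptom: every attempt still makes one credential exchange per provider configuration, so it helps most when the failures are intermittent.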

Relevant Error/Panic Output Snippet

Error: error configuring Terraform AWS Provider: no valid credential sources for Terraform AWS Provider found.

Please see https://registry.terraform.io/providers/hashicorp/aws
for more information about providing credentials.

Error: WebIdentityErr: failed to retrieve credentials
caused by: InvalidIdentityToken: Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements
    status code: 400, request id: a85273fb-007a-49e0-91ae-0fe658afa680

  with provider["registry.terraform.io/hashicorp/aws"].ap-south-1,
  on main.tf line 105, in provider "aws":
 105: provider "aws" {

Error: error configuring Terraform AWS Provider: no valid credential sources for Terraform AWS Provider found.

Please see https://registry.terraform.io/providers/hashicorp/aws
for more information about providing credentials.

Error: WebIdentityErr: failed to retrieve credentials
caused by: InvalidIdentityToken: Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements
    status code: 400, request id: e6e0010a-dd53-43da-bf39-9f5ce9abf6b0

  with provider["registry.terraform.io/hashicorp/aws"].eu-north-1,
  on main.tf line 113, in provider "aws":
 113: provider "aws" {

Error: error configuring Terraform AWS Provider: no valid credential sources for Terraform AWS Provider found.

Please see https://registry.terraform.io/providers/hashicorp/aws
for more information about providing credentials.

Error: WebIdentityErr: failed to retrieve credentials
caused by: InvalidIdentityToken: Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements
    status code: 400, request id: 3182a0fe-3410-40ec-ad12-fcbff30eb365

  with provider["registry.terraform.io/hashicorp/aws"].eu-west-2,
  on main.tf line 122, in provider "aws":
 122: provider "aws" {

Error: error configuring Terraform AWS Provider: no valid credential sources for Terraform AWS Provider found.

Terraform Configuration Files

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 3.0"
      configuration_aliases = [aws.ap-northeast-2,aws.ap-southeast-1,aws.ap-southeast-2,aws.eu-central-1,aws.eu-west-3,aws.ap-northeast-3,aws.ap-south-1,aws.eu-north-1,aws.eu-west-2,aws.sa-east-1,aws.us-east-1,aws.us-west-1,aws.ap-northeast-1,aws.ca-central-1,aws.eu-west-1,aws.us-east-2,aws.us-west-2]
    }
  }
}

provider "aws" {

  assume_role {
    role_arn = ROLE_TO_ASSUME
  }
  region = "us-east-1"
}

provider "aws" {
  # No credentials explicitly set here because they come from either the
  # environment or the global credentials file.

  region = "ap-northeast-2"
  alias  = "ap-northeast-2"
}

provider "aws" {
  # No credentials explicitly set here because they come from either the
  # environment or the global credentials file.

  region = "ap-southeast-1"
  alias  = "ap-southeast-1"
}

provider "aws" {
  # No credentials explicitly set here because they come from either the
  # environment or the global credentials file.

  region = "ap-southeast-2"
  alias  = "ap-southeast-2"
}

provider "aws" {
  # No credentials explicitly set here because they come from either the
  # environment or the global credentials file.

  region = "eu-central-1"
  alias  = "eu-central-1"
}

provider "aws" {
  # No credentials explicitly set here because they come from either the
  # environment or the global credentials file.

  region = "eu-west-3"
  alias  = "eu-west-3"
}

provider "aws" {
  # No credentials explicitly set here because they come from either the
  # environment or the global credentials file.

  region = "ap-northeast-3"
  alias  = "ap-northeast-3"
}

provider "aws" {
  # No credentials explicitly set here because they come from either the
  # environment or the global credentials file.

  region = "ap-south-1"
  alias  = "ap-south-1"
}

provider "aws" {
  # No credentials explicitly set here because they come from either the
  # environment or the global credentials file.

  region = "eu-north-1"
  alias  = "eu-north-1"
}

provider "aws" {
  # No credentials explicitly set here because they come from either the
  # environment or the global credentials file.

  region = "eu-west-2"
  alias  = "eu-west-2"
}

provider "aws" {
  # No credentials explicitly set here because they come from either the
  # environment or the global credentials file.

  region = "sa-east-1"
  alias  = "sa-east-1"
}

provider "aws" {
  # No credentials explicitly set here because they come from either the
  # environment or the global credentials file.

  region = "us-west-1"
  alias  = "us-west-1"
}

provider "aws" {
  # No credentials explicitly set here because they come from either the
  # environment or the global credentials file.

  region = "ap-northeast-1"
  alias  = "ap-northeast-1"
}

provider "aws" {
  # No credentials explicitly set here because they come from either the
  # environment or the global credentials file.

  region = "ca-central-1"
  alias  = "ca-central-1"
}

provider "aws" {
  # No credentials explicitly set here because they come from either the
  # environment or the global credentials file.

  region = "eu-west-1"
  alias  = "eu-west-1"
}

provider "aws" {
  # No credentials explicitly set here because they come from either the
  # environment or the global credentials file.

  region = "us-east-2"
  alias  = "us-east-2"
}

provider "aws" {
  # No credentials explicitly set here because they come from either the
  # environment or the global credentials file.

  region = "us-west-2"
  alias  = "us-west-2"
}

module "multi-region-sns--ap-northeast-2" {

  source            = "../multi-region-sns-topic"
  stage             = var.stage
  topic_name        = var.topic_name
  allow_all_publish = true
  account_id        = var.account_id
  providers         = {
    aws = aws.ap-northeast-2
  }
}
module "multi-region-sns--ap-southeast-1" {

  source            = "../multi-region-sns-topic"
  stage             = var.stage
  topic_name        = var.topic_name
  allow_all_publish = true
  account_id        = var.account_id
  providers         = {
    aws = aws.ap-southeast-1
  }
}
module "multi-region-sns--ap-southeast-2" {

  source            = "../multi-region-sns-topic"
  stage             = var.stage
  topic_name        = var.topic_name
  allow_all_publish = true
  account_id        = var.account_id
  providers         = {
    aws = aws.ap-southeast-2
  }
}
module "multi-region-sns--eu-central-1" {

  source            = "../multi-region-sns-topic"
  stage             = var.stage
  topic_name        = var.topic_name
  allow_all_publish = true
  account_id        = var.account_id
  providers         = {
    aws = aws.eu-central-1
  }
}
module "multi-region-sns--eu-west-3" {

  source            = "../multi-region-sns-topic"
  stage             = var.stage
  topic_name        = var.topic_name
  allow_all_publish = true
  account_id        = var.account_id
  providers         = {
    aws = aws.eu-west-3
  }
}
module "multi-region-sns--ap-northeast-3" {

  source            = "../multi-region-sns-topic"
  stage             = var.stage
  topic_name        = var.topic_name
  allow_all_publish = true
  account_id        = var.account_id
  providers         = {
    aws = aws.ap-northeast-3
  }
}
module "multi-region-sns--ap-south-1" {

  source            = "../multi-region-sns-topic"
  stage             = var.stage
  topic_name        = var.topic_name
  allow_all_publish = true
  account_id        = var.account_id
  providers         = {
    aws = aws.ap-south-1
  }
}
module "multi-region-sns--eu-north-1" {

  source            = "../multi-region-sns-topic"
  stage             = var.stage
  topic_name        = var.topic_name
  allow_all_publish = true
  account_id        = var.account_id
  providers         = {
    aws = aws.eu-north-1
  }
}
module "multi-region-sns--eu-west-2" {

  source            = "../multi-region-sns-topic"
  stage             = var.stage
  topic_name        = var.topic_name
  allow_all_publish = true
  account_id        = var.account_id
  providers         = {
    aws = aws.eu-west-2
  }
}
module "multi-region-sns--sa-east-1" {

  source            = "../multi-region-sns-topic"
  stage             = var.stage
  topic_name        = var.topic_name
  allow_all_publish = true
  account_id        = var.account_id
  providers         = {
    aws = aws.sa-east-1
  }
}
module "multi-region-sns--us-east-1" {

  source            = "../multi-region-sns-topic"
  stage             = var.stage
  topic_name        = var.topic_name
  allow_all_publish = true
  account_id        = var.account_id
  providers         = {
    aws = aws.us-east-1
  }
}
module "multi-region-sns--us-west-1" {

  source            = "../multi-region-sns-topic"
  stage             = var.stage
  topic_name        = var.topic_name
  allow_all_publish = true
  account_id        = var.account_id
  providers         = {
    aws = aws.us-west-1
  }
}
module "multi-region-sns--ap-northeast-1" {

  source            = "../multi-region-sns-topic"
  stage             = var.stage
  topic_name        = var.topic_name
  allow_all_publish = true
  account_id        = var.account_id
  providers         = {
    aws = aws.ap-northeast-1
  }
}
module "multi-region-sns--ca-central-1" {

  source            = "../multi-region-sns-topic"
  stage             = var.stage
  topic_name        = var.topic_name
  allow_all_publish = true
  account_id        = var.account_id
  providers         = {
    aws = aws.ca-central-1
  }
}
module "multi-region-sns--eu-west-1" {

  source            = "../multi-region-sns-topic"
  stage             = var.stage
  topic_name        = var.topic_name
  allow_all_publish = true
  account_id        = var.account_id
  providers         = {
    aws = aws.eu-west-1
  }
}
module "multi-region-sns--us-east-2" {

  source            = "../multi-region-sns-topic"
  stage             = var.stage
  topic_name        = var.topic_name
  allow_all_publish = true
  account_id        = var.account_id
  providers         = {
    aws = aws.us-east-2
  }
}
module "multi-region-sns--us-west-2" {

  source            = "../multi-region-sns-topic"
  stage             = var.stage
  topic_name        = var.topic_name
  allow_all_publish = true
  account_id        = var.account_id
  providers         = {
    aws = aws.us-west-2
  }
}

Then the example of the module:

terraform {
  required_version = ">= 0.15"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 3.0"
    }
  }
}

resource "aws_sns_topic" "sns-topic-publish-all" {
  name = var.topic_name
}


Steps to Reproduce

1. Create an AWS role with admin permissions.
2. Create a GitHub Actions workflow in some GitHub repository.
3. As part of the workflow, define the following:
      export AWS_ROLE_ARN=YOUR_ROLE
      export AWS_WEB_IDENTITY_TOKEN_FILE=/tmp/awscreds
      export AWS_DEFAULT_REGION=us-east-1
      echo AWS_WEB_IDENTITY_TOKEN_FILE=$AWS_WEB_IDENTITY_TOKEN_FILE >> $GITHUB_ENV
      echo AWS_ROLE_ARN=$AWS_ROLE_ARN >> $GITHUB_ENV
      echo AWS_DEFAULT_REGION=$AWS_DEFAULT_REGION >> $GITHUB_ENV

      curl -H "Authorization: bearer $ACTIONS_ID_TOKEN_REQUEST_TOKEN" "$ACTIONS_ID_TOKEN_REQUEST_URL" | jq -r '.value' > $AWS_WEB_IDENTITY_TOKEN_FILE

4. Set up and install Terraform there.

Debug Output

No response

Panic Output

No response

Important Factoids

No response

References

No response

Would you like to implement a fix?

No response

arielbeckjit commented 2 years ago

It looks like too many requests are being made. Is there any way to sleep, retry, or perform only one assume-role call for all regions?
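
One way to approximate "only one assume-role call for all regions" outside the provider is to exchange the token once with the AWS CLI and hand Terraform plain session credentials. This is a sketch under assumptions, not a confirmed fix: `export_session_credentials` and the session name `terraform-ci` are hypothetical, and it assumes each provider configuration reads the exported environment credentials instead of performing its own web-identity exchange.

```shell
#!/usr/bin/env bash
# export_session_credentials "KEY_ID<TAB>SECRET<TAB>TOKEN"
# Hypothetical helper: splits one whitespace-separated line into the three
# standard AWS credential environment variables.
export_session_credentials() {
  local key_id secret token
  read -r key_id secret token <<EOF
$1
EOF
  if [ -z "$key_id" ] || [ -z "$secret" ] || [ -z "$token" ]; then
    echo "could not parse credentials" >&2
    return 1
  fi
  export AWS_ACCESS_KEY_ID="$key_id"
  export AWS_SECRET_ACCESS_KEY="$secret"
  export AWS_SESSION_TOKEN="$token"
}

# Example usage (not executed here): call STS once, then drop the
# web-identity variables so nothing re-triggers the exchange in this shell.
#   creds="$(aws sts assume-role-with-web-identity \
#     --role-arn "$AWS_ROLE_ARN" \
#     --role-session-name terraform-ci \
#     --web-identity-token "$(cat "$AWS_WEB_IDENTITY_TOKEN_FILE")" \
#     --query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]' \
#     --output text)"
#   export_session_credentials "$creds"
#   unset AWS_ROLE_ARN AWS_WEB_IDENTITY_TOKEN_FILE
```

Variables already written to $GITHUB_ENV persist across steps, so in a workflow the web-identity variables would also need to be left out of $GITHUB_ENV for this to take effect in later steps.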

jBouyoud commented 1 year ago

Hi, I'm facing the same issue and would really like to see it fixed, so I'd be glad to help fix it. But I don't really know where to start ;-)

@justinretzolk, any clue where to start investigating?

Looking at the AWS CLI docs, it seems a credentials cache mechanism is in place. Is this mechanism already in place in the AWS provider?

I'm not using exactly the same config; I rely on a profile:

[profile account-name]
role_arn = arn:aws:iam::1234567890123:role/role_name
web_identity_token_file = /path/to/web_identity_token_file
role_session_name = workflows
region = eu-central-1

The web identity token file (/path/to/web_identity_token_file) is generated with:

    - uses: actions/github-script@v6
      with:
        script: |
          const fs = require('fs/promises');
          const path = require('path');

          const awsConfigDir = path.resolve(process.env.HOME,'.aws');

          var webIdentityToken = await core.getIDToken('sts.amazonaws.com');

          await fs.mkdir(awsConfigDir, { recursive: true });
          await fs.writeFile(path.resolve(awsConfigDir,'identity-token'), webIdentityToken);

I'm able to run raw AWS CLI commands without any errors:

aws sts get-caller-identity --profile=account-name

And finally I get the same error with Terraform:

│ Error: WebIdentityErr: failed to retrieve credentials
│ caused by: InvalidIdentityToken: Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements
│   status code: 400, request id: ........
│ 


gdavison commented 1 year ago

Since the authentication now retries, are you still seeing this error if you configure AWS_MAX_ATTEMPTS?
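
For anyone trying this in the GitHub Actions setup from the original report, setting the variable could look like the sketch below. The value 10 is an arbitrary assumption, not a recommendation from the maintainers, and the $GITHUB_ENV line only applies when running inside a workflow.

```shell
# Assumed value; tune to your workload.
export AWS_MAX_ATTEMPTS=10

# Persist the setting for later steps of the same GitHub Actions job,
# if this runs inside one.
if [ -n "${GITHUB_ENV:-}" ]; then
  echo "AWS_MAX_ATTEMPTS=${AWS_MAX_ATTEMPTS}" >> "$GITHUB_ENV"
fi
```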

marshallford commented 1 year ago

@gdavison Is the change in a published version of the provider?

gdavison commented 1 year ago

> @gdavison Is the change in a published version of the provider?

Yes