hashicorp / terraform-provider-aws

The AWS Provider enables Terraform to manage AWS resources.
https://registry.terraform.io/providers/hashicorp/aws
Mozilla Public License 2.0

[Bug]: ExpiredTokenException: The security token included in the request is expired #32809

Closed 2 months ago

slackfan commented 1 year ago

Terraform Core Version

1.5.4

AWS Provider Version

5.10.0

Affected Resource(s)

IAM session handling in the AWS provider.

Expected Behavior

Terraform finishes waiting for long-running resources without an ExpiredTokenException.

Actual Behavior

For some weeks now, we have been seeing the error "ExpiredTokenException: The security token included in the request is expired" while the AWS provider waits for resources to be created. It is sporadic and not easily reproducible, but it keeps recurring. The affected resources are EKS clusters and EKS node groups.

Relevant Error/Panic Output Snippet

13:16:46  │ Error: waiting for EKS Cluster (githubissue) create: ExpiredTokenException: The security token included in the request is expired
13:16:46  │     status code: 403, request id: a16bb54f-d47d-4283-b431-70f06fb0112f
13:16:46  │ 
13:16:46  │   with module.eks_cluster[0].aws_eks_cluster.eks_cluster,
13:16:46  │   on modules/eks_cluster/main.tf line 5, in resource "aws_eks_cluster" "eks_cluster":
13:16:46  │    5: resource "aws_eks_cluster" "eks_cluster" {
13:16:46  │

Terraform Configuration Files

I am not sure what is needed here. The setup is Terraform code running in a Docker container on an EC2 instance. The Terraform code uses the assume_role feature of the AWS provider and runs as that dedicated IAM role.

Steps to Reproduce

(Re-)create resources that may take more than 15 minutes to finish creating.

Debug Output

2023-08-02T11:15:03.658Z [TRACE] provider.terraform-provider-aws_v5.10.0_x5: [TRACE] Waiting 10s before next try
2023-08-02T11:15:13.664Z [DEBUG] provider.terraform-provider-aws_v5.10.0_x5: HTTP Request Sent: http.method=GET tf_mux_provider=*schema.GRPCProviderServer @module=aws aws.sdk=aws-sdk-go http.request.body= http.url=https://eks.us-east-1.amazonaws.com/clusters/githubissue http.user_agent="APN/1.0 HashiCorp/1.0 Terraform/1.5.4 (+https://www.terraform.io) terraform-provider-aws/5.10.0 (+https://registry.terraform.io/providers/hashicorp/aws) aws-sdk-go/1.44.305 (go1.20.6; linux; amd64)" net.peer.name=eks.us-east-1.amazonaws.com tf_provider_addr=registry.terraform.io/hashicorp/aws aws.service=EKS http.flavor=1.1 http.request.header.x_amz_date=20230802T111513Z http.request.header.x_amz_security_token=***** tf_rpc=ApplyResourceChange @caller=github.com/hashicorp/aws-sdk-go-base/v2/awsv1shim/v2@v2.0.0-beta.33/logger.go:96 aws.operation=DescribeCluster aws.region=us-east-1 http.request.header.authorization="AWS4-HMAC-SHA256 Credential=ASIA************WZWO********2/us-east-1/eks/aws4_request, Sign**********host;x-amz-date;x-amz-security-token, Signature=*****" tf_req_id=719982be-0e70-953a-85ae-cae695f79f20 tf_resource_type=aws_eks_cluster timestamp=2023-08-02T11:15:13.664Z
2023-08-02T11:15:13.674Z [DEBUG] provider.terraform-provider-aws_v5.10.0_x5: HTTP Response Received: aws.service=EKS http.response.header.access_control_allow_headers=*,Authorization,Date,X-Amz-Date,X-Amz-Security-Token,X-Amz-Target,content-type,x-amz-content-sha256,x-amz-user-agent,x-amzn-platform-id,x-amzn-trace-id http.response.header.access_control_allow_origin=* http.response.header.access_control_expose_headers=x-amzn-errortype,x-amzn-errormessage,x-amzn-trace-id,x-amzn-requestid,x-amz-apigw-id,date http.response.header.x_amz_apigw_id=JB4szFIEoAMFxDg= aws.operation=DescribeCluster tf_req_id=719982be-0e70-953a-85ae-cae695f79f20 @module=aws http.response.body="{"message":"The security token included in the request is expired"}
2023-08-02T11:15:13.674Z [WARN]  provider.terraform-provider-aws_v5.10.0_x5: Disabling retries after next request due to expired credentials: tf_mux_provider=*schema.GRPCProviderServer tf_provider_addr=registry.terraform.io/hashicorp/aws tf_req_id=1539ddee-c5e8-7fb5-ee3a-44dc47ea1ebc @module=aws error="ExpiredTokenException: The security token included in the request is expired
2023-08-02T11:15:13.674Z [ERROR] provider.terraform-provider-aws_v5.10.0_x5: Response contains error diagnostic: @module=sdk.proto diagnostic_summary="waiting for EKS Cluster (githubissue) create: ExpiredTokenException: The security token included in the request is expired
2023-08-02T11:30:34.937Z [TRACE] provider.terraform-provider-aws_v5.10.0_x5: [TRACE] Waiting 10s before next try
2023-08-02T11:30:44.947Z [DEBUG] provider.terraform-provider-aws_v5.10.0_x5: HTTP Request Sent: @module=aws aws.sdk=aws-sdk-go tf_rpc=ApplyResourceChange http.request.header.authorization="AWS4-HMAC-SHA256 Credential=ASIA************YN72********2/us-east-1/eks/aws4_request, Sign**********host;x-amz-date;x-amz-security-token, Signature=*****" http.request.header.x_amz_security_token=***** http.url=https://eks.us-east-1.amazonaws.com/clusters/githubissue/node-groups/private_config net.peer.name=eks.us-east-1.amazonaws.com tf_provider_addr=registry.terraform.io/hashicorp/aws tf_req_id=598a3ed6-baff-0317-edc7-0abad0a471e0 tf_resource_type=aws_eks_node_group http.request.body= aws.operation=DescribeNodegroup aws.region=us-east-1 aws.service=EKS http.flavor=1.1 http.method=GET http.request.header.x_amz_date=20230802T113044Z tf_mux_provider=*schema.GRPCProviderServer @caller=github.com/hashicorp/aws-sdk-go-base/v2/awsv1shim/v2@v2.0.0-beta.33/logger.go:96 http.user_agent="APN/1.0 HashiCorp/1.0 Terraform/1.5.4 (+https://www.terraform.io) terraform-provider-aws/5.10.0 (+https://registry.terraform.io/providers/hashicorp/aws) aws-sdk-go/1.44.305 (go1.20.6; linux; amd64)" timestamp=2023-08-02T11:30:44.947Z
2023-08-02T11:30:44.957Z [DEBUG] provider.terraform-provider-aws_v5.10.0_x5: HTTP Response Received: http.duration=9 http.response.header.x_amzn_trace_id=Root=1-64ca3ec1-1354d9ae194f2dbc3ba244f6 tf_mux_provider=*schema.GRPCProviderServer aws.region=us-east-1 http.response.header.x_amz_apigw_id=JB6-UEj9IAMFa5w= http.response.header.x_amzn_errortype=ExpiredTokenException http.response.header.x_amzn_requestid=3e60e7ed-0091-44e9-be65-9ec58dec3116 http.response.header.access_control_allow_headers=*,Authorization,Date,X-Amz-Date,X-Amz-Security-Token,X-Amz-Target,content-type,x-amz-content-sha256,x-amz-user-agent,x-amzn-platform-id,x-amzn-trace-id http.response_content_length=67 tf_resource_type=aws_eks_node_group @caller=github.com/hashicorp/aws-sdk-go-base/v2/awsv1shim/v2@v2.0.0-beta.33/logger.go:144 http.response.header.date="Wed, 02 Aug 2023 11:32:17 GMT" tf_rpc=ApplyResourceChange aws.sdk=aws-sdk-go http.response.header.content_type=application/json http.status_code=403 aws.operation=DescribeNodegroup http.response.body="{"message":"The security token included in the request is expired"}
2023-08-02T11:30:44.957Z [WARN]  provider.terraform-provider-aws_v5.10.0_x5: Disabling retries after next request due to expired credentials: error="ExpiredTokenException: The security token included in the request is expired
2023-08-02T11:30:44.957Z [ERROR] provider.terraform-provider-aws_v5.10.0_x5: Response contains error diagnostic: diagnostic_detail= tf_req_id=598a3ed6-baff-0317-edc7-0abad0a471e0 tf_resource_type=aws_eks_node_group tf_rpc=ApplyResourceChange @caller=github.com/hashicorp/terraform-plugin-go@v0.18.0/tfprotov5/internal/diag/diagnostics.go:58 @module=sdk.proto diagnostic_severity=ERROR diagnostic_summary="waiting for EKS Node Group (githubissue:private_config) to create: ExpiredTokenException: The security token included in the request is expired

I have a full TRACE-level log available, but I cannot easily share it on GitHub.

Panic Output

No response

Important Factoids

No response

References

No response

Would you like to implement a fix?

No


slackfan commented 1 year ago

Reading the documentation at https://registry.terraform.io/providers/hashicorp/aws/latest/docs#duration left me more confused than before. The documentation does mention a session duration, but it does not say whether the session is renewed automatically when needed. I am also fairly certain that this exception is new behavior.
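Whether the provider renews the session is exactly what the docs leave open. For readers unfamiliar with the pattern, here is a minimal, hypothetical sketch of what "auto-renewal" of temporary credentials usually means: re-run the credential fetch (for example an AssumeRole call) shortly before the current session expires. `RefreshingCredentials`, `fetch`, and `refresh_margin` are illustrative names, not the provider's or the AWS SDK's API, and the sketch does not describe how terraform-provider-aws actually behaves.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Credentials:
    access_key_id: str
    secret_access_key: str
    session_token: str
    expires_at: float  # Unix time (seconds) at which the session expires


class RefreshingCredentials:
    """Hypothetical wrapper: re-fetch temporary credentials shortly before
    they expire. `fetch` stands in for an AssumeRole call."""

    def __init__(
        self,
        fetch: Callable[[], Credentials],
        refresh_margin: float = 300.0,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch
        self._margin = refresh_margin  # refresh this many seconds before expiry
        self._clock = clock
        self._current: Optional[Credentials] = None

    def get(self) -> Credentials:
        """Return cached credentials, fetching new ones if none are cached
        or the cached ones are within `refresh_margin` of expiring."""
        now = self._clock()
        if self._current is None or now >= self._current.expires_at - self._margin:
            self._current = self._fetch()
        return self._current


if __name__ == "__main__":
    # Simulated clock and a fake fetch that issues 900-second sessions.
    now = [0.0]
    issued = []

    def fake_fetch() -> Credentials:
        issued.append(now[0])
        return Credentials("ASIAEXAMPLE", "secret", f"token-{len(issued)}", now[0] + 900)

    creds = RefreshingCredentials(fake_fetch, refresh_margin=300, clock=lambda: now[0])
    for t in (0, 500, 600, 1000):
        now[0] = float(t)
        print(t, creds.get().session_token)
```

With a 900-second session and a 300-second margin, the demo fetches once at t=0 and again at t=600. A long waiter that calls `get()` before each poll would never send an expired token; a waiter that captured one set of credentials at the start would.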

mattburgess commented 9 months ago

The 15-minute interval between the start and end of your logs is suspicious. The AssumeRole API takes a configurable session duration: it defaults to 3600 seconds (1 hour), but the minimum it can be set to is 900 seconds (15 minutes). How is the Docker container assuming the role, and how does it ensure that credentials are refreshed when required?
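The arithmetic behind that observation can be checked directly against the debug output: the sketch below measures the time between the two failing requests and checks a session duration against the AssumeRole bounds. The constants reflect the documented AssumeRole limits (900 seconds minimum, 3600 seconds default, and a maximum capped by the role's maximum session duration, which is at most 12 hours); the helper names are hypothetical, and a gap just over 900 seconds is consistent with, but does not prove, a 15-minute session.

```python
from datetime import datetime

# AssumeRole DurationSeconds bounds, in seconds. The effective maximum is the
# role's configured maximum session duration, which can be at most 12 hours.
MIN_DURATION = 900       # 15 minutes
DEFAULT_DURATION = 3600  # 1 hour
MAX_DURATION = 43200     # 12 hours


def parse_log_time(ts: str) -> datetime:
    """Parse a Terraform log timestamp such as '2023-08-02T11:15:13.674Z'."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")


def seconds_between(start: str, end: str) -> float:
    """Elapsed seconds between two log timestamps."""
    return (parse_log_time(end) - parse_log_time(start)).total_seconds()


def valid_duration(seconds: int, role_max: int = MAX_DURATION) -> bool:
    """True if `seconds` is an acceptable DurationSeconds for a role whose
    maximum session duration is `role_max`."""
    return MIN_DURATION <= seconds <= min(role_max, MAX_DURATION)


if __name__ == "__main__":
    # Timestamps of the two ExpiredTokenException responses in the debug output.
    gap = seconds_between("2023-08-02T11:15:13.674Z", "2023-08-02T11:30:44.957Z")
    print(f"{gap:.3f} s between the two failures")  # 931.283 s
    print(gap > MIN_DURATION)  # True
```

Note that `valid_duration` only checks the bounds; it cannot tell you which duration the container actually requested.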

justinretzolk commented 7 months ago

Hey @slackfan 👋 I wanted to check in here to see if you're still having troubles. If so, are you able to provide any details around what Matthew asked above?

jonos-cms commented 6 months ago

Hello. I found that this issue occurs if you have multiple sets of credentials (profiles) defined in your AWS credentials file. For example, I have AWS credentials that get minted under my "default" profile. However, I want to apply a module to a different account using a different set of credentials, a profile called "dev". I used my "default" credentials to provision a backend S3 bucket for "dev"'s state, but then used the "dev" credentials to provision the actual resources.

When either of the session tokens is invalidated, the error from this issue is thrown. Even though my "default" credentials weren't needed to change dev's state, because the credentials for both "dev" and "default" were in ~/.aws/credentials, both had to be valid for either set of credentials to be used.

Hope this helps.
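To check a setup like the one described above, it can help to list which profiles in ~/.aws/credentials carry short-lived session credentials, since those are the ones that can expire during a long apply. The sketch below uses only Python's standard `configparser`; `session_profiles` is a hypothetical helper, and it only reads the file: it does not verify whether any token is still valid, nor does it confirm the behavior described in the comment.

```python
import configparser


def session_profiles(credentials_text: str) -> list[str]:
    """Return the profiles in an AWS shared credentials file (INI format)
    that hold temporary credentials, i.e. that set aws_session_token."""
    # interpolation=None so '%' characters in secrets are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(credentials_text)
    return [
        name
        for name in parser.sections()
        if parser.has_option(name, "aws_session_token")
    ]


if __name__ == "__main__":
    example = """
[default]
aws_access_key_id = ASIAEXAMPLE1
aws_secret_access_key = example-secret-1
aws_session_token = example-token-1

[dev]
aws_access_key_id = ASIAEXAMPLE2
aws_secret_access_key = example-secret-2
aws_session_token = example-token-2

[static]
aws_access_key_id = AKIAEXAMPLE3
aws_secret_access_key = example-secret-3
"""
    print(session_profiles(example))  # ['default', 'dev']
```

To inspect a real file, pass the contents of ~/.aws/credentials instead of the example string.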

justinretzolk commented 2 months ago

Given the information above, this appears to be behaving as I would expect. Since we've not heard back otherwise, I'm going to close this issue. If you have any further issues with the provider, please do let us know!

github-actions[bot] commented 2 months ago

Warning: This issue has been closed, meaning that any additional comments are hard for our team to see. Please assume that the maintainers will not see them.

Ongoing conversations amongst community members are welcome, however, the issue will be locked after 30 days. Moving conversations to another venue, such as the AWS Provider forum, is recommended. If you have additional concerns, please open a new issue, referencing this one where needed.

github-actions[bot] commented 1 month ago

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.