hashicorp / terraform-provider-aws

The AWS Provider enables Terraform to manage AWS resources.
https://registry.terraform.io/providers/hashicorp/aws
Mozilla Public License 2.0

[Bug]: data block to fetch `aws_iam_roles` hangs forever #39110

Closed abhineetsbhamra closed 2 months ago

abhineetsbhamra commented 3 months ago

Terraform Core Version

1.8

AWS Provider Version

5.65.0

Affected Resource(s)

aws_iam_roles

Expected Behavior

The data source fetch for aws_iam_roles completes in seconds, as it does with 5.64.0.

Actual Behavior

As soon as I use the latest 5.65.0, data blocks that fetch aws_iam_roles hang indefinitely and I have to kill the workflow.

Relevant Error/Panic Output Snippet

module.root.module.kms.data.aws_iam_roles.administratos: Still reading... [4m0s elapsed]
module.root.module.logs.module.kms[0].data.aws_iam_roles.devopsroles: Still reading... [4m0s elapsed]

Terraform Configuration Files

terraform {
  backend "s3" {}

  required_version = "1.8"
  required_providers {
    archive = {
      source  = "hashicorp/archive"
      version = "2.2.0"
    }
    aws = {
      source  = "hashicorp/aws"
      version = "5.65.0"
    }
  }
}

Steps to Reproduce

Use the latest provider and try to read aws_iam_roles via a data block:

data "aws_iam_roles" "example_roles" {
  name_regex = "example.*"
}

Debug Output

2024-09-02T00:21:13.340Z [DEBUG] Resource instance state not found for node "data.aws_iam_roles.roles", instance data.aws_iam_roles.roles
2024-09-02T00:21:13.340Z [DEBUG] ReferenceTransformer: "data.aws_iam_roles.roles" references: []
2024-09-02T00:21:13.341Z [DEBUG] Resource instance state not found for node "data.aws_iam_roles.administrators", instance data.aws_iam_roles.administrators
2024-09-02T00:21:13.341Z [DEBUG] ReferenceTransformer: "data.aws_iam_roles.administrators" references: []

Debug request body:

http.request.body=
  | Action=ListRoles&Version=2010-05-08
   http.resend_count=6 http.method=POST http.request.header.x_amz_security_token="*****" http.user_agent="APN/1.0 HashiCorp/1.0 Terraform/1.7.3 (+https://www.terraform.io/) terraform-provider-aws/5.65.0 (+https://registry.terraform.io/providers/hashicorp/aws) m/C aws-sdk-go-v2/1.30.4 os/linux lang/go#1.23.0 md/GOOS#linux md/GOARCH#amd64 api/iam#1.35.0" rpc.method=ListRoles http.request.header.amz_sdk_invocation_id=af5dd4eb-99e2-4c4b-957d-461401d2cc66 http.url=https://iam.amazonaws.com/ rpc.service=IAM rpc.system=aws-api tf_aws.signing_region="" tf_provider_addr=registry.terraform.io/hashicorp/aws tf_rpc=ReadDataSource timestamp=2024-09-02T00:21:17.833Z

Panic Output

No panic output

Important Factoids

No response

References

No response

Would you like to implement a fix?

None


sbkg0002 commented 3 months ago

We encountered the same thing. Using <v5.56.0 for now.

pauldtill commented 3 months ago

We are seeing similar behaviour at 5.65.0 with Certificate Manager (ACM). It is not a data source; Terraform is refreshing the state of an existing resource, so I assume it uses a similar query path.

terraform plan hangs for a long time before eventually failing with:

module.aws_acm_certificate.cert: Refreshing state... [id=********]
Planning failed. Terraform encountered an error while generating this plan.

│ Error: reading ACM Certificate (*****): operation error ACM: DescribeCertificate, exceeded maximum number of attempts, 25, https response error StatusCode: 0, RequestID: , request send failed, Post "https://acm.eu-central-1.amazonaws.com/": read tcp *****->54.239.55.147:443: read: connection reset by peer

Reverted the configuration back to 5.64.0 and the plan works correctly with no other changes.
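The rollback described above can be sketched as an exact version pin in the required_providers block. This is a minimal sketch that assumes the same provider source as in the original report; 5.64.0 is simply the last release reported as working in this thread, not an officially recommended version.

```hcl
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
      # Exact pin to the last release reported as working in this thread.
      # Assumption: nothing else in the configuration needs a newer provider.
      version = "5.64.0"
    }
  }
}
```

If a dependency lock file already records 5.65.0, running terraform init -upgrade should be needed for Terraform to select the pinned version.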

ewbankkit commented 3 months ago

@abhineetsbhamra @sbkg0002 @pauldtill Thanks for raising this issue 👏. To help us investigate further, which AWS authentication method(s) are you using (role assumption, SSO, static key)?

pauldtill commented 3 months ago

For the possibly related issue I mentioned (with aws_acm_certificate), we are using an IAM role via assume_role_with_web_identity.

sbkg0002 commented 3 months ago

I also use assume role.

The problem is that the new provider uses http2 traffic, which needs different rules in the AWS Firewall. (thanks to @omerakcasbp for all the debugging work! 💪 )

ewbankkit commented 3 months ago

Relates (pretty sure):

ewbankkit commented 3 months ago

See https://github.com/hashicorp/aws-sdk-go-base/issues/1163#issuecomment-2331352035.

abhineetsbhamra commented 3 months ago

We are using role assumption.

pauldtill commented 2 months ago

> I also use assume role.
>
> The problem is that the new provider uses http2 traffic, which needs different rules in the AWS Firewall. (thanks to @omerakcasbp for all the debugging work! 💪 )

@sbkg0002, could you expand on what needed to be changed here? We are using AWS Network Firewall, but I'm not seeing anything obviously blocked in our logging.

omerakcasbp commented 2 months ago

For testing, we allowed egress TCP 443 traffic for our agents. With that change, traffic from the agents to IAM started to flow. Also, if you are using TLS SNI filtering on your firewall, please check it. In our case, requests to iam.amazonaws.com did not have a server name (SNI) value, so they got stuck at the filter. Check your logs for the target IP address.

pauldtill commented 2 months ago

@omerakcasbp, we found an AWS Network Firewall log entry, shown below (with a few internal data items such as IPs removed). It has no TLS SNI, as you mentioned. Since we are using domain allow lists, there doesn't seem to be much we can use here to allow this traffic. How did you get around this?

{
    "event_timestamp": "1725902638",
    "event": {
        "app_proto": "tls",
        "event_type": "alert",
        "alert": {
            "severity": 3,
            "rev": 0,
            "signature": "",
            "action": "blocked",
            "category": ""
        },
        "proto": "TCP",
        "tls": {
            "version": "UNDETERMINED",
            "ja3": {},
            "ja3s": {}
        },
        "dest_port": 443,
        "timestamp": "2024-09-09T17:23:58.165763+0000"
    }
}

The AWS support response (Network Firewall) was as follows:

The UNDETERMINED value is given in the log when the TLS version is unknown (not supported by Suricata). The supported TLS versions are TLS versions 1.1, 1.2, and 1.3. [2]. Please check the TLS version being used and check if it can be changed to one of the supported versions. Feel free to get back to us if you need any further support.

I assume they are off track here, since nothing has changed regarding TLS versions, has it?

ewbankkit commented 2 months ago

We have opened https://github.com/hashicorp/terraform-provider-aws/issues/39311 to capture the longer-term work.

ewbankkit commented 2 months ago

@abhineetsbhamra @sbkg0002 @pauldtill Assuming that the Go 1.22.6 downgrade with Terraform AWS Provider v5.67.0 fixed this problem, I'm going to close this issue. Discussion will continue in https://github.com/hashicorp/terraform-provider-aws/issues/39311.
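For anyone who pinned an older release as a workaround, the constraint below is a rough sketch of moving forward to the release mentioned above. It assumes v5.67.0 does resolve the hang in your environment, which this thread states only as an assumption. Whether later releases keep that behaviour is tracked in the linked issue, so check there before loosening the constraint further.

```hcl
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
      # Lower bound at the release built with the Go 1.22.6 downgrade.
      # Assumption: staying within the 5.x series is acceptable for this configuration.
      version = ">= 5.67.0, < 6.0.0"
    }
  }
}
```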
