outdoors-007 commented 1 year ago

Terraform Core Version

1.2.3,1.5.6

AWS Provider Version

5.14.0

Affected Resource(s)

All resource types are affected.

Expected Behavior

We at FICO are seeing a substantial slowdown in performance between provider versions 5.13.1 and 5.14.0. It happens for all terraform commands that interact with our AWS environment, against any type of resource. Example of the difference in performance:

'terraform plan' with AWS provider 5.13.1 (or earlier): 9 seconds
'terraform plan' with AWS provider 5.14.0: 42 seconds

We use S3 as our backend for Terraform state files and DynamoDB as our backend for locking. We see this behavior with both Terraform version 1.2.3 and the latest version 1.5.6, so it seems specific to provider version 5.14.0.

Actual Behavior

See comments in Expected Behavior.

Relevant Error/Panic Output Snippet

There is no error, it's just a notable difference in performance.

Terraform Configuration Files

There is no error, it's just a notable difference in performance.

Steps to Reproduce

Set the AWS provider version in your source to 5.13.1 and then run a 'terraform plan'. Then change the provider version to 5.14.0 and run the same 'terraform plan'.
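For anyone reproducing this, a minimal sketch of the pin described in the steps above might look like the following. It assumes the standard hashicorp/aws source address, which the original report does not show; swap the version string between the two runs and re-run 'terraform init -upgrade' before each 'terraform plan'.

```hcl
terraform {
  required_providers {
    aws = {
      # Assumption: the provider comes from the public registry address.
      source = "hashicorp/aws"

      # First run: the last version reported as fast.
      # Second run: change this to "5.14.0" and compare plan times.
      version = "5.13.1"
    }
  }
}
```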

Debug Output

No response

Panic Output

No response

Important Factoids

No response

References

No response

Would you like to implement a fix?

None

github-actions[bot] commented 1 year ago

Community Note

Voting for Prioritization

Please vote on this issue by adding a 👍 reaction to the original post to help the community and maintainers prioritize this request.
Please see our prioritization guide for information on how we prioritize.
Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request.

Volunteering to Work on This Issue

If you are interested in working on this issue, please leave a comment.
If this would be your first contribution, please review the contribution guide.

ewbankkit commented 1 year ago

@outdoors-007 Thanks for raising this issue 👏. To help us investigate could you please paste in a suitable Terraform configuration that exhibits the problem?

outdoors-007 commented 1 year ago

Hi. Thank you for the quick response. Hopefully this is what you requested:

Our terraform provider.tf file:

terraform {
  required_version = "1.2.3"  # we also tried with 1.5.6 and still experienced the performance issue
  required_providers {
    aws = "5.14.0"
  }
  backend "s3" {
    dynamodb_table = "terraform-locking"
  }
}

We dynamically build a backend file for credentials and the specs for our state bucket and file:

 BACKEND_FILE=".terraform.backend"
 echo "region=\"${REGION}\"" > ${BACKEND_FILE}
 echo "bucket=\"$BUCKET\"" >> ${BACKEND_FILE}
 echo "key=\"AWS/$ACCT_TYPE/$ACCT_DIR/$ACCT_SUBDIR.state\"" >> ${BACKEND_FILE}
 echo "access_key=\"$BUCKET_ACCESS_KEY\"" >> ${BACKEND_FILE}
 echo "secret_key=\"$BUCKET_SECRET_KEY\"" >> ${BACKEND_FILE}
 echo "token=\"$BUCKET_SESSION_TOKEN\"" >> ${BACKEND_FILE}

And then run terraform init with '-backend-config="${BACKEND_FILE}"'

ablackrw commented 1 year ago

I confirm this regression. In our case, we have this stripped-down terraform block in our main.tf file:

terraform {
  required_version = "~> 1.4"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
    }
  }
  backend "s3" {
  }
}

After this, we define a provider similar to the following for every full AWS region (no Local Zones or Wavelength Zones), for a total of ~28 providers:

provider "aws" {
  alias   = "afs1"
  region  = "af-south-1"
  assume_role {
    role_arn     = "arn:aws:iam::<account>:role/<role>"
    session_name = "terraform"
  }
}

In a second file, we use each of these providers to instantiate a custom module which provisions resources of the following types:

aws_guardduty_detector
aws_guardduty_organization_configuration
aws_guardduty_publishing_destination
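The per-region module instantiation described above could be sketched roughly as follows. The module name and source path are hypothetical (the comment does not show them); only the "afs1" alias comes from the provider block above.

```hcl
# Hypothetical module call: one of these per aliased provider (~28 in total).
module "guardduty_afs1" {
  # Assumption: a local module path; the real source is not given in the thread.
  source = "./modules/guardduty"

  # Pass the aliased regional provider into the module as its default "aws" provider.
  providers = {
    aws = aws.afs1
  }
}
```

With one such block per region, every plan has to configure each aliased provider, so any per-provider startup cost is multiplied by the number of regions.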

With the 5.13.1 provider, a plan operation completes in a 'reasonable' amount of time (~1 minute, including some other modules). With the 5.14 provider, i/o timeouts are received instantiating some of these providers.

ewbankkit commented 1 year ago

@outdoors-007 Is your terraform plan running across multiple provider "aws" {} instances (e.g. for multiple AWS Regions), similar to @ablackrw? Thanks.

outdoors-007 commented 1 year ago

> @outdoors-007 Is your terraform plan running across multiple provider "aws" {} instances (e.g. for multiple AWS Regions), similar to @ablackrw? Thanks.

No. For my simple debugging example, I only have 1 'aws' provider declaration pointing at 'us-west-2'. The code is only trying to create ~20 SSM parameters.

ewbankkit commented 1 year ago

Under suspicion: https://github.com/hashicorp/terraform-provider-aws/pull/33147.

Resident-Alien commented 1 year ago

My performance issue is substantially more severe. Using AWS provider 5.14.0 and the latest 5.15.0 we are seeing 100% CPU on the system and it is taking 10-20 minutes to run a plan. Using provider 5.13.1 it runs fine, no 100% CPU and the plan runs in about a minute.

This seems to be especially bad when I use two providers for different regions (us-east-1 and us-west-1).

h-rasi commented 1 year ago

Hi. We are facing the same issue using AWS provider 5.15.0. The Terraform code operates in only one region (eu-central-1) but manages several AWS accounts (providers).

SerhiiKorolik commented 1 year ago

+1. When running terraform plan with version 5.14.0, my M1 Mac with 8 cores sits at 100% CPU. After reverting to 5.13.1, I see only 15-20% spikes.

outdoors-007 commented 1 year ago

Hi. Are there any updates on a possible fix for this? We have a very large and spread out TF code base and all of our provider files are set to pull 'latest' of the AWS provider and it would be a fairly extensive task to set all of them to specifically use 5.13.1. Our hope is that a fix can be released soon and remaining patient is a better option. Thank you.
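As a possible interim alternative to pinning 5.13.1 exactly, a version constraint can exclude only the releases reported as slow in this thread. This is a sketch, assuming 5.14.0 and 5.15.0 are the only affected versions and that the provider is sourced from hashicorp/aws; it would still need to be applied in each provider file.

```hcl
terraform {
  required_providers {
    aws = {
      # Assumption: the provider comes from the public registry address.
      source = "hashicorp/aws"

      # Keep following new 5.x releases, but skip the two versions
      # reported as slow in this thread.
      version = "~> 5.0, != 5.14.0, != 5.15.0"
    }
  }
}
```

With this constraint, 'terraform init -upgrade' would select the newest 5.x release that is not excluded, so a fixed release would be picked up without another edit.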

YakDriver commented 1 year ago

I have tested this extensively and the new version of regexache in #33317 seems to bring performance back to what it was before. However, please let us know after you test v5.16.0 what you're seeing.

github-actions[bot] commented 1 year ago

This functionality has been released in v5.16.0 of the Terraform AWS Provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template. Thank you!

github-actions[bot] commented 1 year ago

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

hashicorp / terraform-provider-aws

[Bug]: new version 5.14.0 causes terraform operations to run 4x slower than 5.13.1 #33218
