hashicorp / terraform-provider-aws

The AWS Provider enables Terraform to manage AWS resources.
https://registry.terraform.io/providers/hashicorp/aws
Mozilla Public License 2.0

[Bug]: aws_cloudformation_stack_set_instance resource failing while operation still in progress #28675

Open reifnir opened 1 year ago

reifnir commented 1 year ago

Terraform Core Version

1.3.7

AWS Provider Version

4.48.0

Affected Resource(s)

aws_cloudformation_stack_set_instance

Expected Behavior

While Terraform is waiting for a StackSet to be deployed, it should continue waiting without failing if it receives an OperationInProgressException error that shares the same operation id as the one it's trying to perform.

In my example: terraform-20230104195015363300000003

Actual Behavior

After the stacks have been deploying for almost an hour, Terraform fails with a 409 error stating OperationInProgressException: Another Operation on StackSet {the arn} is in progress.

This occurred the first time after 50m41s and a second time at 57m20s. So, I'm able to repeat this.

Relevant Error/Panic Output Snippet

module.vega_cloud.aws_cloudformation_stack_set_instance.linked_accounts_role_and_policy: Still creating... [50m31s elapsed]
module.vega_cloud.aws_cloudformation_stack_set_instance.linked_accounts_role_and_policy: Still creating... [50m41s elapsed]
╷
│ Error: error creating CloudFormation StackSet (vega-cloud-iam-role-and-policy-member-accounts) Instance: OperationInProgressException: Another Operation on StackSet arn:aws:cloudformation:us-east-1:[REDACTED]:stackset/vega-cloud-iam-role-and-policy-member-accounts:e09fb131-ea70-4b33-8789-a2e653d03a99 is in progress: terraform-20230104195015363300000003
│   status code: 409, request id: cfa751b8-3280-4427-814e-1fadbfbf8aa4
│ 
│   with module.vega_cloud.aws_cloudformation_stack_set_instance.linked_accounts_role_and_policy,
│   on modules/vega-cloud/main.tf line 53, in resource "aws_cloudformation_stack_set_instance" "linked_accounts_role_and_policy":
│   53: resource "aws_cloudformation_stack_set_instance" "linked_accounts_role_and_policy" {
│

Terraform Configuration Files


locals {
  org_root_ou_id       = "redacted"
  some_account_id      = "redacted"
  another_account_id   = "redacted"
  tags                 = { Some = "tags" }
  ct_template_contents = <<EOT
    AWSTemplateFormatVersion: 2010-09-09
    Description: >
      Example Cloudformation Stack Set for use in creating a Gitlab Issue
    Resources:
      RedactedDiscoveryReader:
        Properties:
          RoleName: RedactedDiscoveryReader
          AssumeRolePolicyDocument:
            Statement:
              - Action: "sts:AssumeRole"
                Condition:
                  StringEquals:
                    "sts:ExternalId": "redacted"
                Effect: Allow
                Principal:
                  AWS:
                    [
                      "arn:aws:iam::${local.some_account_id}:root",
                      "arn:aws:iam::${local.another_account_id}:root"
                    ]
              - Action: "sts:AssumeRole"
                Effect: Allow
                Principal:
                  Service: ["cloudformation.amazonaws.com"]
            Version: 2012-10-17
        Type: "AWS::IAM::Role"
      DiscoveryPolicy:
        DependsOn: RedactedDiscoveryReader
        Type: "AWS::IAM::Policy"
        Properties:
          PolicyDocument:
            Statement:
              # Elastic Kubernetes Service (EKS)
              - Effect: Allow
                Resource: "*"
                Action:
                  - "eks:Describe*"
                  - "eks:List*"
              # Many other statements removed for the sake of simplicity

          PolicyName: redacted_LinkedAccountDiscoveryPolicy
          Roles:
            - RedactedDiscoveryReader
  EOT
}

resource "aws_cloudformation_stack_set" "linked_accounts_role_and_policy" {
  name = "redacted-iam-role-and-policy-member-accounts"

  description = "redacted"

  auto_deployment {
    enabled                          = true
    retain_stacks_on_account_removal = false
  }

  call_as          = "SELF" # Being run from the MPA/Org root account
  capabilities     = ["CAPABILITY_NAMED_IAM"]
  permission_model = "SERVICE_MANAGED"

  # Cloud Formation YAML that creates an IAM role and policy that is assigned to that role
  # template_body    = file("${path.module}/files/redacted-cft-linkedaccount-discoveryonly.yaml")
  template_body = local.ct_template_contents

  # Not respected on create, hopefully works on update/delete
  operation_preferences {
    max_concurrent_count = 20
  }

  tags = local.tags

  timeouts {
    update = "2h"
  }

  lifecycle {
    # Applying the stack changes the value of this, so on subsequent applies, it looks like config drift
    ignore_changes = [administration_role_arn]
  }
}

resource "aws_cloudformation_stack_set_instance" "linked_accounts_role_and_policy" {
  stack_set_name = aws_cloudformation_stack_set.linked_accounts_role_and_policy.name
  deployment_targets {
    organizational_unit_ids = [local.org_root_ou_id]
  }

  parameter_overrides = {}
  region              = "us-east-1" # Seem to need to pick one
  retain_stack        = false
  call_as             = aws_cloudformation_stack_set.linked_accounts_role_and_policy.call_as

  # Not respected on create, hopefully works on update/delete
  operation_preferences {
    max_concurrent_count = 20
  }

  # Deploying took over an hour
  timeouts {
    create = "2h"
    update = "2h"
    delete = "2h"
  }
}

Steps to Reproduce

  1. Select an AWS Organization OU that has at least 80 (90-100 would be better) active linked/member accounts.
  2. Set up the AWS provider to run from the org root/master payer account for that organization.
  3. Update the Terraform configuration provided above with a valid organizational unit ID and any two valid AWS account numbers.
  4. Apply the Terraform configuration.

Debug Output

over 22K lines before this...
2023-01-04T21:38:45.385-0500 [TRACE] dag/walk: vertex "root" is waiting for "provider[\"registry.terraform.io/hashicorp/aws\"] (close)"
2023-01-04T21:38:45.385-0500 [TRACE] dag/walk: vertex "provider[\"registry.terraform.io/hashicorp/aws\"] (close)" is waiting for "aws_cloudformation_stack_set_instance.linked_accounts_role_and_policy"
2023-01-04T21:38:50.388-0500 [TRACE] dag/walk: vertex "provider[\"registry.terraform.io/hashicorp/aws\"] (close)" is waiting for "aws_cloudformation_stack_set_instance.linked_accounts_role_and_policy"
2023-01-04T21:38:50.388-0500 [TRACE] dag/walk: vertex "root" is waiting for "provider[\"registry.terraform.io/hashicorp/aws\"] (close)"
2023-01-04T21:38:50.905-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5: [DEBUG] [aws-sdk-go] DEBUG: Request cloudformation/DescribeStackSetOperation Details:
2023-01-04T21:38:50.905-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5: ---[ REQUEST POST-SIGN ]-----------------------------
2023-01-04T21:38:50.905-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5: POST / HTTP/1.1
2023-01-04T21:38:50.905-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5: Host: cloudformation.us-east-1.amazonaws.com
2023-01-04T21:38:50.905-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5: User-Agent: APN/1.0 HashiCorp/1.0 Terraform/1.3.7 (+https://www.terraform.io) terraform-provider-aws/4.48.0 (+https://registry.terraform.io/providers/hashicorp/aws) aws-sdk-go/1.44.162 (go1.19.3; linux; amd64)
2023-01-04T21:38:50.905-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5: Content-Length: 170
2023-01-04T21:38:50.905-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5: Authorization: [REDACTED]
2023-01-04T21:38:50.905-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5: Content-Type: application/x-www-form-urlencoded; charset=utf-8
2023-01-04T21:38:50.905-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5: X-Amz-Date: 20230105T023850Z
2023-01-04T21:38:50.905-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5: X-Amz-Security-Token: [REDACTED]
2023-01-04T21:38:50.905-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5: Accept-Encoding: gzip
2023-01-04T21:38:50.905-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5:
2023-01-04T21:38:50.905-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5: Action=DescribeStackSetOperation&CallAs=SELF&OperationId=terraform-20230105014241260100000002&StackSetName=redacted-iam-role-and-policy-member-accounts&Version=2010-05-15
2023-01-04T21:38:50.905-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5: -----------------------------------------------------
2023-01-04T21:38:51.217-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5: [DEBUG] [aws-sdk-go] DEBUG: Response cloudformation/DescribeStackSetOperation Details:
2023-01-04T21:38:51.217-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5: ---[ RESPONSE ]--------------------------------------
2023-01-04T21:38:51.217-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5: HTTP/1.1 200 OK
2023-01-04T21:38:51.217-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5: Content-Length: 1453
2023-01-04T21:38:51.217-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5: Content-Type: text/xml
2023-01-04T21:38:51.217-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5: Date: Thu, 05 Jan 2023 02:38:50 GMT
2023-01-04T21:38:51.217-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5: X-Amzn-Requestid: cd254177-88c9-40a8-a445-3bb0ce9ca0d0
2023-01-04T21:38:51.217-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5:
2023-01-04T21:38:51.217-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5:
2023-01-04T21:38:51.217-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5: -----------------------------------------------------
2023-01-04T21:38:51.217-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5: [DEBUG] [aws-sdk-go] <DescribeStackSetOperationResponse xmlns="http://internal.amazon.com/coral/com.amazonaws.maestro.service.v20160713/">
2023-01-04T21:38:51.217-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5:   <DescribeStackSetOperationResult>
2023-01-04T21:38:51.217-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5:     <StackSetOperation>
2023-01-04T21:38:51.217-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5:       <ExecutionRoleName>stacksets-exec-03b1857eb5e26953be1206f6d3a12523</ExecutionRoleName>
2023-01-04T21:38:51.217-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5:       <AdministrationRoleARN>arn:aws:iam::[REDACTED]:role/aws-service-role/stacksets.cloudformation.amazonaws.com/AWSServiceRoleForCloudFormationStackSetsOrgAdmin</AdministrationRoleARN>
2023-01-04T21:38:51.217-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5:       <StackSetId>redacted-iam-role-and-policy-member-accounts:fa6e06a0-2424-46a7-8ba2-c34e48409fb7</StackSetId>
2023-01-04T21:38:51.217-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5:       <OperationPreferences>
2023-01-04T21:38:51.217-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5:         <RegionOrder/>
2023-01-04T21:38:51.217-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5:         <FailureToleranceCount>0</FailureToleranceCount>
2023-01-04T21:38:51.217-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5:         <MaxConcurrentCount>20</MaxConcurrentCount>
2023-01-04T21:38:51.217-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5:       </OperationPreferences>
2023-01-04T21:38:51.217-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5:       <DeploymentTargets>
2023-01-04T21:38:51.217-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5:         <Accounts/>
2023-01-04T21:38:51.217-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5:         <OrganizationalUnitIds>
2023-01-04T21:38:51.217-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5:           <member>[REDACTED]</member>
2023-01-04T21:38:51.217-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5:         </OrganizationalUnitIds>
2023-01-04T21:38:51.217-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5:       </DeploymentTargets>
2023-01-04T21:38:51.217-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5:       <CreationTimestamp>2023-01-05T01:42:42.507Z</CreationTimestamp>
2023-01-04T21:38:51.217-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5:       <OperationId>terraform-20230105014241260100000002</OperationId>
2023-01-04T21:38:51.217-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5:       <Action>CREATE</Action>
2023-01-04T21:38:51.217-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5:       <StatusDetails>
2023-01-04T21:38:51.217-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5:         <FailedStackInstancesCount>0</FailedStackInstancesCount>
2023-01-04T21:38:51.217-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5:       </StatusDetails>
2023-01-04T21:38:51.217-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5:       <Status>RUNNING</Status>
2023-01-04T21:38:51.217-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5:     </StackSetOperation>
2023-01-04T21:38:51.217-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5:   </DescribeStackSetOperationResult>
2023-01-04T21:38:51.217-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5:   <ResponseMetadata>
2023-01-04T21:38:51.217-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5:     <RequestId>cd254177-88c9-40a8-a445-3bb0ce9ca0d0</RequestId>
2023-01-04T21:38:51.217-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5:   </ResponseMetadata>
2023-01-04T21:38:51.217-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5: </DescribeStackSetOperationResponse>
2023-01-04T21:38:51.217-0500 [TRACE] provider.terraform-provider-aws_v4.48.0_x5: [TRACE] Waiting 10s before next try
2023-01-04T21:38:53.218-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5: [DEBUG] [aws-sdk-go] DEBUG: Retrying Request cloudformation/CreateStackInstances, attempt 25
2023-01-04T21:38:53.218-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5: [DEBUG] [aws-sdk-go] DEBUG: Request cloudformation/CreateStackInstances Details:
2023-01-04T21:38:53.218-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5: ---[ REQUEST POST-SIGN ]-----------------------------
2023-01-04T21:38:53.218-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5: POST / HTTP/1.1
2023-01-04T21:38:53.218-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5: Host: cloudformation.us-east-1.amazonaws.com
2023-01-04T21:38:53.218-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5: User-Agent: APN/1.0 HashiCorp/1.0 Terraform/1.3.7 (+https://www.terraform.io) terraform-provider-aws/4.48.0 (+https://registry.terraform.io/providers/hashicorp/aws) aws-sdk-go/1.44.162 (go1.19.3; linux; amd64)
2023-01-04T21:38:53.218-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5: Content-Length: 336
2023-01-04T21:38:53.218-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5: Authorization: [REDACTED]
2023-01-04T21:38:53.218-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5: Content-Type: application/x-www-form-urlencoded; charset=utf-8
2023-01-04T21:38:53.218-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5: X-Amz-Date: 20230105T023853Z
2023-01-04T21:38:53.218-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5: X-Amz-Security-Token: [REDACTED]
2023-01-04T21:38:53.218-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5: Accept-Encoding: gzip
2023-01-04T21:38:53.218-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5:
2023-01-04T21:38:53.218-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5: Action=CreateStackInstances&CallAs=SELF&DeploymentTargets.OrganizationalUnitIds.member.1=r-cgwe&OperationId=terraform-20230105014511263100000003&OperationPreferences.FailureToleranceCount=0&OperationPreferences.MaxConcurrentCount=20&Regions.member.1=us-east-1&StackSetName=redacted-iam-role-and-policy-member-accounts&Version=2010-05-15
2023-01-04T21:38:53.218-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5: -----------------------------------------------------
2023-01-04T21:38:54.424-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5: [DEBUG] [aws-sdk-go] DEBUG: Response cloudformation/CreateStackInstances Details:
2023-01-04T21:38:54.424-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5: ---[ RESPONSE ]--------------------------------------
2023-01-04T21:38:54.424-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5: HTTP/1.1 409 Conflict
2023-01-04T21:38:54.424-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5: Content-Length: 511
2023-01-04T21:38:54.424-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5: Content-Type: text/xml
2023-01-04T21:38:54.424-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5: Date: Thu, 05 Jan 2023 02:38:54 GMT
2023-01-04T21:38:54.424-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5: X-Amzn-Requestid: 091ed710-28dd-40c5-bb83-22a3e543076d
2023-01-04T21:38:54.424-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5:
2023-01-04T21:38:54.424-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5:
2023-01-04T21:38:54.424-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5: -----------------------------------------------------
2023-01-04T21:38:54.424-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5: [DEBUG] [aws-sdk-go] <ErrorResponse xmlns="http://internal.amazon.com/coral/com.amazonaws.maestro.service.v20160713/">
2023-01-04T21:38:54.424-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5:   <Error>
2023-01-04T21:38:54.424-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5:     <Type>Sender</Type>
2023-01-04T21:38:54.424-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5:     <Code>OperationInProgressException</Code>
2023-01-04T21:38:54.424-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5:     <Message>Another Operation on StackSet arn:aws:cloudformation:us-east-1:[REDACTED]:stackset/redacted-iam-role-and-policy-member-accounts:fa6e06a0-2424-46a7-8ba2-c34e48409fb7 is in progress: terraform-20230105014241260100000002</Message>
2023-01-04T21:38:54.424-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5:   </Error>
2023-01-04T21:38:54.424-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5:   <RequestId>091ed710-28dd-40c5-bb83-22a3e543076d</RequestId>
2023-01-04T21:38:54.424-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5: </ErrorResponse>
2023-01-04T21:38:54.424-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5: [DEBUG] [aws-sdk-go] DEBUG: Validate Response cloudformation/CreateStackInstances failed, attempt 25/25, error OperationInProgressException: Another Operation on StackSet arn:aws:cloudformation:us-east-1:[REDACTED]:stackset/redacted-iam-role-and-policy-member-accounts:fa6e06a0-2424-46a7-8ba2-c34e48409fb7 is in progress: terraform-20230105014241260100000002
2023-01-04T21:38:54.424-0500 [DEBUG] provider.terraform-provider-aws_v4.48.0_x5:        status code: 409, request id: 091ed710-28dd-40c5-bb83-22a3e543076d
2023-01-04T21:38:54.425-0500 [TRACE] provider.terraform-provider-aws_v4.48.0_x5: Called downstream: @caller=github.com/hashicorp/terraform-plugin-sdk/v2@v2.24.1/helper/schema/resource.go:838 @module=sdk.helper_schema tf_mux_provider=*schema.GRPCProviderServer tf_resource_type=aws_cloudformation_stack_set_instance tf_rpc=ApplyResourceChange tf_provider_addr=registry.terraform.io/hashicorp/aws tf_req_id=34f15b26-474e-deca-baa0-27f349d61762 timestamp=2023-01-04T21:38:54.424-0500
2023-01-04T21:38:54.425-0500 [TRACE] provider.terraform-provider-aws_v4.48.0_x5: Received downstream response: @caller=github.com/hashicorp/terraform-plugin-go@v0.14.2/tfprotov5/internal/tf5serverlogging/downstream_request.go:37 tf_provider_addr=registry.terraform.io/hashicorp/aws tf_req_duration_ms=3.373207e+06 tf_req_id=34f15b26-474e-deca-baa0-27f349d61762 @module=sdk.proto diagnostic_error_count=1 diagnostic_warning_count=0 tf_proto_version=5.3 tf_resource_type=aws_cloudformation_stack_set_instance tf_rpc=ApplyResourceChange timestamp=2023-01-04T21:38:54.425-0500
2023-01-04T21:38:54.425-0500 [ERROR] provider.terraform-provider-aws_v4.48.0_x5: Response contains error diagnostic: diagnostic_summary="error creating CloudFormation StackSet (redacted-iam-role-and-policy-member-accounts) Instance: OperationInProgressException: Another Operation on StackSet arn:aws:cloudformation:us-east-1:[REDACTED]:stackset/redacted-iam-role-and-policy-member-accounts:fa6e06a0-2424-46a7-8ba2-c34e48409fb7 is in progress: terraform-20230105014241260100000002
        status code: 409, request id: 091ed710-28dd-40c5-bb83-22a3e543076d" tf_rpc=ApplyResourceChange diagnostic_severity=ERROR diagnostic_detail= tf_proto_version=5.3 tf_provider_addr=registry.terraform.io/hashicorp/aws tf_req_id=34f15b26-474e-deca-baa0-27f349d61762 tf_resource_type=aws_cloudformation_stack_set_instance @caller=github.com/hashicorp/terraform-plugin-go@v0.14.2/tfprotov5/internal/diag/diagnostics.go:55 @module=sdk.proto timestamp=2023-01-04T21:38:54.425-0500
2023-01-04T21:38:54.425-0500 [TRACE] provider.terraform-provider-aws_v4.48.0_x5: Served request: @module=sdk.proto tf_req_id=34f15b26-474e-deca-baa0-27f349d61762 tf_resource_type=aws_cloudformation_stack_set_instance tf_rpc=ApplyResourceChange @caller=github.com/hashicorp/terraform-plugin-go@v0.14.2/tfprotov5/tf5server/server.go:831 tf_proto_version=5.3 tf_provider_addr=registry.terraform.io/hashicorp/aws timestamp=2023-01-04T21:38:54.425-0500
2023-01-04T21:38:54.426-0500 [TRACE] maybeTainted: aws_cloudformation_stack_set_instance.linked_accounts_role_and_policy encountered an error during creation, so it is now marked as tainted
2023-01-04T21:38:54.426-0500 [TRACE] NodeAbstractResouceInstance.writeResourceInstanceState to workingState for aws_cloudformation_stack_set_instance.linked_accounts_role_and_policy
2023-01-04T21:38:54.426-0500 [TRACE] NodeAbstractResouceInstance.writeResourceInstanceState: writing state object for aws_cloudformation_stack_set_instance.linked_accounts_role_and_policy
2023-01-04T21:38:54.426-0500 [TRACE] evalApplyProvisioners: aws_cloudformation_stack_set_instance.linked_accounts_role_and_policy is tainted, so skipping provisioning
2023-01-04T21:38:54.426-0500 [TRACE] maybeTainted: aws_cloudformation_stack_set_instance.linked_accounts_role_and_policy was already tainted, so nothing to do
2023-01-04T21:38:54.426-0500 [TRACE] NodeAbstractResouceInstance.writeResourceInstanceState to workingState for aws_cloudformation_stack_set_instance.linked_accounts_role_and_policy
2023-01-04T21:38:54.426-0500 [TRACE] NodeAbstractResouceInstance.writeResourceInstanceState: writing state object for aws_cloudformation_stack_set_instance.linked_accounts_role_and_policy
2023-01-04T21:38:54.426-0500 [TRACE] statemgr.Filesystem: have already backed up original terraform.tfstate to terraform.tfstate.backup on a previous write
2023-01-04T21:38:54.426-0500 [TRACE] statemgr.Filesystem: state has changed since last snapshot, so incrementing serial to 26
2023-01-04T21:38:54.426-0500 [TRACE] statemgr.Filesystem: writing snapshot at terraform.tfstate
2023-01-04T21:38:54.429-0500 [ERROR] vertex "aws_cloudformation_stack_set_instance.linked_accounts_role_and_policy" error: error creating CloudFormation StackSet (redacted-iam-role-and-policy-member-accounts) Instance: OperationInProgressException: Another Operation on StackSet arn:aws:cloudformation:us-east-1:[REDACTED]:stackset/redacted-iam-role-and-policy-member-accounts:fa6e06a0-2424-46a7-8ba2-c34e48409fb7 is in progress: terraform-20230105014241260100000002
        status code: 409, request id: 091ed710-28dd-40c5-bb83-22a3e543076d
2023-01-04T21:38:54.429-0500 [TRACE] vertex "aws_cloudformation_stack_set_instance.linked_accounts_role_and_policy": visit complete, with errors
2023-01-04T21:38:54.429-0500 [TRACE] dag/walk: upstream of "provider[\"registry.terraform.io/hashicorp/aws\"] (close)" errored, so skipping
2023-01-04T21:38:54.429-0500 [TRACE] dag/walk: upstream of "root" errored, so skipping
2023-01-04T21:38:54.429-0500 [TRACE] statemgr.Filesystem: have already backed up original terraform.tfstate to terraform.tfstate.backup on a previous write
2023-01-04T21:38:54.429-0500 [TRACE] statemgr.Filesystem: state has changed since last snapshot, so incrementing serial to 27
2023-01-04T21:38:54.429-0500 [TRACE] statemgr.Filesystem: writing snapshot at terraform.tfstate
2023-01-04T21:38:54.432-0500 [TRACE] statemgr.Filesystem: removing lock metadata file .terraform.tfstate.lock.info
2023-01-04T21:38:54.432-0500 [TRACE] statemgr.Filesystem: unlocking terraform.tfstate using fcntl flock
2023-01-04T21:38:54.433-0500 [DEBUG] provider.stdio: received EOF, stopping recv loop: err="rpc error: code = Unavailable desc = error reading from server: EOF"
2023-01-04T21:38:54.435-0500 [DEBUG] provider: plugin process exited: path=.terraform/providers/registry.terraform.io/hashicorp/aws/4.48.0/linux_amd64/terraform-provider-aws_v4.48.0_x5 pid=1675
2023-01-04T21:38:54.435-0500 [DEBUG] provider: plugin exited

Panic Output

No response

Important Factoids

From the trace log, it looks as though the provider is set to only query for progress 25 times, then retry the operation (which fails because it's still in progress).

The error occurs after around 70 accounts have been successfully applied or 50-60 minutes have passed.

When selecting the AWS OU for the stack set, I'd recommend having about 90 or 100 accounts. The error only shows itself once about 90% of the accounts in my 80 account org have been successfully deployed to. It takes a long time and you probably don't want to run this repeatedly because you're unsure if the deployment happened to complete in time.

References

No response

Would you like to implement a fix?

None

et304383 commented 1 year ago

I don't know if this is related to my issue, but I'm seeing something similar where I create the stack set instance for the OU (ours has 15 accounts) and while it finishes in a few minutes, the console just keeps polling:

aws_cloudformation_stack_set_instance.<name>: Still creating... [39m41s elapsed]

wtc1230 commented 1 year ago

Hi @reifnir,

I encountered the same issue when trying to deploy the stack set to a large number of accounts. To address this, I modified the max_retries parameter on the AWS provider, which overrides the default number of retries for AWS API calls.

FYI: https://registry.terraform.io/providers/hashicorp/aws/latest/docs#max_retries
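
For reference, the workaround described above might look like the following minimal sketch. The value 100 is purely illustrative and does not come from this thread. The trace in the original report ends at "attempt 25/25", which matches the provider's documented default of 25 retries. A later comment in this thread reports that the setting appeared to be ignored, so treat this as something to try rather than a confirmed fix.

```hcl
# Illustrative sketch only: the value 100 is an assumption, not a figure from this thread.
provider "aws" {
  region = "us-east-1"

  # Provider-level ceiling on retries for AWS API calls (documented default: 25).
  max_retries = 100
}
```

Raising the ceiling only gives the in-progress operation more time to finish before the SDK gives up on the retried CreateStackInstances call. It does not change how the provider decides which operation it is waiting on.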

michalz-rely commented 7 months ago

Same issue here. After deploying to a larger OU, Terraform retries the StackSet instance creation, which causes the error Status (FAILED) Status Reason: Attempt to perform create operation on already existing stack. So the first creation attempt succeeds, but in the meantime Terraform starts a second one. When the second attempt fails, Terraform marks the instance as tainted (which forces its replacement during the next plan). Above a certain OU size this occurs more frequently, so the only way to deal with it currently is to untaint the failed instances manually.

michalz-rely commented 5 months ago

Hi @wtc1230, I tried your solution, but this parameter seems to be ignored for me. I have also found a possible root cause of the issue, which I think should be reverted or parameterized.

For larger OUs, the operation reported as still in progress really is in progress. I have been investigating the errors on both the Terraform and AWS ends, and this is what's happening for me.

End result:

Also interesting: the first successful operation has a different operation ID. In my example:

So ideally, Terraform's reporting should be bound to the specific operation ID that it creates. At the moment Terraform is guessing.