hashicorp / terraform-provider-aws

The AWS Provider enables Terraform to manage AWS resources.
https://registry.terraform.io/providers/hashicorp/aws
Mozilla Public License 2.0

Cloudformation Stackset Wait for Apply Changes - Unintended Consequences #12877

Open awagneratzendesk opened 4 years ago

awagneratzendesk commented 4 years ago

Terraform Version

Affected Resource(s)

aws_cloudformation_stack_set

Terraform Configuration Files

resource "aws_cloudformation_stack_set" "regional" {
  administration_role_arn = aws_iam_role.AWSCloudFormationStackSetAdministrationRole.arn
  name                    = "regional"
  capabilities            = ["CAPABILITY_NAMED_IAM"]
  template_body           = file("stackset_templates/regional.json")

  lifecycle {
    ignore_changes = [parameters]
  }
}

Debug Output

aws_cloudformation_stack_set.regional: Still modifying... [id=regional, 12m21s elapsed]
aws_cloudformation_stack_set.regional: Still modifying... [id=regional, 12m31s elapsed]
aws_cloudformation_stack_set.regional: Still modifying... [id=regional, 12m41s elapsed]
aws_cloudformation_stack_set.regional: Still modifying... [id=regional, 12m51s elapsed]
aws_cloudformation_stack_set.regional: Still modifying... [id=regional, 13m1s elapsed]
aws_cloudformation_stack_set.regional: Still modifying... [id=regional, 13m11s elapsed]
aws_cloudformation_stack_set.regional: Still modifying... [id=regional, 13m21s elapsed]
aws_cloudformation_stack_set.regional: Still modifying... [id=regional, 13m31s elapsed]
aws_cloudformation_stack_set.regional: Still modifying... [id=regional, 13m41s elapsed]
aws_cloudformation_stack_set.regional: Still modifying... [id=regional, 13m51s elapsed]

Expected Behavior

In past versions of the AWS Provider, Terraform would upload the stackset change and then exit safely while the stackset rolled out the change.

Actual Behavior

Following https://github.com/terraform-providers/terraform-provider-aws/pull/11726, Terraform now waits with a timeout for the Stackset operation to complete.

While this is expected behavior as part of the above PR, the outcome is not optimal for managing large stacksets. We have a stackset with 350 stack instances, and a change to it locks up Terraform for a couple of hours. We cannot run other Terraform operations in this repo while a stackset change propagates. For us, this problem will continue to get worse as we add more accounts to the stackset.

This behavior also increases the risk of network drops or other issues interrupting a Terraform apply. An option to not wait for the stackset operation would be much appreciated. I suspect this was not the intended outcome of the change, but waiting is a difficult approach to make work for stacksets at scale.
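A minimal sketch of a partial mitigation, not a fix: it assumes a provider version in which `aws_cloudformation_stack_set` supports the `operation_preferences` block and a configurable `update` timeout, and all values shown are illustrative rather than recommendations. It does not skip the wait; it only tries to shorten the rollout and keep the apply from timing out while the operation runs.

```hcl
resource "aws_cloudformation_stack_set" "regional" {
  administration_role_arn = aws_iam_role.AWSCloudFormationStackSetAdministrationRole.arn
  name                    = "regional"
  capabilities            = ["CAPABILITY_NAMED_IAM"]
  template_body           = file("stackset_templates/regional.json")

  # Assumption: raising concurrency is acceptable for this stackset.
  # Higher values shorten the rollout but widen the blast radius of a bad template.
  operation_preferences {
    max_concurrent_percentage    = 50
    failure_tolerance_percentage = 10
    region_concurrency_type      = "PARALLEL"
  }

  # Assumption: the rollout fits in this window. Terraform still blocks
  # for the full duration of the stackset operation.
  timeouts {
    update = "4h"
  }

  lifecycle {
    ignore_changes = [parameters]
  }
}
```

Even with these settings the state lock is held until the operation finishes, so the request for an option to return without waiting still stands.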

benfreke-qq commented 3 years ago

We have the same problem. Managing stacksets at scale becomes very difficult with this approach.

It would be great to have an option to skip waiting for all the stacks to update.

Our pipeline assumes roles in other accounts to perform operations, and the maximum session length on a cross-account assume role is 3600 seconds, which prevents this from ever exiting gracefully for larger stacksets. We've got some stacksets with upwards of 1k instances. This also results in pipeline failures when, in actual fact, the deployment has been successful.

@awagneratzendesk did you make any progress on a workaround?

justinretzolk commented 2 years ago

Hey @awagneratzendesk 👋 Thank you for taking the time to file this. Given that there have been a number of AWS Provider releases since you initially filed it, can you confirm whether you're still experiencing this behavior?

awagneratzendesk commented 2 years ago

@justinretzolk I just took a look and we're still seeing the issue, but we're on 3.36.0 of the AWS provider. I'll test this on the latest release and validate.

michael-ullrich-1010 commented 1 year ago

I have a similar issue. I want to update the stackset template via Terraform and avoid triggering an operation. If triggering an operation cannot be avoided, I would like an option that lets Terraform return without waiting for the operation to finish.

michalz-rely commented 9 months ago

Same here, even just updating a stackset leads to a very long wait. I'd be OK with exiting without waiting; the next plan would then show me whether everything went fine.

hakuno commented 1 month ago

What's the size of your template?

michalz-rely commented 2 weeks ago

Not huge. Certain resources defined with CloudFormation take an unusually long time to deploy; a simple EventBridge rule can take up to 10 minutes.