Terraform can't destroy Auto Scaling groups that have the 'Terminate' scaling process disabled

ghost commented 6 years ago

This issue was originally opened by @TTEA1990 as hashicorp/terraform#18502. It was migrated here as a result of the provider split. The original body of the issue is below.

Terraform Version

Terraform v0.11.7

Terraform Configuration Files

...

Debug Output

Crash Output

aws_autoscaling_group.fes_asg: Destroying... (ID: SNDLFWFESASG001)
aws_autoscaling_group.bes_asg: Still destroying... (ID: SNDLFWFESASG001, 10s elapsed)
...
aws_autoscaling_group.fes_asg: Still destroying... (ID: SNDLFWFESASG001, 7m50s elapsed)
aws_autoscaling_group.fes_asg: Still destroying... (ID: SNDLFWFESASG001, 8m0s elapsed)
aws_autoscaling_group.fes_asg: Still destroying... (ID: SNDLFWFESASG001, 8m10s elapsed)
aws_autoscaling_group.fes_asg: Still destroying... (ID: SNDLFWFESASG001, 8m20s elapsed)
aws_autoscaling_group.fes_asg: Still destroying... (ID: SNDLFWFESASG001, 8m30s elapsed)
aws_autoscaling_group.fes_asg: Still destroying... (ID: SNDLFWFESASG001, 8m40s elapsed)
aws_autoscaling_group.fes_asg: Still destroying... (ID: SNDLFWFESASG001, 8m50s elapsed)
aws_autoscaling_group.fes_asg: Still destroying... (ID: SNDLFWFESASG001, 9m0s elapsed)
aws_autoscaling_group.fes_asg: Still destroying... (ID: SNDLFWFESASG001, 9m10s elapsed)
aws_autoscaling_group.fes_asg: Still destroying... (ID: SNDLFWFESASG001, 9m20s elapsed)
aws_autoscaling_group.fes_asg: Still destroying... (ID: SNDLFWFESASG001, 9m30s elapsed)
aws_autoscaling_group.fes_asg: Still destroying... (ID: SNDLFWFESASG001, 9m40s elapsed)
aws_autoscaling_group.fes_asg: Still destroying... (ID: SNDLFWFESASG001, 9m50s elapsed)
aws_autoscaling_group.fes_asg: Still destroying... (ID: SNDLFWFESASG001, 10m0s elapsed)
Releasing state lock. This may take a few moments...

Error: Error applying plan:

1 error(s) occurred:

* aws_autoscaling_group.fes_asg (destroy): 1 error(s) occurred:

* aws_autoscaling_group.fes_asg: group still has 2 instances

Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.

Expected Behavior

Terraform sets min/max/desired instances to 0, but the associated EC2 instances are not terminated. Terraform waits for the resource to be destroyed until it hits its 10-minute timeout.

Steps to Reproduce

1. Create `aws_launch_configuration` and `aws_autoscaling_group` resources with the following attributes:


resource "aws_launch_configuration" "test_lc" {
  name_prefix   = "terraform-lc-example-"
  image_id      = "<YOUR_AMI_ID>"
  instance_type = "t2.micro"

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_autoscaling_group" "fes_asg" {
  name                 = "TEST-ASG-001"
  min_size             = "2"
  max_size             = "2"
  desired_capacity     = "2"
  launch_configuration = "${aws_launch_configuration.test_lc.name}"
  # Also set availability_zones or vpc_zone_identifier for your environment.

  suspended_processes = ["Launch", "Terminate", "HealthCheck", "ReplaceUnhealthy",
    "AZRebalance", "AlarmNotification", "ScheduledActions", "AddToLoadBalancer"]
}


2. Run `terraform apply` and wait for EC2 instances to be provisioned.

3. Run `terraform destroy`

### Additional Context
When you provision an Auto Scaling group (ASG) and suspend the **Terminate** process, Terraform cannot destroy the ASG. It sets the min/max/desired instance count to 0, but the ASG does not terminate the associated instances.
### References

virgofx commented 4 years ago

:+1: Any news on this? Does anyone have a workaround? It's really cumbersome to have to log in to the UI manually, remove the suspended processes, and then run destroy again.
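A minimal sketch of the same workaround done in Terraform instead of the UI, assuming the suspension is managed by this configuration (it will not help if an external tool suspends the processes): drop "Terminate" from `suspended_processes`, run `terraform apply`, then run `terraform destroy`. This assumes the provider resumes processes removed from the list on update; the other arguments stay as in the reproduction config above.

```hcl
resource "aws_autoscaling_group" "fes_asg" {
  # ... other arguments unchanged from the reproduction config ...

  # "Terminate" removed so the ASG can terminate its instances once
  # Terraform scales the group to 0 during destroy. Assumes the provider
  # resumes processes dropped from this list on the next apply.
  suspended_processes = ["Launch", "HealthCheck", "ReplaceUnhealthy", "AZRebalance",
    "AlarmNotification", "ScheduledActions", "AddToLoadBalancer"]
}
```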

bforbis commented 4 years ago

Just chiming in that I am also running into this issue. My organization uses Cloud Custodian to save on AWS costs. One of these policies will suspend our ASGs during the off hours. This means that if someone runs terraform destroy during off_hours, it will fail to complete.

Presumably a workaround is to set `force_delete` in the ASG resource definition, but that would leave dangling stopped EC2 instances, which Cloud Custodian would restart once off_hours end and it resumes resources.
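For reference, `force_delete` is set on the group itself. The provider documentation describes it as deleting the group without waiting for its instances to terminate and warns that this can leave resources dangling, which matches the concern above. Whether instances are actually left behind when **Terminate** is suspended is not established in this thread; a minimal sketch:

```hcl
resource "aws_autoscaling_group" "fes_asg" {
  # ... other arguments unchanged from the reproduction config ...

  # Delete the group without waiting for its instances to drain.
  # The provider docs warn this can leave resources dangling.
  force_delete = true
}
```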

parthibd commented 3 years ago

Have you tried increasing the ASG's delete timeout? Something like:

resource "aws_db_instance" "example" {
  # ...

  timeouts {
    create = "60m"
    delete = "2h"
  }
}

parthibd commented 3 years ago

In my case the ASG takes more than 10 minutes to be destroyed, so it times out, since the default is 10m. I suggest giving the `timeouts` option on `aws_autoscaling_group` a spin 🙃
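A sketch of that suggestion applied to the ASG rather than `aws_db_instance`; the `30m` value is an arbitrary illustration. Per the original report, a suspended **Terminate** process means the instances are never terminated, so a longer delete timeout likely helps only when termination is merely slow, as in this case.

```hcl
resource "aws_autoscaling_group" "fes_asg" {
  # ... other arguments unchanged from the reproduction config ...

  # Raise the delete timeout above the 10m default seen in the log.
  # "30m" is an illustrative value, not a recommendation.
  timeouts {
    delete = "30m"
  }
}
```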

carlosmedinas commented 3 years ago

Make sure the `iam_instance_profile` referenced in the launch template has the necessary policies for Auto Scaling, e.g. `autoscaling:*`.
