terraform-aws-modules / terraform-aws-autoscaling

Terraform module to create AWS Auto Scaling resources 🇺🇦
https://registry.terraform.io/modules/terraform-aws-modules/autoscaling/aws
Apache License 2.0

Replacing an ASG with a network interface ID fails with interface in use error #226

Closed mhowell-ims closed 1 year ago

mhowell-ims commented 1 year ago

Description

According to the AWS docs, you can configure an ASG with a network interface ID for device zero. In this case, the ASG can only create a single EC2 instance, and any instance created by the ASG will always be attached to the given ENI. I.e. if the instance gets terminated, the replacement instance will still attach to the same ENI.
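For reference, the setup described above can be sketched with plain provider resources. This is only an illustration of the pattern, not what the module generates internally; the variable names (eni_id, ami_id, availability_zone) and the "fixed-eni-" prefix are placeholders chosen for this example.

```hcl
# Placeholder inputs for this sketch (not part of the module).
variable "eni_id" {
  type = string
}

variable "ami_id" {
  type = string
}

variable "availability_zone" {
  type = string
}

# Launch template that pins device index 0 to an existing ENI.
resource "aws_launch_template" "fixed_eni" {
  name_prefix   = "fixed-eni-"
  image_id      = var.ami_id
  instance_type = "t3.medium"

  network_interfaces {
    device_index         = 0
    network_interface_id = var.eni_id
  }
}

# Only one instance can hold the ENI at a time, so the group is capped at 1.
resource "aws_autoscaling_group" "fixed_eni" {
  name_prefix        = "fixed-eni-"
  min_size           = 1
  max_size           = 1
  desired_capacity   = 1
  availability_zones = [var.availability_zone]

  launch_template {
    id      = aws_launch_template.fixed_eni.id
    version = "$Latest"
  }
}
```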

This works fine with this module, except for one thing. If for any reason the aws_autoscaling_group created by this module needs to be replaced, the following error occurs:

Error: waiting for Auto Scaling Group (ASG1-1234567890) capacity satisfied: 1 error occurred:
│       * Scaling activity (1234567890): Failed: Interface: [eni-1234567890] in use. Launching EC2 instance failed.

I'm fairly certain this happens because the aws_autoscaling_group resource created by the module has create_before_destroy enabled. With that setting in place, when the ASG gets replaced, the new ASG is created before the old one is deleted. The new ASG then launches an instance, which tries to attach to the ENI while the ENI is still attached to the instance from the original ASG, producing the error above.

So the fix might involve adding an option to control the create_before_destroy setting on the aws_autoscaling_group resource. Alternatively, the module could use the presence of a network_interface_id in network_interfaces to decide how to set create_before_destroy.
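A rough sketch of what the first option could look like, under some assumptions. Terraform only accepts literal values inside a lifecycle block, so create_before_destroy cannot be driven directly by a variable; the usual workaround is two copies of the resource selected with count. The variable create_before_destroy and the resource names create_first and destroy_first are hypothetical and do not exist in the module; name, launch_template_id and availability_zones stand in for values the module already has.

```hcl
# Hypothetical module input; this variable does not exist in the module today.
variable "create_before_destroy" {
  description = "Create the replacement ASG before destroying the old one."
  type        = bool
  default     = true
}

# Stand-ins for values the module already has available internally.
variable "name" {
  type = string
}

variable "launch_template_id" {
  type = string
}

variable "availability_zones" {
  type = list(string)
}

# Selected when create_before_destroy = true (the behaviour described above).
resource "aws_autoscaling_group" "create_first" {
  count = var.create_before_destroy ? 1 : 0

  name_prefix        = "${var.name}-"
  min_size           = 1
  max_size           = 1
  desired_capacity   = 1
  availability_zones = var.availability_zones

  launch_template {
    id      = var.launch_template_id
    version = "$Latest"
  }

  lifecycle {
    create_before_destroy = true
  }
}

# Selected when create_before_destroy = false. Without a lifecycle block,
# Terraform destroys the old ASG (releasing the ENI) before creating the new one.
resource "aws_autoscaling_group" "destroy_first" {
  count = var.create_before_destroy ? 0 : 1

  name_prefix        = "${var.name}-"
  min_size           = 1
  max_size           = 1
  desired_capacity   = 1
  availability_zones = var.availability_zones

  launch_template {
    id      = var.launch_template_id
    version = "$Latest"
  }
}
```

The sketch uses an explicit flag rather than detecting network_interface_id, because the ENI ID is often not known until apply and count needs a value that is known at plan time.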

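Versions

  - Module version: 6.9.0
  - Terraform version: 1.3.7
  - Provider version (hashicorp/aws): 4.58.0

(Taken from the version pins in the reproduction code below.)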

Reproduction Code

terraform {
  required_version = "= 1.3.7"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "= 4.58.0"
    }
  }
}

provider "aws" {
  region = "us-east-1"
}

locals {
  private_subnet_cidr_block = "10.156.70.0/23"
  availability_zone         = "us-east-1a"
}

resource "aws_vpc" "vpc" {
  cidr_block = "10.156.64.0/20"

  instance_tenancy     = "default"
  enable_dns_support   = true
  enable_dns_hostnames = true
}

resource "aws_subnet" "private_subnet" {
  vpc_id                  = aws_vpc.vpc.id
  cidr_block              = local.private_subnet_cidr_block
  availability_zone       = local.availability_zone
  map_public_ip_on_launch = false
}

resource "aws_network_interface" "eni" {
  subnet_id = aws_subnet.private_subnet.id
}

data "aws_ssm_parameter" "latest_ubuntu_ami" {
  name = "/aws/service/canonical/ubuntu/server/jammy/stable/current/amd64/hvm/ebs-gp2/ami-id"
}

module "autoscaling" {
  source  = "terraform-aws-modules/autoscaling/aws"
  version = "6.9.0"

  name = "ASG"

  instance_type = "t3.medium"
  image_id      = data.aws_ssm_parameter.latest_ubuntu_ami.value

  min_size         = 1
  max_size         = 1
  desired_capacity = 1

  availability_zones = [local.availability_zone]

  network_interfaces = [
    {
      device_index         = 0
      network_interface_id = aws_network_interface.eni.id
    }
  ]
}

Steps to reproduce the behavior:

  1. Apply the terraform above to an AWS account.
  2. Make sure the ASG creates the instance and that the instance starts up completely.
  3. Add the following to the 'autoscaling' module:
       initial_lifecycle_hooks = [
         {
           name                 = "StartupLifeCycleHook"
           default_result       = "CONTINUE"
           heartbeat_timeout    = 60
           lifecycle_transition = "autoscaling:EC2_INSTANCE_LAUNCHING"
         }
       ]
  4. Apply the terraform again. The code added in step 3 above triggers a replacement of the ASG created in step 1.
  5. The apply fails with the error about the ENI being in use and the instance failing to launch.

Are you using workspaces? NO
Have you cleared the local cache (see Notice section above)? YES

Expected behavior

Replacing the ASG should not result in the error above. Setting create_before_destroy to false may address the issue.

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has been open 30 days with no activity. Remove stale label or comment or this issue will be closed in 10 days

mhowell-ims commented 1 year ago

This is still an issue.

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has been open 30 days with no activity. Remove stale label or comment or this issue will be closed in 10 days

github-actions[bot] commented 1 year ago

This issue was automatically closed because it remained stale for 10 days

github-actions[bot] commented 1 year ago

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.