hashicorp / terraform-provider-aws

The AWS Provider enables Terraform to manage AWS resources.
https://registry.terraform.io/providers/hashicorp/aws
Mozilla Public License 2.0

TF plan says all is good - TF apply craps out with an error #38762

Open ak2766 opened 1 month ago

ak2766 commented 1 month ago

Terraform Core Version

1.9.4

AWS Provider Version

5.61.0

Affected Resource(s)

aws_instance

Expected Behavior

I expected that if terraform plan reports no errors, terraform apply would make the necessary infrastructure changes without failing on something the planning phase should have flagged.

As seen below, the planning phase identifies no issue:

$ > terraform plan
aws_security_group.sgtest: Refreshing state... [id=sg-031c6afb019753597]
aws_security_group_rule.allow_inbound_ssh: Refreshing state... [id=sgrule-118090526]
aws_instance.vm1: Refreshing state... [id=i-0c82b4598ff94c56f]

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with
the following symbols:
  ~ update in-place

Terraform will perform the following actions:

  # aws_instance.vm1 will be updated in-place
  ~ resource "aws_instance" "vm1" {
        id                                   = "i-0c82b4598ff94c56f"
        tags                                 = {
            "Name" = "vm1"
        }
      ~ vpc_security_group_ids               = [
          - "sg-031c6afb019753597",
          + "Bad SG Name",
        ]
        # (38 unchanged attributes hidden)

        # (8 unchanged blocks hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.
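One possible way to surface this during planning is a `precondition` block, which Terraform evaluates at plan time when its inputs are already known. The following is a minimal sketch, not a provider feature: the local name `vm1_security_group_ids` is hypothetical and not part of the original configuration, and the `sg-` prefix check assumes that security group IDs always start with that prefix.

```hcl
# Hypothetical local (not in the original configuration) so the same list
# feeds both the instance argument and the check.
locals {
  # Deliberately reproduces the mistake under test: a group name, not an ID.
  vm1_security_group_ids = [aws_security_group.sgtest.name]
}

resource "aws_instance" "vm1" {
  ami                    = "ami-0862be96e41dcbf74" # ubuntu 24.04 LTS // us-east-2
  instance_type          = "t3a.nano"
  availability_zone      = "us-east-2a"
  vpc_security_group_ids = local.vm1_security_group_ids

  lifecycle {
    # Checked during plan when the values are known; deferred to apply otherwise.
    precondition {
      condition     = alltrue([for v in local.vm1_security_group_ids : startswith(v, "sg-")])
      error_message = "vpc_security_group_ids must contain security group IDs (sg-...), not names."
    }
  }
}
```

Because the group name is a literal in the configuration, it is known during planning, so a check like this should fail in terraform plan rather than in terraform apply.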

Actual Behavior

I instead get the following error in the apply phase:

$ > terraform apply -auto-approve
aws_security_group.sgtest: Refreshing state... [id=sg-031c6afb019753597]
aws_security_group_rule.allow_inbound_ssh: Refreshing state... [id=sgrule-118090526]
aws_instance.vm1: Refreshing state... [id=i-0c82b4598ff94c56f]

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with
the following symbols:
  ~ update in-place

Terraform will perform the following actions:

  # aws_instance.vm1 will be updated in-place
  ~ resource "aws_instance" "vm1" {
        id                                   = "i-0c82b4598ff94c56f"
        tags                                 = {
            "Name" = "vm1"
        }
      ~ vpc_security_group_ids               = [
          - "sg-031c6afb019753597",
          + "Bad SG Name",
        ]
        # (38 unchanged attributes hidden)

        # (8 unchanged blocks hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.
aws_instance.vm1: Modifying... [id=i-0c82b4598ff94c56f]
╷
│ Error: updating EC2 Instance (i-0c82b4598ff94c56f): modifying network interface: operation error EC2: ModifyNetworkInterfaceAttribute, https response error StatusCode: 400, RequestID: 5802e99e-520c-4939-a90b-3ca6ce61ddee, api error InvalidGroup.NotFound: The security group 'bad sg name' does not exist
│ 
│   with aws_instance.vm1,
│   on main.tf line 27, in resource "aws_instance" "vm1":
│   27: resource "aws_instance" "vm1" {
│ 
╵

However, the security group actually exists, as the attached screenshot shows (terraform-security-group).

Relevant Error/Panic Output Snippet

Plan: 0 to add, 1 to change, 0 to destroy.
aws_instance.vm1: Modifying... [id=i-0c82b4598ff94c56f]
╷
│ Error: updating EC2 Instance (i-0c82b4598ff94c56f): modifying network interface: operation error EC2: ModifyNetworkInterfaceAttribute, https response error StatusCode: 400, RequestID: 5802e99e-520c-4939-a90b-3ca6ce61ddee, api error InvalidGroup.NotFound: The security group 'bad sg name' does not exist
│ 
│   with aws_instance.vm1,
│   on main.tf line 27, in resource "aws_instance" "vm1":
│   27: resource "aws_instance" "vm1" {
│ 
╵

Terraform Configuration Files

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = "us-east-2"
}

resource "aws_security_group" "sgtest" {
  name = "Bad SG Name"
}

resource "aws_security_group_rule" "allow_inbound_ssh" {
  type              = "ingress"
  security_group_id = aws_security_group.sgtest.id
  from_port         = 22
  to_port           = 22
  protocol          = "tcp"
  cidr_blocks       = ["1.2.3.4/32"]
}

resource "aws_instance" "vm1" {
  ami               = "ami-0862be96e41dcbf74" # ubuntu 24.04 LTS // us-east-2
  instance_type     = "t3a.nano"
  availability_zone = "us-east-2a"
  security_groups   = [aws_security_group.sgtest.name]
  tags = {
    Name = "vm1"
  }
}

Steps to Reproduce

After deploying the configuration above, change security_groups to vpc_security_group_ids in the aws_instance resource and apply the new configuration:

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = "us-east-2"
}

resource "aws_security_group" "sgtest" {
  name = "Bad SG Name"
}

resource "aws_security_group_rule" "allow_inbound_ssh" {
  type              = "ingress"
  security_group_id = aws_security_group.sgtest.id
  from_port         = 22
  to_port           = 22
  protocol          = "tcp"
  cidr_blocks       = ["1.2.3.4/32"]
}

resource "aws_instance" "vm1" {
  ami                    = "ami-0862be96e41dcbf74" # ubuntu 24.04 LTS // us-east-2
  instance_type          = "t3a.nano"
  availability_zone      = "us-east-2a"
  vpc_security_group_ids = [aws_security_group.sgtest.name]
  tags = {
    Name = "vm1"
  }
}
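For comparison, here is a minimal sketch of the same aws_instance block passing the group's `id` attribute instead of its `name`. It assumes that vpc_security_group_ids expects security group IDs (sg-...), which would be consistent with the InvalidGroup.NotFound error above. It has not been applied against this account.

```hcl
resource "aws_instance" "vm1" {
  ami                    = "ami-0862be96e41dcbf74" # ubuntu 24.04 LTS // us-east-2
  instance_type          = "t3a.nano"
  availability_zone      = "us-east-2a"
  vpc_security_group_ids = [aws_security_group.sgtest.id] # ID (sg-...), not the group name
  tags = {
    Name = "vm1"
  }
}
```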

Debug Output

debug.log

Panic Output

No response

Important Factoids

The reason behind making this change was reading this article: https://www.reddit.com/r/Terraform/comments/uh81r3/tf_wants_to_destroy_and_recreate_a_vm_just_for/

Article Summary: TF wants to destroy and recreate a VM just for changing security group

After reading this article, I decided to experiment to see what is and isn't possible, and stumbled onto this issue. I was previously using AWS Provider version 3.76.1, which attempted the change but failed after 15 minutes with an error similar to the one from 5.61.1. The good thing about v5.61.1 is that it fails early, but it fails nonetheless.

References

No response

Would you like to implement a fix?

None

ak2766 commented 1 month ago

Apologies. It looks like I forgot to redirect stderr to my debug.log file. Please refer to this debug log instead: debug.log

ak2766 commented 1 month ago

Please note that destroying and then applying the configuration with vpc_security_group_ids works just fine:

NOTE: I've removed all lines containing (known after apply) for brevity.

$ > terraform apply

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with
the following symbols:
  + create

Terraform will perform the following actions:

  # aws_instance.vm1 will be created
  + resource "aws_instance" "vm1" {
      + ami                                  = "ami-0862be96e41dcbf74"
      + availability_zone                    = "us-east-2a"
      + get_password_data                    = false
      + instance_type                        = "t3a.nano"
      + source_dest_check                    = true
      + tags                                 = {
          + "Name" = "vm1"
        }
      + tags_all                             = {
          + "Name" = "vm1"
        }
      + user_data_replace_on_change          = false
      + vpc_security_group_ids               = [
          + "Bad SG Name",
        ]
    }

  # aws_security_group.sgtest will be created
  + resource "aws_security_group" "sgtest" {
      + description            = "Managed by Terraform"
      + name                   = "Bad SG Name"
      + revoke_rules_on_delete = false
    }

  # aws_security_group_rule.allow_inbound_ssh will be created
  + resource "aws_security_group_rule" "allow_inbound_ssh" {
      + cidr_blocks              = [
          + "1.2.3.4/32",
        ]
      + from_port                = 22
      + protocol                 = "tcp"
      + self                     = false
      + to_port                  = 22
      + type                     = "ingress"
    }

Plan: 3 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

aws_security_group.sgtest: Creating...
aws_security_group.sgtest: Creation complete after 4s [id=sg-09292e183fd729da1]
aws_security_group_rule.allow_inbound_ssh: Creating...
aws_instance.vm1: Creating...
aws_security_group_rule.allow_inbound_ssh: Creation complete after 1s [id=sgrule-2784389113]
aws_instance.vm1: Still creating... [10s elapsed]
aws_instance.vm1: Creation complete after 15s [id=i-041bcc08757116c2a]

Apply complete! Resources: 3 added, 0 changed, 0 destroyed.

ak2766 commented 1 month ago

Interestingly, after deploying that last configuration, I tried changing the name of the security group from Bad SG Name to Better SG Name. Although terraform plan reported no problems, terraform apply failed after 15 minutes with the following error:

╷
│ Error: deleting Security Group (sg-09292e183fd729da1): operation error EC2: DeleteSecurityGroup, https response error StatusCode: 400, RequestID: 5d79a017-7226-49d6-8860-a3593d6ea396, api error DependencyViolation: resource sg-09292e183fd729da1 has a dependent object
│ 
│ 

The terraform plan output looked like this:

NOTE: I've removed the (known after apply) lines to keep it brief

$ > terraform plan
aws_security_group.sgtest: Refreshing state... [id=sg-09292e183fd729da1]
aws_instance.vm1: Refreshing state... [id=i-041bcc08757116c2a]

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with
the following symbols:
  + create
  ~ update in-place
-/+ destroy and then create replacement

Terraform will perform the following actions:

  # aws_instance.vm1 will be updated in-place
  ~ resource "aws_instance" "vm1" {
        id                                   = "i-041bcc08757116c2a"
        tags                                 = {
            "Name" = "vm1"
        }
      ~ vpc_security_group_ids               = [
          - "sg-09292e183fd729da1",
          + "Better SG Name",
        ]
        # (38 unchanged attributes hidden)

        # (8 unchanged blocks hidden)
    }

  # aws_security_group.sgtest must be replaced
-/+ resource "aws_security_group" "sgtest" {
      ~ name                   = "Bad SG Name" -> "Better SG Name" # forces replacement
      - tags                   = {} -> null
        # (2 unchanged attributes hidden)
    }

  # aws_security_group_rule.allow_inbound_ssh will be created
  + resource "aws_security_group_rule" "allow_inbound_ssh" {
      + cidr_blocks              = [
          + "1.2.3.4/32",
        ]
      + from_port                = 22
      + protocol                 = "tcp"
      + self                     = false
      + to_port                  = 22
      + type                     = "ingress"
    }

Plan: 2 to add, 1 to change, 1 to destroy.

EDIT: adding the debug log file:

debug-2.log
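The DependencyViolation suggests the old group was being deleted while the instance's network interface still referenced it. A commonly used pattern for replacing a security group that is still in use is `create_before_destroy`, sketched below. It assumes the instance references the group by `id` rather than by `name`, and it has not been verified against this reproduction.

```hcl
resource "aws_security_group" "sgtest" {
  name = "Better SG Name"

  lifecycle {
    # Create the replacement group first, so the instance can be moved to it
    # before the old group is deleted.
    create_before_destroy = true
  }
}

resource "aws_instance" "vm1" {
  ami                    = "ami-0862be96e41dcbf74" # ubuntu 24.04 LTS // us-east-2
  instance_type          = "t3a.nano"
  availability_zone      = "us-east-2a"
  vpc_security_group_ids = [aws_security_group.sgtest.id] # assumption: reference by ID
  tags = {
    Name = "vm1"
  }
}
```

With this ordering, the expected sequence is: create the new group, update the instance in place, then delete the old group. Because the new name differs from the old one, both groups can exist at the same time.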

ak2766 commented 1 month ago

That last issue appears related to the original, so I didn't see the point in creating a new bug. Let me know if it is not related and I can move it to a bug report of its own.