ak2766 commented 1 month ago

Terraform Core Version

1.9.4

AWS Provider Version

5.61.0

Affected Resource(s)

aws_instance

Expected Behavior

I expected that if terraform plan reports no errors, terraform apply would make the necessary infrastructure changes without failing on something the planning phase should have flagged.

As seen below, the planning phase identifies no issue:

$ > terraform plan
aws_security_group.sgtest: Refreshing state... [id=sg-031c6afb019753597]
aws_security_group_rule.allow_inbound_ssh: Refreshing state... [id=sgrule-118090526]
aws_instance.vm1: Refreshing state... [id=i-0c82b4598ff94c56f]

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with
the following symbols:
  ~ update in-place

Terraform will perform the following actions:

  # aws_instance.vm1 will be updated in-place
  ~ resource "aws_instance" "vm1" {
        id                                   = "i-0c82b4598ff94c56f"
        tags                                 = {
            "Name" = "vm1"
        }
      ~ vpc_security_group_ids               = [
          - "sg-031c6afb019753597",
          + "Bad SG Name",
        ]
        # (38 unchanged attributes hidden)

        # (8 unchanged blocks hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.
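
A plan-time guard for this kind of mistake can be sketched with a custom condition. This is only a sketch, not provider behavior: the local vm1_security_group_ids and the regex are hypothetical names and choices introduced here, and precondition blocks need Terraform 1.2 or later. If a group name such as "Bad SG Name" is fed into the local, the condition should fail during terraform plan. When the ID is not yet known at plan time, the check is, as far as I understand, deferred to apply.

```hcl
# Hypothetical local, introduced only for this sketch.
locals {
  vm1_security_group_ids = [aws_security_group.sgtest.id]
}

resource "aws_instance" "vm1" {
  ami                    = "ami-0862be96e41dcbf74" # ubuntu 24.04 LTS // us-east-2
  instance_type          = "t3a.nano"
  availability_zone      = "us-east-2a"
  vpc_security_group_ids = local.vm1_security_group_ids

  lifecycle {
    # Assumption: every valid entry looks like "sg-<hex>".
    precondition {
      condition = alltrue([
        for id in local.vm1_security_group_ids : can(regex("^sg-[0-9a-f]+$", id))
      ])
      error_message = "vpc_security_group_ids must contain security group IDs (sg-...), not group names."
    }
  }

  tags = {
    Name = "vm1"
  }
}
```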

Actual Behavior

I instead get the following error in the apply phase:

$ > terraform apply -auto-approve
aws_security_group.sgtest: Refreshing state... [id=sg-031c6afb019753597]
aws_security_group_rule.allow_inbound_ssh: Refreshing state... [id=sgrule-118090526]
aws_instance.vm1: Refreshing state... [id=i-0c82b4598ff94c56f]

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with
the following symbols:
  ~ update in-place

Terraform will perform the following actions:

  # aws_instance.vm1 will be updated in-place
  ~ resource "aws_instance" "vm1" {
        id                                   = "i-0c82b4598ff94c56f"
        tags                                 = {
            "Name" = "vm1"
        }
      ~ vpc_security_group_ids               = [
          - "sg-031c6afb019753597",
          + "Bad SG Name",
        ]
        # (38 unchanged attributes hidden)

        # (8 unchanged blocks hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.
aws_instance.vm1: Modifying... [id=i-0c82b4598ff94c56f]
╷
│ Error: updating EC2 Instance (i-0c82b4598ff94c56f): modifying network interface: operation error EC2: ModifyNetworkInterfaceAttribute, https response error StatusCode: 400, RequestID: 5802e99e-520c-4939-a90b-3ca6ce61ddee, api error InvalidGroup.NotFound: The security group 'bad sg name' does not exist
│ 
│   with aws_instance.vm1,
│   on main.tf line 27, in resource "aws_instance" "vm1":
│   27: resource "aws_instance" "vm1" {
│ 
╵

However, the security group actually exists (screenshot: terraform-security-group).

Relevant Error/Panic Output Snippet

Plan: 0 to add, 1 to change, 0 to destroy.
aws_instance.vm1: Modifying... [id=i-0c82b4598ff94c56f]
╷
│ Error: updating EC2 Instance (i-0c82b4598ff94c56f): modifying network interface: operation error EC2: ModifyNetworkInterfaceAttribute, https response error StatusCode: 400, RequestID: 5802e99e-520c-4939-a90b-3ca6ce61ddee, api error InvalidGroup.NotFound: The security group 'bad sg name' does not exist
│ 
│   with aws_instance.vm1,
│   on main.tf line 27, in resource "aws_instance" "vm1":
│   27: resource "aws_instance" "vm1" {
│ 
╵

Terraform Configuration Files

terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = "us-east-2"
}

resource "aws_security_group" "sgtest" {
  name = "Bad SG Name"
}

resource "aws_security_group_rule" "allow_inbound_ssh" {
  type              = "ingress"
  security_group_id = aws_security_group.sgtest.id
  from_port         = 22
  to_port           = 22
  protocol          = "tcp"
  cidr_blocks       = ["1.2.3.4/32"]
}

resource "aws_instance" "vm1" {
  ami               = "ami-0862be96e41dcbf74" # ubuntu 24.04 LTS // us-east-2
  instance_type     = "t3a.nano"
  availability_zone = "us-east-2a"
  security_groups   = [aws_security_group.sgtest.name]
  tags = {
    Name = "vm1"
  }
}

Steps to Reproduce

After deploying the configuration above, in the aws_instance resource, change security_groups to vpc_security_group_ids and apply this new configuration:

terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = "us-east-2"
}

resource "aws_security_group" "sgtest" {
  name = "Bad SG Name"
}

resource "aws_security_group_rule" "allow_inbound_ssh" {
  type              = "ingress"
  security_group_id = aws_security_group.sgtest.id
  from_port         = 22
  to_port           = 22
  protocol          = "tcp"
  cidr_blocks       = ["1.2.3.4/32"]
}

resource "aws_instance" "vm1" {
  ami                    = "ami-0862be96e41dcbf74" # ubuntu 24.04 LTS // us-east-2
  instance_type          = "t3a.nano"
  availability_zone      = "us-east-2a"
  vpc_security_group_ids = [aws_security_group.sgtest.name]
  tags = {
    Name = "vm1"
  }
}
}
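
For comparison, the provider documentation describes security_groups as a list of group names and vpc_security_group_ids as a list of group IDs. That would explain why the API looks for a group whose ID is "Bad SG Name" and reports it as not found. A minimal sketch of the same resource referencing the ID attribute instead follows. Whether this variant then plans as a no-op or an in-place update has not been verified in this thread.

```hcl
resource "aws_instance" "vm1" {
  ami               = "ami-0862be96e41dcbf74" # ubuntu 24.04 LTS // us-east-2
  instance_type     = "t3a.nano"
  availability_zone = "us-east-2a"

  # Assumption: pass the group ID (sg-...), not the group name.
  vpc_security_group_ids = [aws_security_group.sgtest.id]

  tags = {
    Name = "vm1"
  }
}
```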

Debug Output

debug.log

Panic Output

No response

Important Factoids

The reason behind making this change was reading this article: https://www.reddit.com/r/Terraform/comments/uh81r3/tf_wants_to_destroy_and_recreate_a_vm_just_for/

Article Summary: TF wants to destroy and recreate a VM just for changing security group

After reading this article, I decided to experiment and see what is and isn't possible, and stumbled onto this issue. I was previously using AWS Provider version 3.76.1, where the apply attempted the change but failed after 15 minutes with an error similar to the one from 5.61.0. The good thing about v5.61.0 is that it fails early, but it fails nonetheless.

References

No response

Would you like to implement a fix?

None

github-actions[bot] commented 1 month ago

Community Note

Voting for Prioritization

Please vote on this issue by adding a 👍 reaction to the original post to help the community and maintainers prioritize this request.
Please see our prioritization guide for information on how we prioritize.
Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request.

Volunteering to Work on This Issue

If you are interested in working on this issue, please leave a comment.
If this would be your first contribution, please review the contribution guide.

ak2766 commented 1 month ago

Apologies. It looks like I forgot to redirect stderr to my debug.log file. Please refer to this debug log instead: debug.log

ak2766 commented 1 month ago

Please note that destroying and then applying the configuration with vpc_security_group_ids works just fine:

NOTE: I've removed all lines containing (known after apply) for brevity

$ > terraform apply

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with
the following symbols:
  + create

Terraform will perform the following actions:

  # aws_instance.vm1 will be created
  + resource "aws_instance" "vm1" {
      + ami                                  = "ami-0862be96e41dcbf74"
      + availability_zone                    = "us-east-2a"
      + get_password_data                    = false
      + instance_type                        = "t3a.nano"
      + source_dest_check                    = true
      + tags                                 = {
          + "Name" = "vm1"
        }
      + tags_all                             = {
          + "Name" = "vm1"
        }
      + user_data_replace_on_change          = false
      + vpc_security_group_ids               = [
          + "Bad SG Name",
        ]
    }

  # aws_security_group.sgtest will be created
  + resource "aws_security_group" "sgtest" {
      + description            = "Managed by Terraform"
      + name                   = "Bad SG Name"
      + revoke_rules_on_delete = false
    }

  # aws_security_group_rule.allow_inbound_ssh will be created
  + resource "aws_security_group_rule" "allow_inbound_ssh" {
      + cidr_blocks              = [
          + "1.2.3.4/32",
        ]
      + from_port                = 22
      + protocol                 = "tcp"
      + self                     = false
      + to_port                  = 22
      + type                     = "ingress"
    }

Plan: 3 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

aws_security_group.sgtest: Creating...
aws_security_group.sgtest: Creation complete after 4s [id=sg-09292e183fd729da1]
aws_security_group_rule.allow_inbound_ssh: Creating...
aws_instance.vm1: Creating...
aws_security_group_rule.allow_inbound_ssh: Creation complete after 1s [id=sgrule-2784389113]
aws_instance.vm1: Still creating... [10s elapsed]
aws_instance.vm1: Creation complete after 15s [id=i-041bcc08757116c2a]

Apply complete! Resources: 3 added, 0 changed, 0 destroyed.

ak2766 commented 1 month ago

Interestingly, after deploying that last manifest, I tried changing the name of the security group from Bad SG Name to Better SG Name. Although terraform plan reported no problems, terraform apply failed after 15 minutes with the following error:

╷
│ Error: deleting Security Group (sg-09292e183fd729da1): operation error EC2: DeleteSecurityGroup, https response error StatusCode: 400, RequestID: 5d79a017-7226-49d6-8860-a3593d6ea396, api error DependencyViolation: resource sg-09292e183fd729da1 has a dependent object
│ 
│

The terraform plan output looked like below:

NOTE: I've removed the (known after apply) lines to keep it brief

$ > terraform plan
aws_security_group.sgtest: Refreshing state... [id=sg-09292e183fd729da1]
aws_instance.vm1: Refreshing state... [id=i-041bcc08757116c2a]

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with
the following symbols:
  + create
  ~ update in-place
-/+ destroy and then create replacement

Terraform will perform the following actions:

  # aws_instance.vm1 will be updated in-place
  ~ resource "aws_instance" "vm1" {
        id                                   = "i-041bcc08757116c2a"
        tags                                 = {
            "Name" = "vm1"
        }
      ~ vpc_security_group_ids               = [
          - "sg-09292e183fd729da1",
          + "Better SG Name",
        ]
        # (38 unchanged attributes hidden)

        # (8 unchanged blocks hidden)
    }

  # aws_security_group.sgtest must be replaced
-/+ resource "aws_security_group" "sgtest" {
      ~ name                   = "Bad SG Name" -> "Better SG Name" # forces replacement
      - tags                   = {} -> null
        # (2 unchanged attributes hidden)
    }

  # aws_security_group_rule.allow_inbound_ssh will be created
  + resource "aws_security_group_rule" "allow_inbound_ssh" {
      + cidr_blocks              = [
          + "1.2.3.4/32",
        ]
      + from_port                = 22
      + protocol                 = "tcp"
      + self                     = false
      + to_port                  = 22
      + type                     = "ingress"
    }

Plan: 2 to add, 1 to change, 1 to destroy.

EDIT: adding the debug log file:

debug-2.log

ak2766 commented 1 month ago

That last issue appears related to the original, so I didn't see the point in creating a new bug. Let me know if it is not related and I can move it to a bug item of its own.
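
One way the rename scenario is commonly handled can be sketched as follows, under two assumptions that this thread does not confirm. First, the DependencyViolation arises because the old group is deleted while the instance is still attached to it. Second, the instance references the group by ID rather than by name. With create_before_destroy, Terraform should create the replacement group, repoint the instance at it, and only then delete the old group.

```hcl
resource "aws_security_group" "sgtest" {
  name = "Better SG Name"

  lifecycle {
    # Create the replacement group before the old one is deleted,
    # so the instance can be moved off the old group first.
    create_before_destroy = true
  }
}

resource "aws_instance" "vm1" {
  ami               = "ami-0862be96e41dcbf74" # ubuntu 24.04 LTS // us-east-2
  instance_type     = "t3a.nano"
  availability_zone = "us-east-2a"

  # Assumption: reference the group ID, not the group name.
  vpc_security_group_ids = [aws_security_group.sgtest.id]

  tags = {
    Name = "vm1"
  }
}
```

This only works when the new name differs from the old one, since both groups exist briefly at the same time. The provider, region, and aws_security_group_rule blocks from the configuration above are assumed to stay unchanged.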

hashicorp / terraform-provider-aws

TF plan says all is good - TF apply craps out with an error #38762
