hashicorp / terraform-provider-aws

The AWS Provider enables Terraform to manage AWS resources.
https://registry.terraform.io/providers/hashicorp/aws
Mozilla Public License 2.0

[Bug]: aws_vpc_security_group_(in/e)gress_rule fails on update when using peer security group #36958

Open greyhill-xplor opened 4 months ago

greyhill-xplor commented 4 months ago

Terraform Core Version

1.6.1

AWS Provider Version

5.45.0

Affected Resource(s)

  • aws_vpc_security_group_ingress_rule
  • aws_vpc_security_group_egress_rule

Expected Behavior

The security group rule is updated with the arguments given in the aws_vpc_security_group_ingress_rule and aws_vpc_security_group_egress_rule resources.

Actual Behavior

An error message is returned:

Error: updating VPC Security Group Rule (sgr-XXXXXXXXXX)

  with aws_vpc_security_group_ingress_rule.XXXXXXXXX,
  on XXXXX.tf line 63, in resource "aws_vpc_security_group_ingress_rule" "XXXXXXXXX":
  63: resource "aws_vpc_security_group_ingress_rule" "XXXXXXXXX" {

InvalidGroupId.Malformed: Invalid id: "123456789/sg-XXXXXXXXX" status code: 400, request id: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXX

Relevant Error/Panic Output Snippet

No response

Terraform Configuration Files

I can't test this as I'm working with a company codebase. The following code might need some adjustment for others, but the main outline is this:

provider "aws" {
  alias   = "source"
  profile = "sandbox"
}

provider "aws" {
  alias   = "destination"
  profile = "production"
}

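resource "aws_vpc" "source" {
  # aws_vpc needs an address range; this CIDR is an assumed placeholder
  cidr_block = "10.0.0.0/16"

  provider = aws.source
}

resource "aws_vpc" "destination" {
  # assumed placeholder, chosen not to overlap with the source VPC
  cidr_block = "10.1.0.0/16"

  provider = aws.destination
}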

resource "aws_security_group" "source" {
  vpc_id = aws_vpc.source.id

  name_prefix = "source"

  provider = aws.source
}

resource "aws_security_group" "destination" {
  # the peer security group lives in the destination account's VPC
  vpc_id = aws_vpc.destination.id

  name_prefix = "destination"

  provider = aws.destination
}

data "aws_caller_identity" "destination" {
  provider = aws.destination
}

resource "aws_vpc_security_group_egress_rule" "source" {
  security_group_id = aws_security_group.source.id

  description = "To change"

  ip_protocol = "tcp"
  from_port   = 443
  to_port     = 443

  referenced_security_group_id = "${data.aws_caller_identity.destination.account_id}/${aws_security_group.destination.id}"

  provider = aws.source
}

Steps to Reproduce

  1. terraform init
  2. terraform apply with "yes"
  3. Change the description of the "egress" rule to "Changed!"
  4. terraform apply with "yes" - this should fail

Debug Output

No response

Panic Output

No response

Important Factoids

No response

References

No response

Would you like to implement a fix?

None

greyhill-xplor commented 4 months ago

There are some workarounds to this problem:

  • manually delete the security group rule and allow Terraform to create the rule instead of updating it
  • use aws_security_group_rule instead

jbbeal commented 2 months ago

Those aren't always valid alternatives, though. Manually deleting the security group rule means temporarily breaking your network; in a large Terraform workspace it can take 5-10 minutes to run a plan and apply, which would be a long production outage. As for using aws_security_group_rule, Terraform has a big warning on all of the security group pages that the newer aws_vpc_security_group_ingress_rule should be used for "all new rules" (and that has been the advice for about a year). From the warning, it seems that mixing and matching the two rule resources can have unpredictable results, so we're trying to refactor everything to use the newer resource, but ran into this bug :(

jbbeal commented 2 months ago

FWIW, I was able to work around the error by removing the account ID. AWS is able to resolve the security group ID across accounts, and the rule is created successfully. (I also validated in the AWS console that setting just the source security group works, and that describe-security-group-rules then shows the referenced security group as owned by the other account.)
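
Applied to the egress rule from the original report, that workaround might look roughly like the sketch below. It reuses the resource names from the reproduction config and assumes the two VPCs are peered so that the cross-account reference is valid; it has not been tested against this provider version.

```hcl
resource "aws_vpc_security_group_egress_rule" "source" {
  provider = aws.source

  security_group_id = aws_security_group.source.id
  description       = "Changed!"

  ip_protocol = "tcp"
  from_port   = 443
  to_port     = 443

  # Workaround: pass only the peer security group ID, without the
  # "<account_id>/" prefix that triggers InvalidGroupId.Malformed on update.
  referenced_security_group_id = aws_security_group.destination.id
}
```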

What's happening now, though, is that when the provider calls DescribeSecurityGroupRules during the plan stage, it seems to resolve the account ID as it did before, so it "sees" the value of the referenced_security_group_id attribute as ${account_id}/${security_group_id}, and the plan always generates a diff. It seems the fix in the provider could be as simple as NOT concatenating the UserId and GroupId properties of ReferencedGroupInfo in the DescribeSecurityGroupRules output into Terraform's referenced_security_group_id attribute.

jbbeal commented 2 months ago

This line right here