hashicorp / terraform-provider-aws

The AWS Provider enables Terraform to manage AWS resources.
https://registry.terraform.io/providers/hashicorp/aws
Mozilla Public License 2.0

[Bug]: aws_vpc_peering_connection creation intermittently fails with "Unable to modify ... Connection .. is ... pending-acceptance" error #28705

Open camlow325 opened 1 year ago

camlow325 commented 1 year ago

Terraform Core Version

1.0.8

AWS Provider Version

4.17.1

Affected Resource(s)

aws_vpc_peering_connection

Expected Behavior

terraform apply succeeds and the peering connection is created with the configured options.

Actual Behavior

Intermittently, Terraform apply fails with an error like the following:

Error: Unable to modify EC2 VPC Peering Connection Options. EC2 VPC Peering Connection (pcx-*) is not active (current status: pending-acceptance). Please set the `auto_accept` attribute to `true` or activate the EC2 VPC Peering Connection manually.

  with aws_vpc_peering_connection.this,
  on main.tf line 1, in resource "aws_vpc_peering_connection" "this":
  1: resource "aws_vpc_peering_connection" "this" {

Relevant Error/Panic Output Snippet

No response

Terraform Configuration Files

resource "aws_vpc_peering_connection" "this" {
  auto_accept = true
  peer_vpc_id = ...
  vpc_id      = ...

  accepter {
    allow_remote_vpc_dns_resolution = false
  }

  requester {
    allow_remote_vpc_dns_resolution = false
  }

  tags = {}
}

Steps to Reproduce

Run terraform apply. The failure is intermittent, so it does not occur on every run.

Debug Output

In the failures we have captured so far, we have seen the following sequence:

  1. The AcceptVpcPeeringConnection call to AWS appears to be successful.
  2. The status returned in the subsequent DescribeVpcPeeringConnections call is still pending-acceptance.
  3. Instead of calling ModifyVpcPeeringConnectionOptions, the provider returns the error because the connection status is not active.

At step 2, WaitVPCPeeringConnectionActive exits its wait loop because the current status, pending-acceptance, is one of its Target values (Active and PendingAcceptance). If this logic were changed so that only Active ends the loop (for connections created with auto_accept = true), the provider would wait until AWS transitions the connection to active before modifying the connection options, which should avoid the error.

After the failed run, the AWS console shows that the connection does transition to active on its own; the provider simply does not wait for that transition before returning the error.

Debug snippets:

2023-01-05T04:03:28.459Z [DEBUG] provider.terraform-provider-aws_v4.17.1_x5: [aws-sdk-go] DEBUG: Request ec2/AcceptVpcPeeringConnection Details:
---[ REQUEST POST-SIGN ]-----------------------------
POST / HTTP/1.1
Host: ec2.us-east-1.amazonaws.com
...
Action=AcceptVpcPeeringConnection&Version=2016-11-15&VpcPeeringConnectionId=pcx-...
...
2023-01-05T04:03:28.801Z [DEBUG] provider.terraform-provider-aws_v4.17.1_x5: [aws-sdk-go] DEBUG: Response ec2/AcceptVpcPeeringConnection Details:
---[ RESPONSE ]--------------------------------------
HTTP/1.1 200 OK
...
<AcceptVpcPeeringConnectionResponse xmlns="http://ec2.amazonaws.com/doc/2016-11-15/">
...
        <status>
            <code>provisioning</code>
            <message>Provisioning</message>
        </status>
...
2023-01-05T04:03:28.801Z [DEBUG] provider.terraform-provider-aws_v4.17.1_x5: [aws-sdk-go] DEBUG: Request ec2/DescribeVpcPeeringConnections Details:
---[ REQUEST POST-SIGN ]-----------------------------
POST / HTTP/1.1
Host: ec2.us-east-1.amazonaws.com
...
Action=DescribeVpcPeeringConnections&Version=2016-11-15&VpcPeeringConnectionId.1=pcx-...
...
2023-01-05T04:03:28.916Z [DEBUG] provider.terraform-provider-aws_v4.17.1_x5: [aws-sdk-go] DEBUG: Response ec2/DescribeVpcPeeringConnections Details:
---[ RESPONSE ]--------------------------------------
HTTP/1.1 200 OK
...
<DescribeVpcPeeringConnectionsResponse xmlns="http://ec2.amazonaws.com/doc/2016-11-15/">
 ...
            <status>
                <code>pending-acceptance</code>

Panic Output

No response

Important Factoids

No response

References

Would you like to implement a fix?

None

metronlab-it commented 1 year ago

A workaround is to move the allow_remote_vpc_dns_resolution settings out of aws_vpc_peering_connection into a separate aws_vpc_peering_connection_options resource that depends on the peering connection. That way Terraform waits until the peering connection is active before it modifies it to enable DNS resolution:

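resource "aws_vpc_peering_connection" "this" {
  auto_accept = true
  peer_vpc_id = ...
  vpc_id      = ...
  tags        = {}

  # No accepter/requester blocks here: the DNS options are managed
  # by the aws_vpc_peering_connection_options resource below.
}

resource "aws_vpc_peering_connection_options" "accept_dns" {
  vpc_peering_connection_id = aws_vpc_peering_connection.this.id

  requester {
    allow_remote_vpc_dns_resolution = true
  }

  # The reference to aws_vpc_peering_connection.this.id above already
  # creates an implicit dependency; depends_on makes the ordering explicit.
  depends_on = [
    aws_vpc_peering_connection.this
  ]
}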

Also, if you are creating the peering across regions, it is better to configure allow_remote_vpc_dns_resolution for the accepter side in an accepter block.
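A minimal sketch of the cross-region layout, modeled on the example in the aws_vpc_peering_connection_options documentation: auto_accept cannot be used across regions, so an aws_vpc_peering_connection_accepter resource accepts the connection in the peer region, and each side's DNS option is set in its own options resource through that side's provider. The provider aliases (aws.requester, aws.accepter), the regions, the variable names, and the resource names are illustrative assumptions, not values from this issue.

```hcl
# Hypothetical inputs; substitute your own VPC IDs.
variable "requester_vpc_id" {
  type = string
}

variable "accepter_vpc_id" {
  type = string
}

# Hypothetical provider aliases, one per region.
provider "aws" {
  alias  = "requester"
  region = "us-east-1"
}

provider "aws" {
  alias  = "accepter"
  region = "us-west-2"
}

# Requester side: auto_accept is not allowed for cross-region peering.
resource "aws_vpc_peering_connection" "this" {
  provider    = aws.requester
  vpc_id      = var.requester_vpc_id
  peer_vpc_id = var.accepter_vpc_id
  peer_region = "us-west-2"
  auto_accept = false
}

# Accepter side: accept the connection from the peer region.
resource "aws_vpc_peering_connection_accepter" "peer" {
  provider                  = aws.accepter
  vpc_peering_connection_id = aws_vpc_peering_connection.this.id
  auto_accept               = true
}

# Options can only be set after the connection is accepted, so both
# options resources reference the accepter resource's ID.
resource "aws_vpc_peering_connection_options" "requester" {
  provider                  = aws.requester
  vpc_peering_connection_id = aws_vpc_peering_connection_accepter.peer.id

  requester {
    allow_remote_vpc_dns_resolution = true
  }
}

resource "aws_vpc_peering_connection_options" "accepter" {
  provider                  = aws.accepter
  vpc_peering_connection_id = aws_vpc_peering_connection_accepter.peer.id

  accepter {
    allow_remote_vpc_dns_resolution = true
  }
}
```

With this layout, neither side's DNS option is modified from inside the aws_vpc_peering_connection resource itself; both modifications are ordered after the accepter resource has accepted the connection.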