hashicorp / terraform-provider-aws

The AWS Provider enables Terraform to manage AWS resources.
https://registry.terraform.io/providers/hashicorp/aws
Mozilla Public License 2.0

Auto-Accept GWLB Endpoints Fail to Create ("Error Waiting for VPC Endpoint") #20481

Open SensitiveSKIN opened 3 years ago

SensitiveSKIN commented 3 years ago

Terraform CLI and Terraform AWS Provider Version

Terraform v1.0.4
on darwin_amd64

Affected Resource(s)

aws_vpc_endpoint

Terraform Configuration Files

#-------------------------------------------------------------------------
#Pull current account ARN of caller. Used to securely create GWLB.
data "aws_caller_identity" "current" {
}

#-------------------------------------------------------------------------
#Identify available Availability-Zones at time of execution.
data "aws_availability_zones" "available" {
  state = "available"
}

#-------------------------------------------------------------------------
#Create Appliance VPC.

resource "aws_vpc" "secvpc" {
   cidr_block = "10.0.0.0/24"
   instance_tenancy = "default"
   enable_dns_support = true
   enable_dns_hostnames = true
   tags = {
      Name = "TEST-APPLIANCE-VPC"
   }
}

#-------------------------------------------------------------------------
#Create Appliance Subnets.

resource "aws_subnet" "gwlb_subnet1" {
   vpc_id      = aws_vpc.secvpc.id
   cidr_block  = "10.0.0.0/28"
   availability_zone = data.aws_availability_zones.available.names[0]
   tags = {
      "Name" = "TEST-GWLB-SUBNET-1"
   }
}

resource "aws_subnet" "gwlb_subnet2" {
   vpc_id      = aws_vpc.secvpc.id
   cidr_block  = "10.0.0.16/28"
   availability_zone = data.aws_availability_zones.available.names[1]
   tags = {
      "Name" = "TEST-GWLB-SUBNET-2"
   }
}

resource "aws_subnet" "gwlbe_subnet1" {
   vpc_id      = aws_vpc.secvpc.id
   cidr_block  = "10.0.0.32/28"
   availability_zone = data.aws_availability_zones.available.names[0]
   tags = {
      "Name" = "TEST-GWLB-ENDPOINT-SUBNET-1"
   }
}

resource "aws_subnet" "gwlbe_subnet2" {
   vpc_id      = aws_vpc.secvpc.id
   cidr_block  = "10.0.0.48/28"
   availability_zone = data.aws_availability_zones.available.names[1]
   tags = {
      "Name" = "TEST-GWLB-ENDPOINT-SUBNET-2"
   }
}

#-------------------------------------------------------------------------
#Create Gateway Load-Balancer (GWLB) and GWLB-Endpoint Service.

resource "aws_lb" "gwlb" {
  name = "TEST-GWLB"
  load_balancer_type = "gateway"
  subnets = tolist([aws_subnet.gwlb_subnet1.id, aws_subnet.gwlb_subnet2.id])
  enable_deletion_protection = false
  enable_cross_zone_load_balancing = true

  tags = {
    Name = "TEST-GWLB"
  }
}

resource "aws_vpc_endpoint_service" "gwlb-service" {
  acceptance_required = true
  allowed_principals = [ data.aws_caller_identity.current.arn ]
  gateway_load_balancer_arns = [ aws_lb.gwlb.arn ]

  tags = {
    Name = "TEST-GWLB-SERVICE"
  }
}

#-------------------------------------------------------------------------
#Create Appliance GWLB-Endpoints.

resource "aws_vpc_endpoint" "sec-gwlb-endpoint" {
  count = 2
  vpc_id = aws_vpc.secvpc.id
  subnet_ids = tolist([aws_subnet.gwlbe_subnet1.id, aws_subnet.gwlbe_subnet2.id])
  service_name = aws_vpc_endpoint_service.gwlb-service.service_name
  vpc_endpoint_type = "GatewayLoadBalancer"
  auto_accept = true

  timeouts {
    create = "5m"
    delete = "5m"
  }

  tags = {
    Name = "TEST-GWLB-ENDPOINT"
  }
}

Expected Behavior

Terraform should wait for the GWLB-Endpoint to reach the "available" state (which it does, after a minute or two) before moving on or failing the apply.

Actual Behavior

The relevant portion of the configuration above is the GWLB-Endpoint. The error only seems to occur when the GWLB-Endpoint is set to "auto_accept = true"; with it set to false, Terraform completes without error. Strangely, even when "auto_accept" is true and Terraform fails, the GWLB-Endpoints are still created. Current versions of the AWS provider apparently expect acceptance to be nearly instant, when in reality it can take a minute or two for the GWLB-Endpoints to be created and then have their connection to the Endpoint Service accepted. Note that the timeouts above make no difference either way: the failure occurs regardless of their values.

│ Error: error waiting for VPC Endpoint (vpce-0c0436c6d033cbafe) to be accepted: unexpected state 'pending', wanted target 'available'. last error: %!s(<nil>)
│ 
│   with module.gwlb.aws_vpc_endpoint.sec-gwlb-endpoint[0],
│   on modules/gwlb/main.tf line 34, in resource "aws_vpc_endpoint" "sec-gwlb-endpoint":
│   34: resource "aws_vpc_endpoint" "sec-gwlb-endpoint" {
│ 
╵
╷
│ Error: error waiting for VPC Endpoint (vpce-01dff1eabb4564a66) to be accepted: unexpected state 'pending', wanted target 'available'. last error: %!s(<nil>)
│ 
│   with module.gwlb.aws_vpc_endpoint.sec-gwlb-endpoint[1],
│   on modules/gwlb/main.tf line 34, in resource "aws_vpc_endpoint" "sec-gwlb-endpoint":
│   34: resource "aws_vpc_endpoint" "sec-gwlb-endpoint" {
│ 
╵
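
Since the failure only appears with "auto_accept = true", one possible workaround is to create the endpoint with "auto_accept = false" and accept the connection in a separate resource. The following is a minimal, untested sketch, not a confirmed fix. It assumes a provider version that ships the aws_vpc_endpoint_connection_accepter resource (which may not be the case for 3.53.0). The resource name "sec-gwlb-endpoint-manual" is hypothetical, and a single subnet is used only for brevity.

```hcl
# Sketch only: replaces the auto-accepting endpoint from the configuration above.
# "sec-gwlb-endpoint-manual" is a hypothetical name chosen for illustration.
resource "aws_vpc_endpoint" "sec-gwlb-endpoint-manual" {
  vpc_id            = aws_vpc.secvpc.id
  subnet_ids        = [aws_subnet.gwlbe_subnet1.id]
  service_name      = aws_vpc_endpoint_service.gwlb-service.service_name
  vpc_endpoint_type = "GatewayLoadBalancer"

  # Creation reportedly succeeds when acceptance is not requested here.
  auto_accept = false

  tags = {
    Name = "TEST-GWLB-ENDPOINT"
  }
}

# Assumption: this resource is available in the provider version in use.
# It accepts the pending connection on the service side as a separate step.
resource "aws_vpc_endpoint_connection_accepter" "sec-gwlb-endpoint-manual" {
  vpc_endpoint_service_id = aws_vpc_endpoint_service.gwlb-service.id
  vpc_endpoint_id         = aws_vpc_endpoint.sec-gwlb-endpoint-manual.id
}
```

Splitting acceptance into its own resource keeps the endpoint creation on the code path that is reported to work, at the cost of one extra resource per endpoint.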

Steps to Reproduce

  1. Create VPC in which to house subnets for GWLB and GWLB-Endpoints.
  2. Create subnets within VPC for GWLB and GWLB-Endpoints.
  3. Create GWLB.
  4. Create GWLB Service.
  5. Create GWLB-Endpoints (and reference Service); set them to "auto_accept = true".
  6. Terraform will fail as soon as the GWLB-Endpoints are created, without giving them time to reach the "available" state.

Important Factoids

This functionality worked in AWS Provider version 3.38.0. In fact, if I revert to 3.38.0, I can still create the GWLB-Endpoints on the first try. I believe the problem was originally fixed in "aws_vpc_endpoint does not autoaccept" (#14604), but the faulty code must have since been merged back into the main branch, as auto-accept does not work in 3.53.0 (or in many of the other recent versions).
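
For anyone who wants to reproduce the working behavior, the provider can be pinned to the version mentioned above. This is a minimal sketch that assumes the configuration has no other required_providers block; only the version number comes from this report.

```hcl
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
      # Exact pin to the last version reported as working in this issue.
      version = "3.38.0"
    }
  }
}
```

After changing the constraint, "terraform init -upgrade" is needed so that Terraform re-selects the provider version.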

github-actions[bot] commented 1 year ago

Marking this issue as stale due to inactivity. This helps our maintainers find and focus on the active issues. If this issue receives no comments in the next 30 days it will automatically be closed. Maintainers can also remove the stale label.

If this issue was automatically closed and you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thank you!

skeggse commented 1 year ago

Not stale.