hashicorp / terraform-provider-google

Terraform Provider for Google Cloud Platform
https://registry.terraform.io/providers/hashicorp/google/latest/docs
Mozilla Public License 2.0

function deleteSWGAutoGenRouter doesn't wait for the operation to finish #18140

Status: Open. teyuchang opened this issue 2 months ago.

teyuchang commented 2 months ago


Terraform Version & Provider Version(s)

Terraform v0.13.7 on linux/amd64

Affected Resource(s)

google_network_services_gateway

Terraform Configuration

resource "google_compute_network" "default" {
  name                    = "my-network"
  auto_create_subnetworks = false
}

resource "google_compute_subnetwork" "default" {
  name          = "my-subnetwork-name"
  purpose       = "PRIVATE"
  ip_cidr_range = "10.128.0.0/20"
  region        = "us-central1"
  network       = google_compute_network.default.id
  role          = "ACTIVE"
}

resource "google_compute_subnetwork" "proxyonlysubnet" {
  name          = "my-proxy-only-subnetwork"
  purpose       = "REGIONAL_MANAGED_PROXY"
  ip_cidr_range = "192.168.0.0/23"
  region        = "us-central1"
  network       = google_compute_network.default.id
  role          = "ACTIVE"
}

resource "google_network_security_gateway_security_policy" "default" {
  name        = "my-policy-name"
  location    = "us-central1"
}

resource "google_network_security_gateway_security_policy_rule" "default" {
  name                    = "my-policyrule-name"
  location                = "us-central1"
  gateway_security_policy = google_network_security_gateway_security_policy.default.name
  enabled                 = true
  priority                = 1
  session_matcher         = "host() == 'example.com'"
  basic_profile           = "ALLOW"
}

resource "google_network_services_gateway" "default" {
  name                                 = "my-gateway1"
  location                             = "us-central1"
  addresses                            = ["10.128.0.99"]
  type                                 = "SECURE_WEB_GATEWAY"
  ports                                = [443]
  gateway_security_policy              = google_network_security_gateway_security_policy.default.id
  network                              = google_compute_network.default.id
  subnetwork                           = google_compute_subnetwork.default.id
  delete_swg_autogen_router_on_destroy = true
  depends_on                           = [google_compute_subnetwork.proxyonlysubnet]
}

Debug Output

No response

Expected Behavior

The function deleteSWGAutoGenRouter should wait until the router deletion operation finishes before returning.

Actual Behavior

deleteSWGAutoGenRouter returns immediately after sending the Delete request, without waiting for the operation to finish. This sometimes causes terraform destroy to fail:

Error: Error waiting for Deleting Network: The network resource 'projects/xxx/global/networks/my-network' is already being used by 'projects/xxx/regions/us-central1/routers/swg-autogen-router-1234567890'

Steps to reproduce

  1. terraform apply
  2. terraform destroy

Important Factoids

No response

References

No response

b/342170266

ggtisc commented 2 months ago

Hi @teyuchang!

I used exactly the same code, Terraform version (0.13.7), and Google provider version (4.84.0), and followed your steps to reproduce this issue:

  1. terraform apply
  2. terraform destroy

But I didn't get any error or the behavior you described. Are there other configurations or resources involved, or do I need to wait more than a minute after creation to run `terraform destroy`?

teyuchang commented 2 months ago

Thank you for testing. Unfortunately, this bug is flaky and may not happen every time. The key factor in reproducing it is how long the router deletion takes: the longer it takes, the more likely the failure. To reproduce it consistently, you can use a test stub instead of the actual GCP endpoints and intentionally delay the router deletion.

Here's a simplified version of how the Terraform example works:

Creation (terraform apply):

  1. Create a VPC network.
  2. Create subnets.
  3. Create a secure web gateway, which automatically creates a router.

Deletion (terraform destroy):

  1. Delete the secure web gateway. (The router is also deleted automatically because delete_swg_autogen_router_on_destroy is set to true).
  2. Delete subnets.
  3. Delete the VPC network.

The problem occurs during step 1 of the deletion process. Terraform attempts to delete the router but doesn't wait for the long-running operation to finish. This means the router might still exist when Terraform tries to delete the VPC in step 3, causing the VPC deletion to fail.

The relevant code is in deleteSWGAutoGenRouter, where the response of the Delete call is discarded by assigning it to _. Instead, it should be handled the same way as in resourceNetworkServicesGatewayDelete, where the response is captured in a res variable and the operation is then waited on, as shown here.

ggtisc commented 2 months ago

Confirmed. As the user reports, the longer we wait before deleting the resources, the more likely this behavior is triggered. After waiting more than 12 hours and running terraform destroy, it returns the reported error:

Error: Error waiting for Deleting Network: The network resource 'projects/xxx/global/networks/my-network' is already being used by 'projects/xxx/regions/us-central1/routers/swg-autogen-router-1234567890'