hashicorp / terraform-provider-google

Terraform Provider for Google Cloud Platform
https://registry.terraform.io/providers/hashicorp/google/latest/docs
Mozilla Public License 2.0

Can't destroy VPC (google_compute_network) created by terraform #9812

Open tek08 opened 3 years ago

tek08 commented 3 years ago

Community Note

Terraform Version

Terraform v1.0.4 on darwin_arm64

Affected Resource(s)

Terraform Configuration Files


resource "google_compute_network" "egress-network" {
  name                    = "egress-network"
  auto_create_subnetworks = false
}

resource "google_compute_subnetwork" "egress-subnetwork" {
  name          = "egress-subnetwork"
  ip_cidr_range = "10.2.0.0/28"
  network       = google_compute_network.egress-network.id
  region        = var.region
}

resource "google_vpc_access_connector" "access-connector" {
  provider = google-beta
  name     = "vpc-connector"
  project  = var.project

  subnet {
    name = google_compute_subnetwork.egress-subnetwork.name
  }

  region = var.region
}

resource "google_compute_router" "router" {
  name    = "egress-compute-router"
  network = google_compute_network.egress-network.name
  region  = var.region
}

resource "google_compute_address" "static-ip-for-egress" {
  name   = "egress-static-ip"
  region = var.region
}

resource "google_compute_router_nat" "nat" {
  name   = "egress-router-nat"
  router = google_compute_router.router.name
  region = google_compute_router.router.region

  nat_ip_allocate_option = "MANUAL_ONLY"
  nat_ips                = [google_compute_address.static-ip-for-egress.self_link]

  source_subnetwork_ip_ranges_to_nat = "LIST_OF_SUBNETWORKS"
  subnetwork {
    name                    = google_compute_subnetwork.egress-subnetwork.id
    source_ip_ranges_to_nat = ["ALL_IP_RANGES"]
  }
}

Debug Output

Error: Error waiting for Deleting Network: The network resource 'projects//global/networks/egress-network' is already being used by 'projects/global/networkInstances/v1460259370-47320b4b-55e7-49c0-a22a-43ec4c643d5c'

Expected Behavior

VPC network should be cleanly deleted

Actual Behavior

The VPC network fails to delete; the error says it is in use by a "global/networkInstances" object.

Steps to Reproduce

  1. terraform apply with above code
  2. Comment out above code
  3. terraform apply

References

I am having the same issue as this ServerFault user, but I created the VPC using Terraform. I am having trouble getting it to delete cleanly.

b/321386426

briantopping commented 3 years ago

GCP sometimes creates internal resources (like firewalls) that depend on the network. Some of what's created can depend on local project policy. It's not always easy to tell, and I don't know if the gcloud API can enumerate "things that depend on this network" before trying to delete it.

It's even more confusing when one goes to the GCP console and deletes the network too quickly. The console finds the transitive closure of these dependent resources in order to delete them, but more are still being created, so the console attempt fails as well. Repeating the delete too quickly can make the problem go on forever. But then some of the dependent resources actually can be deleted without them coming back automatically.

What I've found works is to simply wait a bit so these background processes finish what they are doing. There's no good way to know when that is, so I just wait a couple of minutes, then delete the network. Usually it gets a good closure on all the resources and gets the job done before the background processes have a chance to make a mess of it again. It seems unlikely that Terraform is going to be able to get around this if the GCP console can't even manage to do so.
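The "wait a couple of minutes, then delete the network" approach can be approximated inside Terraform itself. The following is a minimal sketch, not a confirmed fix: it assumes the `hashicorp/time` provider is acceptable in your setup, reuses the resource names from the original report, and uses a hypothetical resource name (`before_network_delete`) and an arbitrary 180-second pause. It can only help when the blocker is a short-lived background process, not a long-lived hidden resource.

```hcl
terraform {
  required_providers {
    time = {
      source = "hashicorp/time"
    }
  }
}

resource "google_compute_network" "egress-network" {
  name                    = "egress-network"
  auto_create_subnetworks = false
}

# Hypothetical pause resource. On destroy, Terraform removes everything that
# depends on this resource first, then waits destroy_duration here, and only
# afterwards deletes the network it depends on.
resource "time_sleep" "before_network_delete" {
  depends_on       = [google_compute_network.egress-network]
  destroy_duration = "180s" # assumed value, tune to your environment
}

resource "google_compute_subnetwork" "egress-subnetwork" {
  name          = "egress-subnetwork"
  ip_cidr_range = "10.2.0.0/28"
  network       = google_compute_network.egress-network.id
  region        = var.region # variable from the original configuration

  # Forces this subnetwork to be destroyed before the pause begins.
  depends_on = [time_sleep.before_network_delete]
}
```

Every other resource attached to the network (connector, router, NAT) would need the same `depends_on` entry for the pause to sit between their deletion and the network's.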

Metroxe commented 2 years ago

I have replicated the above, and have waited over 48 hours with no deletion.

I did, however, find the following article in the docs about a transitive state lasting 30 days: https://cloud.google.com/vpc/docs/deprovisioning-shared-vpc#deleting_shared_vpc_service_project

I looked everywhere for where networkInstances resources are listed, but cannot find anything via the GUI.

KB30497 commented 2 years ago

Do you have a serverless function (Cloud Function, Cloud Run, etc.) still referencing the VPC connector? I've run into this issue where the VPC connector was destroyed, but there was a Cloud Function in the project still referencing the "destroyed" connector. Once the Cloud Function was deleted, the VPC network destroy worked properly.
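One way to avoid a leftover reference like this is to manage the serverless workload in the same Terraform configuration as the connector, so that Terraform sees the dependency and destroys the function before the connector and the network. The sketch below is illustrative only: the function name, runtime, entry point, and the two `function_source_*` variables are hypothetical, while `google_vpc_access_connector.access-connector` and `var.region` come from the configuration in the original report.

```hcl
# Hypothetical inputs pointing at an already uploaded source archive.
variable "function_source_bucket" {
  type = string
}

variable "function_source_object" {
  type = string
}

resource "google_cloudfunctions_function" "uses_connector" {
  name        = "example-function" # hypothetical name
  runtime     = "nodejs20"
  region      = var.region
  entry_point = "handler" # hypothetical entry point

  trigger_http          = true
  source_archive_bucket = var.function_source_bucket
  source_archive_object = var.function_source_object

  # Referencing the connector resource (rather than hard-coding its name)
  # creates an implicit dependency, so on destroy the function is removed
  # before the connector.
  vpc_connector = google_vpc_access_connector.access-connector.id
}
```

Functions or Cloud Run services created outside Terraform are not covered by this and still have to be found and removed by hand.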

Metroxe commented 2 years ago

@KB30497 I have looked everywhere for something still referencing it. I just can't for the life of me find anything. I'm currently 6 days out from the transitive state, and I'm curious to see whether a 30-day lapse simply needs to pass.

ehartsock commented 2 years ago

Just hit this issue as well:

Error: Error waiting for Deleting Network: The network resource 'projects/$PROJECT/global/networks/cloud-run-vpc' is already being used by 'projects/$PROJECT/global/networkInstances/v-1000...'

No references to the Cloud Run service that was removed anywhere in the console.

RavianXReaver commented 2 years ago

I'm having the same issue from both the console and Terraform:

Error: Error waiting for Deleting Network: The network resource 'projects/$PROJECT/global/networks/$vpcname' is already being used by 'projects/$PROJECT/global/networkInstances/v1292417645-30374041-349e-40c0-8008-58d6cce5caea'

Update: I tried to delete the VPC from the console and got the same error. Then I tried to destroy it from Terraform and got a new error:

Error: Error waiting for Deleting Network: The network resource 'projects/$PROJECT/global/networks/$vpcname' is already being used by 'projects/$PROJECT/global/routes/default-route-0abd8c163a62e017'

Then I deleted the default route and I'm back to the head of the snake: "being used by /networkInstances/".

sege665 commented 2 years ago

I have the same problem. Can't find the referred resource in any way. Very annoying. Can't delete manually via the console either. Been waiting for way more than 30 days.

Anyone have any luck with this?

c2thorn commented 2 years ago

Hey folks, this seems to be a common issue that doesn't stem from the Terraform implementation itself. From what I can tell, there may be dependent resources not visible to users that block deletion. I have reached out to the service team for official guidance/instructions, but for now my best advice would be to contact GCP support to help remove the blocking resources. There doesn't seem to be anything we can do on the provider level.

OliverCardoza commented 2 years ago

Would it make more sense to create a Google Cloud bug requesting better APIs to force-delete networking resources? If those were present then I think these providers could make use of them to resolve.

The Cloud Report Issues page has links to view and create bugs related to "Virtual Private Cloud networks". That seems like the most reasonable component:

bschaatsbergen commented 2 years ago

Still facing this issue. It's not related to the Google Terraform provider implementation, though. As @c2thorn mentioned, we're best off talking to the support team.

Perhaps we can close this issue as it's not related to the provider?

bardsleysdgr commented 1 year ago

FYI, here's a Google bug report that seems to be the same issue.

https://issuetracker.google.com/issues/186792016?pli=1

zymotik commented 1 year ago

Thanks Nate. These reproduction steps should be really helpful for the Google Team. I hope they see this and it helps speed up the fix.

This was first reported to Google on Apr 30, 2021. Google replied that they were "working on it" on Jul 21, 2021. They mentioned a couple more times that they were "working on it" at a product team level (Mar 24, 2022 and Apr 21, 2022), yet it still remains unfixed in Jan 2023. Note that the issue is marked as priority P2, with 152 people managing to find it and click "me too". One user reports that they hit the limit on the number of VPCs they could create and so had to move to a new Google Cloud project. I'm disappointed and frustrated with the Google Cloud Platform; I hope it's fixed soon.

anisnasir commented 1 year ago

To me the issue seems to be in the serverless VPC connector: terraform destroy works without the google_vpc_access_connector resource.

rbnhd commented 7 months ago

This issue still exists. I created a GKE cluster and then an ingress service, which ended up creating multiple NEGs in GCP, and now I can't delete the VPC because of this error.

The solution for an error like the one below is to delete the NEGs with gcloud (use the GCP console shell for ease):

The network resource is already being used by 'projects/PROJECT/zones/ZONE/networkEndpointGroups/

In my case the issue was with NEGs, but in general, if the issue is caused by any other resource, there seems to be no way other than manually listing and deleting them.

List the NEGs (the non-beta command also works fine):

gcloud beta compute network-endpoint-groups list

For each NEG listed, run:

gcloud beta compute network-endpoint-groups delete NAME --zone ZONE

aseem-heg commented 6 months ago

I also tried deleting the network from the GCP console manually and got the same error:

The network resource 'projects/p1/global/networks/gae-external-network' is already being used by 'projects/p1/global/networkInstances/v143776469-459a125b-95be-42e0-a90c-d6blok934a23'

I get the same error from the gcloud command too. What is the solution or workaround? I don't have any VPC access connector or any other resource referencing this gae-external-network.

kuisathaverat commented 5 months ago

In my case, I create a cluster with the following plan, and after everything is created, I destroy it. It fails when deleting the subnetwork. Adding depends_on to all resources fixes the issue, I guess because it gives Terraform the correct order in which to destroy the resources.

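variable "network" {
  description = "The name of the network to create"
  type        = string
  default     = "gke-network"
}

# Assumed declaration: var.cluster_name is used by the gke module below but
# was not declared in the posted snippet.
variable "cluster_name" {
  type = string
}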

provider "google" {
}

data "google_client_config" "default" {}

data "google_compute_network" "vpc_network" {
  name    = var.network
  project = data.google_client_config.default.project

  # Manual dependency to fix the issue
  depends_on = [google_compute_network.vpc_network]
}

resource "google_compute_network" "vpc_network" {
  name    = var.network
  project = data.google_client_config.default.project
}

resource "google_compute_subnetwork" "vpc_subnet" {
  name          = "${var.network}-subnet"
  project       = data.google_client_config.default.project
  network       = data.google_compute_network.vpc_network.name
  ip_cidr_range = "10.0.0.0/16"

  # Manual dependency to fix the issue
  depends_on = [ google_compute_network.vpc_network ]

  secondary_ip_range {
    range_name    = "${var.network}-subnet-pods"
    ip_cidr_range = "10.1.0.0/16"
  }
  secondary_ip_range {
    range_name    = "${var.network}-subnet-services"
    ip_cidr_range = "10.2.0.0/16"
  }
}

provider "kubernetes" {
  host                   = "https://${module.gke.endpoint}"
  token                  = data.google_client_config.default.access_token
  cluster_ca_certificate = base64decode(module.gke.ca_certificate)
}

# https://github.com/terraform-google-modules/terraform-google-kubernetes-engine
module "gke" {
  source     = "terraform-google-modules/kubernetes-engine/google//modules/beta-public-cluster"
  project_id = data.google_client_config.default.project
  name       = var.cluster_name
  region     = data.google_client_config.default.region
  zones      = [data.google_client_config.default.zone]
  network    = data.google_compute_network.vpc_network.name

  # Manual dependency to fix the issue
  depends_on = [google_compute_network.vpc_network, google_compute_subnetwork.vpc_subnet]

  subnetwork                        = "${var.network}-subnet"
  ip_range_pods                     = "${var.network}-subnet-pods"
  ip_range_services                 = "${var.network}-subnet-services"
  remove_default_node_pool          = false
  disable_legacy_metadata_endpoints = false
  deletion_protection               = false
  logging_service                   = "none"
  monitoring_service                = "none"
  # regional           = false
}

damianFelixPago commented 3 months ago

Having the same issue here as well. Creating a private cluster using module.gke creates firewall rules and load-balancing rules that Terraform doesn't know about, so when trying to destroy, the network is not deleted because it still depends on all these extra resources that Terraform doesn't know about.