hashicorp / terraform-provider-google

Terraform Provider for Google Cloud Platform
https://registry.terraform.io/providers/hashicorp/google/latest/docs
Mozilla Public License 2.0

Setting "remove_default_node_pool = true" uses name "default-pool" by default irrespective of the defined config #13501

Open rd-nikhil-singh opened 1 year ago

rd-nikhil-singh commented 1 year ago


Terraform Version

Terraform v1.3.7 on darwin_amd64

Affected Resource(s)

  * google_container_cluster
  * google_container_node_pool

Terraform Configuration Files

provider "google" {
  project = "xyz"
  region  = "europe-west1"
}

terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "4.48.0"
    }
  }
  required_version = ">= 1.3"
}

resource "google_service_account" "default" {
  account_id   = "test-gke-cluster-sa"
  display_name = "test-gke-cluster-sa service account"
  project      = "xyz"
}

resource "google_container_cluster" "primary" {
  name       = "test-gke-cluster"
  location   = "europe-west1"
  project    = "xyz"
  network    = "abc"
  subnetwork = "def"

  # We can't create a cluster with no node pool defined, but we want to only use
  # separately managed node pools. So we create the smallest possible default
  # node pool and immediately delete it.
  remove_default_node_pool = true
  initial_node_count       = 1
}

resource "google_container_node_pool" "primary_nodes" {
  name       = "default-pool"
  location   = "europe-west1"
  project    = "xyz"
  cluster    = google_container_cluster.primary.name
  node_count = 1

  node_config {
    machine_type = "e2-medium"

    # Google recommends custom service accounts that have cloud-platform scope and permissions granted via IAM Roles.
    service_account = google_service_account.default.email
    oauth_scopes = [
      "https://www.googleapis.com/auth/cloud-platform"
    ]
  }
}

Debug Output

Sorry, I cannot share it, as it would reveal confidential details.

Expected Behavior

If you terraform import the existing cluster, for example:

$ terraform import google_container_cluster.primary xyz/europe-west1/test-gke-cluster
google_container_cluster.primary: Importing from ID "xyz/europe-west1/test-gke-cluster"...
google_container_cluster.primary: Import prepared!
  Prepared google_container_cluster for import
google_container_cluster.primary: Refreshing state... [id=projects/xyz/locations/europe-west1/clusters/test-gke-cluster]

Import successful!

The resources that were imported are shown above. These resources are now in
your Terraform state and will henceforth be managed by Terraform.

You see the following changes with terraform plan:

$ terraform plan
google_service_account.default: Refreshing state... [id=projects/xyz/serviceAccounts/test-gke-cluster-sa@xyz.iam.gserviceaccount.com]
google_container_cluster.primary: Refreshing state... [id=projects/xyz/locations/europe-west1/clusters/test-gke-cluster]
google_container_node_pool.primary_nodes: Refreshing state... [id=projects/xyz/locations/europe-west1/clusters/test-gke-cluster/nodePools/default-pool]

Terraform used the selected providers to generate the following execution plan. Resource actions
are indicated with the following symbols:
  ~ update in-place

Terraform will perform the following actions:

  # google_container_cluster.primary will be updated in-place
  ~ resource "google_container_cluster" "primary" {
        id                          = "projects/xyz/locations/europe-west1/clusters/test-gke-cluster"
        name                        = "test-gke-cluster"
      + remove_default_node_pool    = true
        # (25 unchanged attributes hidden)

        # (17 unchanged blocks hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.

Applying the above should not remove the "default-pool" node pool, since it was created by the resource "google_container_node_pool" "primary_nodes".

The problem does not occur if the node pool name is anything other than "default-pool". For example, "default" gives the expected result.

Actual Behavior

$ terraform apply
google_service_account.default: Refreshing state... [id=projects/xyz/serviceAccounts/test-gke-cluster-sa@xyz.iam.gserviceaccount.com]
google_container_cluster.primary: Refreshing state... [id=projects/xyz/locations/europe-west1/clusters/test-gke-cluster]
google_container_node_pool.primary_nodes: Refreshing state... [id=projects/xyz/locations/europe-west1/clusters/test-gke-cluster/nodePools/default-pool]

Terraform used the selected providers to generate the following execution plan. Resource actions
are indicated with the following symbols:
  ~ update in-place

Terraform will perform the following actions:

  # google_container_cluster.primary will be updated in-place
  ~ resource "google_container_cluster" "primary" {
        id                          = "projects/xyz/locations/europe-west1/clusters/test-gke-cluster"
        name                        = "test-gke-cluster"
      + remove_default_node_pool    = true
        # (25 unchanged attributes hidden)

        # (17 unchanged blocks hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

google_container_cluster.primary: Modifying... [id=projects/xyz/locations/europe-west1/clusters/test-gke-cluster]
google_container_cluster.primary: Still modifying... [id=projects/xyz/locations/europe-west1/clusters/test-gke-cluster, xx s elapsed]
google_container_cluster.primary: Modifications complete after 3m43s [id=projects/xyz/locations/europe-west1/clusters/test-gke-cluster]

Apply complete! Resources: 0 added, 1 changed, 0 destroyed.

We observed the following message during the apply in Google Cloud Console:

Deleting the node pool.
The values shown below will be updated once the operation is finished.

It has removed the "default-pool" node pool even though it was defined separately by the resource "google_container_node_pool" "primary_nodes".

Running terraform plan again shows the following:

$ terraform plan
google_service_account.default: Refreshing state... [id=projects/xyz/serviceAccounts/test-gke-cluster-sa@xyz.iam.gserviceaccount.com]
google_container_cluster.primary: Refreshing state... [id=projects/xyz/locations/europe-west1/clusters/test-gke-cluster]
google_container_node_pool.primary_nodes: Refreshing state... [id=projects/xyz/locations/europe-west1/clusters/test-gke-cluster/nodePools/default-pool]

Terraform used the selected providers to generate the following execution plan. Resource actions
are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # google_container_node_pool.primary_nodes will be created
  + resource "google_container_node_pool" "primary_nodes" {
      + cluster                     = "test-gke-cluster"
      + id                          = (known after apply)
      + initial_node_count          = (known after apply)
      + instance_group_urls         = (known after apply)
      + location                    = "europe-west1"
      + managed_instance_group_urls = (known after apply)
      + max_pods_per_node           = (known after apply)
      + name                        = "default-pool"
      + name_prefix                 = (known after apply)
      + node_count                  = 1
      + node_locations              = (known after apply)
      + operation                   = (known after apply)
      + project                     = "xyz"
      + version                     = (known after apply)

      + management {
          + auto_repair  = (known after apply)
          + auto_upgrade = (known after apply)
        }

      + network_config {
          + create_pod_range     = (known after apply)
          + enable_private_nodes = (known after apply)
          + pod_ipv4_cidr_block  = (known after apply)
          + pod_range            = (known after apply)
        }

      + node_config {
          + disk_size_gb      = (known after apply)
          + disk_type         = (known after apply)
          + guest_accelerator = (known after apply)
          + image_type        = (known after apply)
          + labels            = (known after apply)
          + local_ssd_count   = (known after apply)
          + logging_variant   = "DEFAULT"
          + machine_type      = "e2-medium"
          + metadata          = (known after apply)
          + min_cpu_platform  = (known after apply)
          + oauth_scopes      = [
              + "https://www.googleapis.com/auth/cloud-platform",
            ]
          + preemptible       = false
          + service_account   = "test-gke-cluster-sa@xyz.iam.gserviceaccount.com"
          + spot              = false
          + taint             = (known after apply)

          + shielded_instance_config {
              + enable_integrity_monitoring = (known after apply)
              + enable_secure_boot          = (known after apply)
            }

          + workload_metadata_config {
              + mode = (known after apply)
            }
        }

      + upgrade_settings {
          + max_surge       = (known after apply)
          + max_unavailable = (known after apply)
          + strategy        = (known after apply)

          + blue_green_settings {
              + node_pool_soak_duration = (known after apply)

              + standard_rollout_policy {
                  + batch_node_count    = (known after apply)
                  + batch_percentage    = (known after apply)
                  + batch_soak_duration = (known after apply)
                }
            }
        }
    }

Plan: 1 to add, 0 to change, 0 to destroy.

We see this because the pool was removed by the "remove_default_node_pool = true" setting.

If you simply change the name from "default-pool" to "default", or set "remove_default_node_pool = false", the problem does not occur.
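
As a sketch of the renaming workaround, assuming the only required change is the pool name (the name "primary-pool" below is illustrative, not required):

```hcl
# Workaround sketch: any pool name other than "default-pool" avoids the
# collision with the cluster's built-in default pool.
resource "google_container_node_pool" "primary_nodes" {
  name       = "primary-pool" # illustrative; anything except "default-pool"
  location   = "europe-west1"
  project    = "xyz"
  cluster    = google_container_cluster.primary.name
  node_count = 1

  node_config {
    machine_type    = "e2-medium"
    service_account = google_service_account.default.email
    oauth_scopes    = ["https://www.googleapis.com/auth/cloud-platform"]
  }
}
```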

Steps to Reproduce

Copy the config from the above "Terraform Configuration Files". Alter it for your project and network settings and run:

  1. terraform apply
  2. terraform state rm google_container_cluster.primary
  3. terraform import google_container_cluster.primary xyz/europe-west1/test-gke-cluster
  4. terraform apply

References

b/299600729

rileykarson commented 1 year ago

This is working as intended, although it's not documented well. The default pool in GKE is indistinguishable from other pools and is identified solely by the name default-pool. If you delete the pool on a cluster, recreate a pool with the same name, and then poll the API, node_config (the configuration of the default pool) will report the attributes of the new pool you created.
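
Given that behavior, a hedged sketch of the alternative to renaming: leave the default pool in place and adopt it via terraform import, so Terraform manages a pool under the reserved name without the provider ever deleting it. The project, location, and cluster names below follow the examples earlier in this issue; treat the import ID format as an assumption for your own setup.

```hcl
# Sketch: remove_default_node_pool is intentionally omitted (it defaults
# to false), so the provider never deletes the pool named "default-pool".
# The pool is then adopted with `terraform import` rather than created.
resource "google_container_cluster" "primary" {
  name               = "test-gke-cluster"
  location           = "europe-west1"
  project            = "xyz"
  initial_node_count = 1
}
```

followed by something like `terraform import google_container_node_pool.primary_nodes xyz/europe-west1/test-gke-cluster/default-pool`.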