terraform-google-modules / terraform-google-kubernetes-engine

Configures opinionated GKE clusters
https://registry.terraform.io/modules/terraform-google-modules/kubernetes-engine/google
Apache License 2.0
1.15k stars 1.17k forks source link

Permadiff and / or force replacement on `node.config.gcfs_config.enabled` #2100

Closed wyardley closed 2 months ago

wyardley commented 2 months ago

TL;DR

I'm seeing permadrift and an inability to suppress trying to update gcfs_config.enabled:

          + gcfs_config {
              + enabled = false
            }

Simple repro case below. I created the cluster with an older module version, then updated to 33.0.3. Creating with the new version creates no issue.

This could end up being a provider issue, but I would think at least when it's null, the module shouldn't try to set it.

Expected behavior

Terraform to not see a diff, and to apply changes in the case where the setting is changed in the module.

Observed behavior

Terraform keeps trying to set the value (and, in the case of < 6.2.0, also tries to force replace the nodepool) 5.43.0

          + gcfs_config { # forces replacement
              + enabled = false # forces replacement
            }

6.2.0

  ~ resource "google_container_node_pool" "pools" {
        id                          = "projects/zelig-tfprovidertest-sandbox/locations/us-central1/clusters/test-cluster/nodePools/primary"
        name                        = "primary"
        # (10 unchanged attributes hidden)

      ~ node_config {
            tags                        = [
                "gke-test-cluster",
                "gke-test-cluster-primary",
            ]
            # (16 unchanged attributes hidden)

          + gcfs_config {
              + enabled = false
            }

            # (3 unchanged blocks hidden)
        }

        # (5 unchanged blocks hidden)
    }

Terraform Configuration

provider "google" {
  region  = "us-central1"
  zone    = "us-central1-f"
}

terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "5.43.0"
    }
  }
}

module "gke" {
  source                     = "terraform-google-modules/kubernetes-engine/google//modules/private-cluster"
  version                    = "32.0.0" ## update this to 33.0.3 after
  project_id                 = [project id]
  name                       = "test-cluster"
  service_account_name       = "foobarbaz"
  grant_registry_access      = true
  kubernetes_version         = "1.30.3-gke.1639000"
  release_channel            = "UNSPECIFIED"
  region                     = "us-central1"
  zones                      = ["us-central1-f"]
  network                    = "default"
  subnetwork                 = "default"
  horizontal_pod_autoscaling = true
  enable_private_nodes       = true
  dns_cache                  = true
  ip_range_pods              = "foo-pods"
  ip_range_services          = "foo-services"
  master_ipv4_cidr_block     = "10.10.0.0/28"
  remove_default_node_pool = true
  node_pools = [
    # Note: this is intentionally different from the actual default,
    # "default-pool"
    {
      name                      = "primary"
      machine_type              = "e2-standard-4"
      total_min_count           = 1
      total_max_count           = 2
      local_ssd_count           = 0
      spot                      = false
      local_ssd_ephemeral_count = 0
      disk_size_gb              = 100
      disk_type                 = "pd-balanced"
      image_type                = "COS_CONTAINERD"
      enable_gcfs               = false
      enable_gvnic              = false
      logging_variant           = "DEFAULT"
      auto_upgrade              = false
      preemptible               = false
      pod_pids_limit            = 0
      cpu_manager_policy = ""
    },
  ]
}

Terraform Version

OpenTofu v1.8.2
on darwin_arm64
+ provider registry.opentofu.org/hashicorp/google v5.43.0
+ provider registry.opentofu.org/hashicorp/kubernetes v2.32.0
+ provider registry.opentofu.org/hashicorp/random v3.6.2

Additional information

wyardley commented 2 months ago

side note: I tried deleting and re-importing the resource, in hopes that would resolve the problem, but it did not.

Looks like what it's getting back from the API is:

2024-09-13T19:47:25.138-0700 [DEBUG] provider.terraform-provider-google:  "nodePoolDefaults": {
2024-09-13T19:47:25.138-0700 [DEBUG] provider.terraform-provider-google:   "nodeConfigDefaults": {
2024-09-13T19:47:25.138-0700 [DEBUG] provider.terraform-provider-google:    "gcfsConfig": {},

But I don't actually see the provider trying to do anything when it "updates" the resource, and the fact that it sees the empty response as the value not being set is probably the underlying cause for this issue.

So this may be a provider issue, possibly related to https://github.com/GoogleCloudPlatform/magic-modules/pull/11553

As I mentioned to the author here, the test doesn't actually test updates: https://github.com/dominykasn/magic-modules/blob/22ea4977c565faa522b501f380e1461604d67645/mmv1/third_party/terraform/services/container/go/resource_container_node_pool_test.go.tmpl#L1739-L1743

wyardley commented 2 months ago

@apeabody I think this is the right fix (upstream provider level issue introduced by vs. something with this module): https://github.com/GoogleCloudPlatform/magic-modules/pull/11717 Hopefully, someone can see if it's possible to get that approved quickly early next week if it looks good.

Unfortunately, I think this affects both 5.x and 6.x providers (albeit in different ways) - earlier ones will have the force replacement issue, and 6.2 will have the harmless (because the update doesn't actually succeed), but ugly, permadiff

wyardley commented 2 months ago

https://github.com/hashicorp/terraform-provider-google/pull/19534 My fix should be going out in 6.4.0 🎉

Not sure if it will get cherry-picked back to 5.x or not.

apeabody commented 2 months ago

hashicorp/terraform-provider-google#19534 My fix should be going out in 6.4.0 🎉

Not sure if it will get cherry-picked back to 5.x or not.

Thanks @wyardley! - Keeping an eye out. - We might need to consider bumping the minimum to v6.4, at least for Autopilot sub-modules.

wyardley commented 2 months ago

@apeabody after updating to TPG 6.4.0, had to make the changes in #2111 locally to fully resolve this issue.