resource.google_container_cluster.dns_config causes cluster destroy/recreate

LevOlkha commented 1 year ago

Terraform Version

Terraform v1.4.6
on darwin_amd64
+ provider registry.terraform.io/hashicorp/google v4.70.0

Affected Resource(s)

google_container_cluster

Terraform Configuration Files

Debug Output

2023-06-22T02:22:55.410-0700 [DEBUG] provider.terraform-provider-google_v4.70.0_x5:  "networkConfig": {
2023-06-22T02:22:55.410-0700 [DEBUG] provider.terraform-provider-google_v4.70.0_x5:   "network": "projects/xxv/global/networks/xx",
2023-06-22T02:22:55.410-0700 [DEBUG] provider.terraform-provider-google_v4.70.0_x5:   "subnetwork": "projects/xx/regions/us-east1/subnetworks/xx",
2023-06-22T02:22:55.410-0700 [DEBUG] provider.terraform-provider-google_v4.70.0_x5:   "dnsConfig": {
2023-06-22T02:22:55.410-0700 [DEBUG] provider.terraform-provider-google_v4.70.0_x5:    "clusterDns": "PLATFORM_DEFAULT"
2023-06-22T02:22:55.410-0700 [DEBUG] provider.terraform-provider-google_v4.70.0_x5:   },
2023-06-22T02:22:55.410-0700 [DEBUG] provider.terraform-provider-google_v4.70.0_x5:   "serviceExternalIpsConfig": {
2023-06-22T02:22:55.410-0700 [DEBUG] provider.terraform-provider-google_v4.70.0_x5:    "enabled": true
2023-06-22T02:22:55.410-0700 [DEBUG] provider.terraform-provider-google_v4.70.0_x5:   }
2023-06-22T02:22:55.410-0700 [DEBUG] provider.terraform-provider-google_v4.70.0_x5:  },

Expected Behavior

The API that returns cluster information may either return networkConfig.dnsConfig.clusterDns = "PLATFORM_DEFAULT" or omit dnsConfig from networkConfig entirely.

Both cases should be treated as equivalent.

If the Terraform configuration has no dns_config block in google_container_cluster and the API returns networkConfig.dnsConfig.clusterDns = "PLATFORM_DEFAULT", no changes should be planned.

Actual Behavior

If the Terraform configuration has no dns_config block in google_container_cluster and the API returns networkConfig.dnsConfig.clusterDns = "PLATFORM_DEFAULT", terraform plan proposes destroying and recreating the cluster.

Steps to Reproduce

terraform apply

Important Factoids

cluster with DNS provider "kube-dns"

Partial workaround

Adding either

dns_config {
    cluster_dns = "PLATFORM_DEFAULT"
    cluster_dns_scope = "DNS_SCOPE_UNSPECIFIED"
}

or

dns_config {
    cluster_dns = "PLATFORM_DEFAULT"
}

prevents the plan from forcing a cluster destroy/recreate. However, every plan then proposes the same change: adding cluster_dns_scope = "DNS_SCOPE_UNSPECIFIED".

So a clean plan is never produced.

b/291294501

edwardmedia commented 1 year ago

@LevOlkha with or without the dns_config block, it works for me with the config below.

resource "google_container_cluster" "primary" {
  name     = "issue14959"
  location = "us-central1"

  remove_default_node_pool = true
  initial_node_count       = 1

/*
  dns_config {
    cluster_dns = "PLATFORM_DEFAULT"
    cluster_dns_scope = "DNS_SCOPE_UNSPECIFIED"
  }
*/
}

Can you share a config that can repro the issue?

edwardmedia commented 1 year ago

@LevOlkha is this still an issue?

LevOlkha commented 1 year ago

steps to reproduce:

1. Create a cluster in GKE:


resource "google_container_cluster" "default" {
name               = "cluster-tmp"
project            = "fubotv-dev"
location           = "us-east1-b"
initial_node_count = 1
#  network = var.network

}



`terraform apply`

2. Go to the GKE UI and change the DNS provider from kube-dns to Cloud DNS
<img width="1002" alt="Screen Shot 2023-07-07 at 12 00 21" src="https://github.com/hashicorp/terraform-provider-google/assets/118217964/c80c6efe-3060-42e0-9bda-f6cacccc7af9">

<img width="598" alt="Screen Shot 2023-07-07 at 12 00 46" src="https://github.com/hashicorp/terraform-provider-google/assets/118217964/3495d461-ebed-4b3f-960f-1406a7cb4d50">

3. Click "save changes"
4. In the UI, change the DNS provider back to kube-dns
<img width="586" alt="Screen Shot 2023-07-07 at 12 08 50" src="https://github.com/hashicorp/terraform-provider-google/assets/118217964/0bae15e6-e954-43e1-8bba-1f0d716f585d">

5. Click "save changes"
6. Try to run `terraform apply`

danninov commented 1 year ago

Any update for this?

yorickdowne commented 1 year ago

I see this with AutoPilot and GKE 1.27. This is the cluster config:

resource "google_container_cluster" "primary" {
  name     = var.cluster_name
  location = var.region

  master_auth {
    client_certificate_config {
      issue_client_certificate = true
    }
  }

  # Enabling Autopilot for this cluster
  enable_autopilot = true

  network    = var.network
  subnetwork = var.subnetwork

  ip_allocation_policy {
    cluster_ipv4_cidr_block  = ""
    services_ipv4_cidr_block = ""
  }

  private_cluster_config {
    enable_private_nodes    = true
    enable_private_endpoint = false
    master_ipv4_cidr_block  = "172.16.0.0/28"
  }

  master_authorized_networks_config {
    cidr_blocks {
      cidr_block   = var.authorized_network
    }
  }
}

Terraform reports that the DNS settings change from their current values to null, and plans to delete and recreate the cluster.

I can get a clean plan by adding

  dns_config {
    cluster_dns = "CLOUD_DNS"
    cluster_dns_domain = "cluster.local"
    cluster_dns_scope  = "CLUSTER_SCOPE"
  }

greenozon commented 1 year ago

Hit the same issue recently with Autopilot GKE 1.27: on every apply, Terraform tends to RECREATE the whole GKE cluster !!???
Each run wasted 15 to 20 minutes.

After a couple of days of reading, analyzing and experimenting, the solution was as stated above: add the dns_config section to the Autopilot Terraform config.

  dns_config {
    cluster_dns = "CLOUD_DNS"
    cluster_dns_domain = "cluster.local"
    cluster_dns_scope  = "CLUSTER_SCOPE"
  }

is it a bug or what?

details


terraform version
Terraform v0.14.11
+ provider registry.terraform.io/hashicorp/external v2.3.1
+ provider registry.terraform.io/hashicorp/google v4.81.0
+ provider registry.terraform.io/hashicorp/kubernetes v2.23.0
+ provider registry.terraform.io/hashicorp/null v3.2.1
+ provider registry.terraform.io/hashicorp/random v3.5.1

provider "registry.terraform.io/hashicorp/google" {
  version     = "4.81.0"

danninov commented 1 year ago

Any update for this?

yorickdowne commented 10 months ago

This is resolved in my environment with hashicorp/google v5.3.0. I don't need the dummy dns_config any longer. Looking at release notes, the bug fix may have come in hashicorp/google v5.1.0.

- Installed hashicorp/google v5.3.0 (signed by HashiCorp)
- Using previously-installed hashicorp/kubernetes v2.23.0
- Using previously-installed hashicorp/external v2.3.1
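
A minimal sketch of pinning the provider to the version reported as working above, assuming the configuration declares providers through a required_providers block. The constraint `>= 5.3.0` is an illustrative choice based only on this comment, since the exact release that carries the fix is not confirmed in the thread. Moving from 4.x to 5.x is a major-version upgrade, so check the provider's upgrade guide first.

```hcl
terraform {
  required_providers {
    google = {
      source = "hashicorp/google"
      # v5.3.0 is the version reported as working in this thread.
      # The fix may date back to v5.1.0, but that is unconfirmed.
      version = ">= 5.3.0"
    }
  }
}
```

After changing the constraint, `terraform init -upgrade` fetches the newer provider. Later comments in this thread still report the recurring cluster_dns_scope diff on 5.20.0, so this pin may only address the destroy/recreate.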

greenozon commented 10 months ago

There are some updates in the GKE release notes:

https://cloud.google.com/kubernetes-engine/docs/release-notes

gdubicki commented 6 months ago

> However it will suggest every time change: addition of cluster_dns_scope = DNS_SCOPE_UNSPECIFIED

I am having only this problem ☝️ too.

My Terraform version:

Terraform v1.7.5
on linux_amd64

(using Terraform Cloud)

My provider version:

google = {
  # we need beta for the ephemeral_storage_config
  source = "hashicorp/google-beta"
  version = "~> 5.20.0"
}

My cluster code:

resource "google_container_cluster" "xxx" {
  project  = var.project_id
  location = var.region
  name     = var.xxx_cluster_name

  min_master_version = "1.28.3-gke.1286000"

  # *** Node Pools

  remove_default_node_pool = true
  initial_node_count       = 1

  cluster_autoscaling {
    # Node Auto-Provisioning
    enabled             = false
    autoscaling_profile = "BALANCED"
  }

  # *** Logs, Monitoring and Maintenance

  logging_config {
    enable_components = ["SYSTEM_COMPONENTS"]
  }

  monitoring_config {
    enable_components = ["SYSTEM_COMPONENTS"]

    managed_prometheus {
      enabled = false
    }
  }

  maintenance_policy {
    recurring_window {
      start_time = "2024-01-23T08:00:00Z"
      end_time   = "2024-01-23T14:00:00Z"
      recurrence = "FREQ=WEEKLY;BYDAY=MO,TU,WE,TH,FR"
    }
  }

  # *** Security

  workload_identity_config {
    workload_pool = "yyy.svc.id.goog"
  }

  enable_legacy_abac = false

  master_auth {
    client_certificate_config {
      issue_client_certificate = false
    }
  }

  # no extra cost, it's the new default
  enable_shielded_nodes = true

  # *** Networking

  network    = var.network
  subnetwork = var.subnetwork

  networking_mode = "VPC_NATIVE"

  private_cluster_config {
    enable_private_endpoint = false
    enable_private_nodes    = true
    master_ipv4_cidr_block  = "zz.zz.0.0/zz"
  }

  network_policy {
    enabled = false
  }

  ip_allocation_policy {
    cluster_ipv4_cidr_block = "zz.zz.0.0/zz"
  }

  dns_config {
    cluster_dns = "CLOUD_DNS"
    cluster_dns_scope = "DNS_SCOPE_UNSPECIFIED"
  }

  # *** Features

  addons_config {
    network_policy_config {
      disabled = true
    }
    http_load_balancing {
      disabled = false
    }
    gcp_filestore_csi_driver_config {
      enabled = true
    }
    gcs_fuse_csi_driver_config {
      enabled = true
    }
    gce_persistent_disk_csi_driver_config {
      enabled = true
    }
    dns_cache_config {
      enabled = true
    }
  }
  enable_l4_ilb_subsetting = true

  node_pool_defaults {
    node_config_defaults {
      gcfs_config {
        enabled = true
      }
    }
  }

  # aka GKE Dataplane v2
  datapath_provider = "ADVANCED_DATAPATH"

  # *** Cost Management

  # aka Usage Metering
  resource_usage_export_config {
    bigquery_destination {
      dataset_id = "all_billing_data"
    }
    enable_resource_consumption_metering = true
  }

  # aka Cost Allocation
  cost_management_config {
    enabled = true
  }

}

I have tried removing the cluster_dns_scope but it doesn't change anything - I always have this change in my plan:

cluster_dns_scope : "" → "DNS_SCOPE_UNSPECIFIED"

greenozon commented 6 months ago

@gdubicki have you tried this option? https://github.com/hashicorp/terraform-provider-google/issues/14959#issuecomment-1708412386

alfieyfc commented 5 months ago

> > However it will suggest every time change: addition of cluster_dns_scope = DNS_SCOPE_UNSPECIFIED
>
> I am having only this problem ☝️ too.

I'm facing the same issue, and only this issue, too. Every time I run terraform plan I get this:

      ~ dns_config {
          + cluster_dns_scope = "DNS_SCOPE_UNSPECIFIED"
            # (1 unchanged attribute hidden)
        }

Even when terraform apply runs successfully and reports that the resource has changed:

Apply complete! Resources: 0 added, 1 changed, 0 destroyed.

the next time I run terraform plan without any code change, I see the same plan output.

The dns_config block in my file is simply this:

  dns_config {
    cluster_dns       = "PLATFORM_DEFAULT"
    cluster_dns_scope = "DNS_SCOPE_UNSPECIFIED"
  }

If I remove cluster_dns_scope from the block, I see the same behavior. If I change the value to cluster_dns_scope = "CLUSTER_SCOPE", the same behavior persists too, just with a different change plan:

      ~ dns_config {
          + cluster_dns_scope = "CLUSTER_SCOPE"
            # (1 unchanged attribute hidden)
        }

I've tried using the google-beta provider and the regular google provider, but either way, it didn't fix this issue.

We have automated pipelines that detect "No changes" in the plan's output to bypass human review, and this is breaking our pipeline 😓

Edit: This GKE cluster is an existing Standard cluster that has been using kube-dns instead of Cloud DNS, and we recently imported it into Terraform.
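
A possible stopgap for pipelines that need a clean plan, sketched here under assumptions and not proposed or tested by anyone in this thread: Terraform's `lifecycle.ignore_changes` meta-argument can tell Terraform to disregard differences in the dns_config block. The resource name, cluster name, and location below are hypothetical placeholders.

```hcl
resource "google_container_cluster" "example" {
  # Hypothetical values for illustration only.
  name               = "example-cluster"
  location           = "us-east1-b"
  initial_node_count = 1

  dns_config {
    cluster_dns = "PLATFORM_DEFAULT"
  }

  lifecycle {
    # Assumption: ignoring the whole block hides the recurring
    # cluster_dns_scope diff. Side effect: intentional edits to
    # dns_config are ignored as well until this entry is removed.
    ignore_changes = [dns_config]
  }
}
```

The trade-off is that Terraform stops managing the DNS settings of this cluster, so this fits only where the DNS provider is not expected to change.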

benglewis commented 3 months ago

This bug is very annoying. It would be great if someone could please fix it 🙏 I'm not normally a big one to bump existing bug threads and add additional nonsense comments that don't add anything but "please fix this", but this bug makes every single update applied slower

greenozon commented 3 months ago

@benglewis what are your current google provider version and tf config? There are recipes in this thread for mitigating this super annoying bug (I also hit it in the past, and after some weeks I found this thread and solved it)
