hashicorp / terraform-provider-google

Terraform Provider for Google Cloud Platform
https://registry.terraform.io/providers/hashicorp/google/latest/docs
Mozilla Public License 2.0

GKE cluster creation without node_config requires default service account #4435

Open benv666 opened 5 years ago

benv666 commented 5 years ago

Terraform Version

Terraform v0.12.8

Affected Resource(s)

  * google_container_cluster
  * google_container_node_pool

Terraform Configuration Files

resource "google_container_cluster" "gke" {
  provider                 = "google-beta"
  name                     = local.name
  resource_labels          = local.labels
  location                 = var.location
  remove_default_node_pool = true
  initial_node_count       = 1
  ip_allocation_policy {
    use_ip_aliases                = true
    cluster_secondary_range_name  = format("k8pods-%s", local.slice)
    services_secondary_range_name = format("k8services-%s", local.slice)
  }
  subnetwork         = data.google_compute_subnetwork.subnet.name
  network            = data.google_compute_network.net.self_link
  private_cluster_config {
    enable_private_endpoint = true
    enable_private_nodes    = true
    master_ipv4_cidr_block  = var.master_ipv4_cidr_block
  }
}

resource "google_container_node_pool" "pools" {
  provider = "google-beta"

  for_each   = local.node_pools
  cluster    = google_container_cluster.gke.name
  name       = format("%s-%s", local.name, lower(replace(each.key, " ", "-")))
  location   = var.location
  node_count = (each.value.node_count == 0 ? null : each.value.node_count)

  node_config {
    disk_size_gb    = each.value.disk_size_gb
    disk_type       = each.value.disk_type
    metadata        = merge(each.value.metadata, { disable-legacy-endpoints = "true" })
    oauth_scopes    = each.value.oauth_scopes
    preemptible     = each.value.preemptible
    service_account = module.service-accounts.service_account_emails["gke"]
    tags            = each.value.tags
  }
  depends_on = [
    google_service_account_iam_member.tfe_sa_user,
    google_project_iam_member.container-node-logwriter
  ]
}

Debug Output

n.a.

Panic Output

No panic

Expected Behavior

Terraform should have created a GKE cluster using the node_config from the node pool, including the service_account.

Actual Behavior

Terraform creates the cluster with an initial node pool (slated for deletion) that uses the default service account, which I have no permissions on; in fact, that account has probably been deleted from the project.

Error: googleapi: Error 400: The user does not have access to service account "default". Ask a project owner to grant you the iam.serviceAccountUser role on the service account., badRequest

Steps to Reproduce

  1. terraform apply

Important Factoids

I first tried creating the GKE cluster with a node_config block specifying the desired service_account. This worked fine; however, with node_config then specified both in the cluster definition AND in the node pool, Terraform runs into consistency issues and tries to recreate the GKE cluster on every run because of changed (computed) attributes. When I then removed the node_config {} block from the google_container_cluster definition it became stable, but now I can't use the same code to recreate everything from scratch, since cluster creation falls back to the default service account.
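
For reference, the unstable variant looked roughly like this (reconstructed from the description above, so the exact block contents are an assumption):

resource "google_container_cluster" "gke" {
  (...)
  # Duplicating node_config here and in google_container_node_pool causes a
  # permanent diff: the API computes extra node_config attributes on the
  # cluster, so every plan wants to recreate it.
  node_config {
    service_account = module.service-accounts.service_account_emails["gke"]
  }
  (...)
}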

References

b/299442589

benv666 commented 5 years ago

Managed to hack around the issue with initial GKE cluster creation by using a data source to check whether the cluster already exists, combined with a dynamic node_config {} block, i.e.:

data "google_container_cluster" "exists" {
  name     = local.name
  location = var.location
}
resource "google_container_cluster" "gke" {
  (...)
  dynamic "node_config" {
    for_each = coalesce(data.google_container_cluster.exists.endpoint, "!") == "!" ? { foo : "bar" } : {}
    content {
      service_account = module.service-accounts.service_account_emails["gke"]
    }
  }
  (...)
}
danawillow commented 5 years ago

@rileykarson is this basically the same thing you were talking about earlier today?

rileykarson commented 5 years ago

Yep! @benv666, for a somewhat simpler workaround, you can specify the default pool's service account in the top-level node_config block and add lifecycle.ignore_changes on node_config.
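
A minimal sketch of that workaround, reusing the service account reference from the config above:

resource "google_container_cluster" "gke" {
  (...)
  remove_default_node_pool = true
  initial_node_count       = 1

  # Give the throwaway default pool a usable service account at creation time.
  node_config {
    service_account = module.service-accounts.service_account_emails["gke"]
  }

  # Ignore later drift on node_config so the cluster is not recreated once
  # the default pool has been deleted.
  lifecycle {
    ignore_changes = [node_config]
  }
}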

I'm hoping to be able to remove the need for an intermediary node pool to be created in 3.0.0; otherwise, if there's still an intermediary pool, I'll likely expose a top-level field to set the service account.

rileykarson commented 4 years ago

I didn't end up making some of the changes I thought I was going to make in 3.0.0 (I decided against them), so a top-level field doesn't make sense.

Unfortunately, a node pool (with an SA attached) is required at creation time by the API. I'm tracking an issue to relax this restriction and improve the UX here, but for now workarounds like the ones highlighted above are the best way to handle this.

jlenuffgsoi commented 4 months ago

2024, still the same issue.

Here is my code:

#https://cloud.google.com/kubernetes-engine/docs/how-to/cluster-shared-vpc?hl=fr
resource "google_container_cluster" "default" {
  name     = local.name
  location = var.REGION

  release_channel {
    channel = "STABLE"
  }

  deletion_protection = true
  # We can't create a cluster with no node pool defined, but we want to only use
  # separately managed node pools. So we create the smallest possible default
  # node pool and immediately delete it.
  remove_default_node_pool = true
  initial_node_count       = 1

  network                   = data.terraform_remote_state.project.outputs.vpc.self_link
  subnetwork                = google_compute_subnetwork.default.self_link
  default_max_pods_per_node = var.GKE_MAX_PODS_PER_NODE

  ip_allocation_policy {
    cluster_secondary_range_name  = local.ip_ranges.pods.name
    services_secondary_range_name = local.ip_ranges.services.name
  }

  workload_identity_config {
    workload_pool = "${data.google_project.default.project_id}.svc.id.goog"
  }

  node_locations = data.google_project.default.labels["gcp-env"] == "prod" ? random_shuffle.zones.result : null

  # Use GKE subsetting for internal load balancers
  # https://cloud.google.com/kubernetes-engine/docs/how-to/internal-load-balancing?hl=fr
  enable_l4_ilb_subsetting = true

  monitoring_config {
    enable_components = [
      "APISERVER",
      "CONTROLLER_MANAGER",
      "DAEMONSET",
      "DEPLOYMENT",
      "HPA",
      "POD",
      "SCHEDULER",
      "STATEFULSET",
      "STORAGE",
      "SYSTEM_COMPONENTS",
    ]
    managed_prometheus {
      enabled = true
    }
  }

  cost_management_config {
    enabled = true
  }

  addons_config {
    gke_backup_agent_config {
      enabled = true
    }
  }

  maintenance_policy {
    recurring_window {
      start_time = "2024-03-13T08:00:00Z"
      end_time   = "2024-03-13T14:00:00Z"
      recurrence = "FREQ=WEEKLY;BYDAY=TU,WE"
    }
  }

  depends_on = [
    google_compute_subnetwork.default,
    google_compute_subnetwork_iam_member.shared_vpc_user_container_sa,
    google_compute_subnetwork_iam_member.shared_vpc_user_api_sa,
  ]
}

resource "google_container_node_pool" "main" {
  for_each       = local.node_pools
  name_prefix    = replace("${each.value.prefix}-", "/--$/", "-")
  cluster        = google_container_cluster.default.id
  node_count     = each.value.node_count
  node_locations = try(each.value.location, null) != null ? [each.value.location] : null

  node_config {
    spot         = each.value.preemptible
    machine_type = each.value.machine_type

    # Google recommends custom service accounts that have cloud-platform scope and permissions granted via IAM Roles.
    service_account = google_service_account.pool.email
    oauth_scopes = [
      "https://www.googleapis.com/auth/cloud-platform"
    ]

    tags            = try(each.value.tags, null)
    labels          = try(each.value.labels, null)
    resource_labels = try(each.value.resource_labels, null)

    dynamic "gvnic" {
      for_each = try(each.value.gvnic, true) == true ? [true] : []
      content {
        enabled = gvnic.value
      }
    }

    dynamic "taint" {
      for_each = try(each.value.taints, null) != null ? each.value.taints : []
      content {
        key    = taint.value.key
        value  = taint.value.value
        effect = taint.value.effect
      }
    }
  }

  management {
    auto_upgrade = true
  }

  network_config {
    enable_private_nodes = true
  }
}