Open benv666 opened 5 years ago
Managed to hack around the issue with initial GKE cluster creation by using a data source to check whether the cluster already exists, in combination with a dynamic node_config {} block, i.e.:
data "google_container_cluster" "exists" {
name = local.name
location = var.location
}
resource "google_container_cluster" "gke" {
(...)
dynamic "node_config" {
for_each = coalesce(data.google_container_cluster.exists.endpoint, "!") == "!" ? { foo : "bar" } : {}
content {
service_account = module.service-accounts.service_account_emails["gke"]
}
}
(...)
}
@rileykarson is this basically the same thing you were talking about earlier today?
Yep! @benv666, for a slightly simpler workaround you can specify the default pool's service account in the top-level node_config block and add lifecycle.ignore_changes on node_config.
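A rough sketch of that (reusing the SA reference from the snippet above; adjust to your setup):

resource "google_container_cluster" "gke" {
  (...)

  # The top-level node_config only affects the default pool created at cluster
  # creation time, so its nodes get the custom SA instead of the default compute SA.
  node_config {
    service_account = module.service-accounts.service_account_emails["gke"]
  }

  # Ignore later diffs on node_config so the cluster isn't recreated once the
  # default pool is removed/replaced by separately managed node pools.
  lifecycle {
    ignore_changes = [node_config]
  }
}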
I'm hoping to be able to remove the need for an intermediary node pool to be created in 3.0.0; otherwise, if there's still an intermediary pool, I'll likely expose a top-level field to set the service account.
I didn't end up making some of the changes I thought I was going to make in 3.0.0 (I decided against them), so a top-level field doesn't make sense.
Unfortunately, a node pool (with an SA attached) is required at creation time by the API. I'm tracking an issue to relax this restriction and improve the UX here, but right now workarounds like the ones highlighted above are the best way to handle this.
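Concretely, the usual shape of that workaround looks something like this (a rough sketch; the names, location, and SA reference are illustrative, not from the configs in this thread):

resource "google_container_cluster" "example" {
  name     = "example"       # illustrative
  location = "europe-west1"  # illustrative

  # The API requires a node pool (and a node SA) at creation time, so create
  # the smallest possible default pool and remove it right away.
  remove_default_node_pool = true
  initial_node_count       = 1

  # Give that temporary default pool a usable custom SA instead of the default
  # compute SA...
  node_config {
    service_account = google_service_account.gke_nodes.email  # assumed SA resource
  }

  # ...and ignore later diffs on node_config so the cluster isn't replaced once
  # the default pool is gone.
  lifecycle {
    ignore_changes = [node_config]
  }
}

Real node pools are then declared as separate google_container_node_pool resources with their own node_config.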
2024, still the same issue.
Here is my code:
# https://cloud.google.com/kubernetes-engine/docs/how-to/cluster-shared-vpc?hl=fr
resource "google_container_cluster" "default" {
  name     = local.name
  location = var.REGION

  release_channel {
    channel = "STABLE"
  }

  deletion_protection = true

  # We can't create a cluster with no node pool defined, but we want to only use
  # separately managed node pools. So we create the smallest possible default
  # node pool and immediately delete it.
  remove_default_node_pool = true
  initial_node_count       = 1

  network                   = data.terraform_remote_state.project.outputs.vpc.self_link
  subnetwork                = google_compute_subnetwork.default.self_link
  default_max_pods_per_node = var.GKE_MAX_PODS_PER_NODE

  ip_allocation_policy {
    cluster_secondary_range_name  = local.ip_ranges.pods.name
    services_secondary_range_name = local.ip_ranges.services.name
  }

  workload_identity_config {
    workload_pool = "${data.google_project.default.project_id}.svc.id.goog"
  }

  node_locations = data.google_project.default.labels["gcp-env"] == "prod" ? random_shuffle.zones.result : null
  # Use GKE subsetting
  # https://cloud.google.com/kubernetes-engine/docs/how-to/internal-load-balancing?hl=fr
  enable_l4_ilb_subsetting = true

  monitoring_config {
    enable_components = [
      "APISERVER",
      "CONTROLLER_MANAGER",
      "DAEMONSET",
      "DEPLOYMENT",
      "HPA",
      "POD",
      "SCHEDULER",
      "STATEFULSET",
      "STORAGE",
      "SYSTEM_COMPONENTS",
    ]
    managed_prometheus {
      enabled = true
    }
  }

  cost_management_config {
    enabled = true
  }

  addons_config {
    gke_backup_agent_config {
      enabled = true
    }
  }

  maintenance_policy {
    recurring_window {
      start_time = "2024-03-13T08:00:00Z"
      end_time   = "2024-03-13T14:00:00Z"
      recurrence = "FREQ=WEEKLY;BYDAY=TU,WE"
    }
  }

  depends_on = [
    google_compute_subnetwork.default,
    google_compute_subnetwork_iam_member.shared_vpc_user_container_sa,
    google_compute_subnetwork_iam_member.shared_vpc_user_api_sa,
  ]
}
resource "google_container_node_pool" "main" {
for_each = local.node_pools
name_prefix = replace("${each.value.prefix}-", "/--$/", "-")
cluster = google_container_cluster.default.id
node_count = each.value.node_count
node_locations = try(each.value.location, null) != null ? [each.value.location] : null
node_config {
spot = each.value.preemptible
machine_type = each.value.machine_type
# Google recommends custom service accounts that have cloud-platform scope and permissions granted via IAM Roles.
service_account = google_service_account.pool.email
oauth_scopes = [
"https://www.googleapis.com/auth/cloud-platform"
]
tags = try(each.value.tags, null)
labels = try(each.value.labels, null)
resource_labels = try(each.value.resource_labels, null)
dynamic "gvnic" {
for_each = try(each.value.gvnic, true) == true ? [true] : []
content {
enabled = gvnic.value
}
}
dynamic "taint" {
for_each = try(each.value.taints, null) != null ? each.value.taints : []
content {
key = taint.value.key
value = taint.value.value
effect = taint.value.effect
}
}
}
management {
auto_upgrade = true
}
network_config {
enable_private_nodes = true
}
}
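For completeness: google_service_account.pool is not shown above. A rough sketch of what it could look like, where the role list is an assumption based on Google's least-privilege recommendations for node service accounts rather than the actual configuration:

resource "google_service_account" "pool" {
  account_id   = "gke-node-pool"  # illustrative name
  display_name = "GKE node pool"
}

# Minimal roles GKE nodes typically need for logging/monitoring; extend as
# needed (e.g. Artifact Registry read access for pulling images).
resource "google_project_iam_member" "pool" {
  for_each = toset([
    "roles/logging.logWriter",
    "roles/monitoring.metricWriter",
    "roles/monitoring.viewer",
    "roles/stackdriver.resourceMetadata.writer",
  ])

  project = data.google_project.default.project_id
  role    = each.value
  member  = "serviceAccount:${google_service_account.pool.email}"
}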
Terraform Version
Terraform v0.12.8
Affected Resource(s)
google_container_cluster
google_container_node_pool
Terraform Configuration Files
Debug Output
n.a.
Panic Output
No panic
Expected Behavior
Terraform should have created a GKE cluster using the node_config from the node pool, including the service_account.
Actual Behavior
Terraform creates the cluster with an initial (to-be-deleted) node pool that uses the default service account, which has no usable permissions here; in fact, that account has probably been deleted.
Steps to Reproduce
terraform apply
Important Factoids
I first tried creating the GKE cluster with a node_config block specifying the desired service_account. That worked, but because node_config was then specified in both the cluster definition AND the node pool, it ran into consistency issues: every terraform run tried to recreate the GKE cluster due to changed (computed) attributes. Once I removed node_config{} from the google_container_cluster definition the plan became stable, but now the same code can't recreate the cluster from scratch because of the default service account usage.
References
#2115
b/299442589