Extend google_container_node_pool documentation on nodepool changes

nemethloci commented 8 months ago

Community Note

Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request.
Please do not leave +1 or me too comments, they generate extra noise for issue followers and do not help prioritize the request.
If you are interested in working on this issue or have submitted a pull request, please leave a comment.
If an issue is assigned to a user, that user is claiming responsibility for the issue.
Customers working with a Google Technical Account Manager or Customer Engineer can ask them to reach out internally to expedite investigation and resolution of this issue.

Terraform Version

v1.7.4

Affected Resource(s)

google_container_node_pool

Terraform Configuration

# Before

resource "google_container_node_pool" "primary_preemptible_nodes" {
  name       = "my-node-pool"
  cluster    = google_container_cluster.primary.id
  node_count = 1

  node_config {
    preemptible  = true
    machine_type = "e2-medium"

    # Google recommends custom service accounts that have cloud-platform scope and permissions granted via IAM Roles.
    service_account = google_service_account.default.email
    oauth_scopes = [
      "https://www.googleapis.com/auth/cloud-platform"
    ]
  }
}

# After

resource "google_container_node_pool" "primary_preemptible_nodes" {
  name       = "my-node-pool"
  cluster    = google_container_cluster.primary.id
  node_count = 1

  node_config {
    preemptible  = true
    machine_type = "e2-standard-2"

    # Google recommends custom service accounts that have cloud-platform scope and permissions granted via IAM Roles.
    service_account = google_service_account.default.email
    oauth_scopes = [
      "https://www.googleapis.com/auth/cloud-platform"
    ]
  }
}

Debug Output

No response

Expected Behavior

In case of a nodepool change, that requires changing the underlying template (which is immutable), the provider would ideally:

create a new nodepool with the new template
drain the old pool
delete the old pool

This is unfortunately not the current behavior probably because the above process would require the 'interaction' of multiple providers (google provider to manage the nodepools and the kubernetes provider to drain the nodes. Not sure if the nodepool resize operation on the GCP API could be abused for draining...?). Considering the nodepool upgrades work as expected following the above approach I think it would worth to emphasize in the documentation what would happen in case of non-version related changes to the node pool resource.

This would be especially important as not being aware of the above can contain service outage (see actual behavior for details)

Actual Behavior

Without create_before_destroy flag the nodepool is deleted and then recreated resulting in complete outage with regard to the services bound to the nodepool. In case it is set, there will still be outage of all services as:

the old pool is not drained but terminated abruptly (well, it seems, that scheduling is disabled immediately once a pool is to be deleted, but pods doesn't seem to be waited to be migrated to new pools (respecting poddisruptionbudget), but killed immediately, the autoscaler is trying to create new nodes even in the pool under deletion, which get deleted as well...)
the new node pool's initial node count might be significantly lower than the current size (cluster autoscaling) of the old one as such pods would need to be waiting for the autoscaler before they could be scheduled

Steps to reproduce

terraform apply

Important Factoids

No response

References

https://github.com/hashicorp/terraform-provider-google/issues/10895

b/331218437

nemethloci commented 8 months ago

Just to make clear: this is a request to improve documentation. Solving the issue behind this goes beyond the scope of this ticket and probably requires the cooperation of the GCP terraform provider developers and GCP itself.

ggtisc commented 8 months ago

Hi @nemethloci you can found an alternative documented here for this scenario: https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/container_node_pool

So you can approach features like strategy and blue_green_settings for upgrades adjusting it according to your requirements.

nemethloci commented 8 months ago

Hi @ggtisc the feature you referenced is valid only with regard to upgrades and unfortunately does not apply when changing other aspects of the nodepool, which my request is about (ie: disk type, node type...)

ggtisc commented 8 months ago

@nemethloci could you please provide the code of the resource google_container_cluster.primary?

nemethloci commented 8 months ago

Here you go:

resource "google_container_cluster" "primary" {
  provider = google-beta
  project  = var.project_id
  location = var.region
  name     = var.cluster_name

  network    = var.cluster_network
  subnetwork = var.cluster_subnetwork

  logging_service    = var.logging_service
  monitoring_service = var.monitoring_service
  min_master_version = local.kubernetes_version
  release_channel {
    channel = "UNSPECIFIED"
  }

  enable_shielded_nodes = var.enable_shielded_nodes

  dynamic "master_authorized_networks_config" {
    for_each = var.master_authorized_networks_config
    content {
      dynamic "cidr_blocks" {
        for_each = lookup(master_authorized_networks_config.value, "cidr_blocks", [])
        content {
          cidr_block   = cidr_blocks.value.cidr_block
          display_name = lookup(cidr_blocks.value, "display_name", null)
        }
      }
    }
  }

  ip_allocation_policy {
    cluster_secondary_range_name  = var.cluster_ip_range_pods
    services_secondary_range_name = var.cluster_ip_range_services
  }

  # We can't create a cluster with no node pool defined, but we want to only use
  # separately managed node pools. So we create the smallest possible default
  # node pool and immediately delete it.
  node_pool {
    name               = "default-pool"
    initial_node_count = 1
  }
  remove_default_node_pool = true

  # We can optionally control access to the cluster
  # See https://cloud.google.com/kubernetes-engine/docs/how-to/private-clusters
  private_cluster_config {
    enable_private_endpoint = var.disable_public_endpoint
    enable_private_nodes    = var.enable_private_nodes
    master_ipv4_cidr_block  = var.master_ipv4_cidr_block
  }

  master_auth {
    client_certificate_config {
      issue_client_certificate = false
    }
  }

  dynamic "authenticator_groups_config" {
    for_each = var.enable_authenticator_groups_config ? [1] : []
    content {
      security_group = var.authenticator_groups_config_security_group
    }
  }

  binary_authorization {
    evaluation_mode = var.enable_binary_authorization
  }

  pod_security_policy_config {
    enabled = var.enable_pod_security_policy_config
  }

  // Configure the workload identity "identity namespace".  Requires additional
  // configuration on the node pool for workload identity to function.
  // See PATT-24
  workload_identity_config {
    workload_pool = "${var.project_id}.svc.id.goog"
  }

  addons_config {

    http_load_balancing {
      disabled = !var.http_load_balancing
    }

    horizontal_pod_autoscaling {
      disabled = !var.horizontal_pod_autoscaling
    }

    network_policy_config {
      disabled = !var.enable_network_policy
    }

    istio_config {
      disabled = !var.enable_managed_istio
      auth     = var.managed_istio_auth
    }

    config_connector_config {
      enabled = var.enable_config_connector
    }

  }

  network_policy {
    # Based on: https://github.com/terraform-google-modules/terraform-google-kubernetes-engine/issues/656#issuecomment-910108440
    # Enabling NetworkPolicy for clusters with DatapathProvider=ADVANCED_DATAPATH is not allowed (yields error)
    enabled = var.enable_dataplane_v2 ? false : (var.enable_network_policy ? true : false)

    # Tigera (Calico Felix) is the only provider
    # CALICO provider overrides datapath_provider setting, leaving Dataplane v2 disabled
    provider = var.enable_dataplane_v2 ? "PROVIDER_UNSPECIFIED" : (var.enable_network_policy ? "CALICO" : "PROVIDER_UNSPECIFIED")
  }

  # This is where Dataplane V2 is enabled.
  datapath_provider = var.enable_dataplane_v2 ? "ADVANCED_DATAPATH" : "DATAPATH_PROVIDER_UNSPECIFIED"

  dynamic "resource_usage_export_config" {
    for_each = var.enable_metering ? [1] : []

    content {
      enable_network_egress_metering       = true
      enable_resource_consumption_metering = true

      bigquery_destination {
        dataset_id = google_bigquery_dataset.cluster_resource_usage[0].dataset_id
      }
    }
  }

  lifecycle {
    ignore_changes = [node_pool]
  }

  maintenance_policy {
    daily_maintenance_window {
      start_time = var.maintenance_start_time
    }
  }

  timeouts {
    create = "30m"
    update = "30m"
    delete = "30m"
  }
}

nemethloci commented 8 months ago

One more update: google support has confirmed, that it is expected, that both upon nodepool deletion and manual draining of all nodes of a nodepool, unless the autoscaler is disabled first, new nodes would be spinned of without being cordoned by the autoscaler. As such the 'workaround' process to be used is:

spin up new nodepool and at the same time disable autoscaling for the old
drain old nodepool nodes manually and once finished (ie: via kubectl)
delete the old nodepool

ggtisc commented 8 months ago

@nemethloci is this the same issue of the 17668?

melinath commented 8 months ago

Note from triage: this doesn't look like a duplicate - that issue was raised by the same author.

nemethloci commented 8 months ago

@nemethloci is this the same issue of the 17668?

Yes and no:

yes: both related to the same workflow (deleting/changing a nodepool)
no: The other item intends to solve one problem of many (by turning off autoscaling before deletion), while this one is about to document how deletion of a nodepool should be done in general due to that and other deficiencies, which can not be solved without google changing their APIs and behavior. If the other bugreport would be implemented, there would be one issue less to include in the updated documentation.

Why I have raised two tickets: when I originally reported the documentation related (this) ticket, I thought (based on some early feedback from google engineers), that google will admit, that this is a bug and they will fix it themselves, hence there would be no need to alter the terraform provider itself.

nemethloci commented 8 months ago

FYI: google have created a feature request (without any commitment with regard to timeline): https://issuetracker.google.com/issues/331403526

wyardley commented 1 month ago

I think https://github.com/GoogleCloudPlatform/magic-modules/pull/12014 will at least partially help, by allowing in-place updates (without recreating the node pool) of at least some of the fields in google_container_cluster.node_config

I didn't mark this as being "fixed by" that, because I'm not 100% sure it solves all cases and if it's the same underlying issue, but feel free to update that and / or comment here if you think that should fix it.

roaks3 commented 1 month ago

Agreed that in-place updates help, but there are still some ForceNew fields within node_config, so I think we should keep this open as a request to mention the dangers (like abrupt node termination) of changing those fields on this resource.

hashicorp / terraform-provider-google