wandb / terraform-google-wandb

A Terraform module for deploying Weights & Biases on GCP.
Apache License 2.0

fix: Update node pool image type #46

Closed flamarion closed 1 year ago

flamarion commented 1 year ago

The Terraform plan:

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the
following symbols:
  ~ update in-place

Terraform will perform the following actions:

  # module.wandb.module.app_gke.google_container_node_pool.default will be updated in-place
  ~ resource "google_container_node_pool" "default" {
        id                          = "projects/tmp-terraform-permissions/locations/europe-west2-a/clusters/tf-perms-gcp-cluster/nodePools/default-pool-charming-wildcat"
        name                        = "default-pool-charming-wildcat"
        # (10 unchanged attributes hidden)

      ~ node_config {
          ~ image_type        = "COS" -> "COS_CONTAINERD"
            tags              = []
            # (14 unchanged attributes hidden)

            # (1 unchanged block hidden)
        }

        # (3 unchanged blocks hidden)
    }

  # module.wandb.module.gke_app.kubernetes_service.service will be updated in-place
  ~ resource "kubernetes_service" "service" {
        id                     = "default/wandb"
        # (2 unchanged attributes hidden)

      ~ metadata {
          ~ annotations      = {
              - "cloud.google.com/neg" = jsonencode(
                    {
                      - ingress = true
                    }
                ) -> null
            }
            name             = "wandb"
            # (5 unchanged attributes hidden)
        }

        # (1 unchanged block hidden)
    }

Plan: 0 to add, 2 to change, 0 to destroy.

After apply, reconfiguring the node pool took around 20 minutes in all my tests:

module.wandb.module.app_gke.google_container_node_pool.default: Still modifying... [id=projects/tmp-terraform-permissions/loca...odePools/default-pool-charming-wildcat, 20m40s elapsed]
module.wandb.module.app_gke.google_container_node_pool.default: Still modifying... [id=projects/tmp-terraform-permissions/loca...odePools/default-pool-charming-wildcat, 20m50s elapsed]
module.wandb.module.app_gke.google_container_node_pool.default: Still modifying... [id=projects/tmp-terraform-permissions/loca...odePools/default-pool-charming-wildcat, 21m0s elapsed]
module.wandb.module.app_gke.google_container_node_pool.default: Still modifying... [id=projects/tmp-terraform-permissions/loca...odePools/default-pool-charming-wildcat, 21m10s elapsed]
module.wandb.module.app_gke.google_container_node_pool.default: Still modifying... [id=projects/tmp-terraform-permissions/loca...odePools/default-pool-charming-wildcat, 21m20s elapsed]
module.wandb.module.app_gke.google_container_node_pool.default: Modifications complete after 21m28s [id=projects/tmp-terraform-permissions/locations/europe-west2-a/clusters/tf-perms-gcp-cluster/nodePools/default-pool-charming-wildcat]
module.wandb.module.gke_app.kubernetes_service.service: Modifying... [id=default/wandb]
module.wandb.module.gke_app.kubernetes_service.service: Modifications complete after 1s [id=default/wandb]

Apply complete! Resources: 0 added, 2 changed, 0 destroyed.

The process essentially replaces the nodes in the node pool and recreates the pod on the new image type (COS_CONTAINERD). The pod recreation is the only moment of downtime, which makes this very similar to a regular W&B upgrade.
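
For reference, a change of this kind comes down to one attribute in the node pool's `node_config` block. The snippet below is a minimal sketch and not the module's actual source: the resource name and zone are taken from the plan output above, while the cluster reference and the machine type are assumptions added for illustration.

```hcl
# Sketch only, not the module's real code.
resource "google_container_node_pool" "default" {
  # Hypothetical reference: the cluster resource name is an assumption.
  cluster  = google_container_cluster.default.name
  location = "europe-west2-a"

  node_config {
    # The attribute shown in the plan: "COS" -> "COS_CONTAINERD".
    image_type = "COS_CONTAINERD"

    # Assumed value, present only to make the block realistic.
    machine_type = "n1-standard-4"
  }
}
```

As the plan shows, Terraform applies a change to `image_type` as an in-place update of the existing node pool rather than a destroy and create, so the node pool resource itself is kept while its nodes are replaced.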

jsbroks commented 1 year ago

This should be a fix type

jsbroks commented 1 year ago

This PR is included in version 1.12.4 :tada: