hashicorp / terraform-provider-google

Terraform Provider for Google Cloud Platform
https://registry.terraform.io/providers/hashicorp/google/latest/docs
Mozilla Public License 2.0

Support labels for instances in node pool #1758

Closed JanVoracek closed 9 months ago

JanVoracek commented 6 years ago

Terraform Version

Terraform v0.11.7
+ provider.google v1.15.0
+ provider.kubernetes v1.1.0

Affected Resource(s)

- google_container_node_pool

Terraform Configuration Files

resource "google_container_node_pool" "my_pool" {
  name       = "my-pool"
  cluster    = "my-cluster"
  node_count = "2"

  node_config {
    labels = {
      foo = "bar"
    }
  }
}

We can specify Kubernetes labels for nodes in google_container_node_pool (using node_config.labels), but it's not possible to set labels on the instances themselves (similar to google_compute_instance.labels).
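
For comparison, a minimal sketch of instance labels on a standalone VM (the name, machine type, and image are illustrative); this per-instance labels field is the equivalent being requested for node pool instances:

resource "google_compute_instance" "example" {
  name         = "example-vm"
  machine_type = "e2-medium"
  zone         = "us-central1-a"

  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-11"
    }
  }

  network_interface {
    network = "default"
  }

  # GCP resource labels on the instance itself, distinct from any
  # Kubernetes node labels.
  labels = {
    foo = "bar"
  }
}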

b/299443048

dwradcliffe commented 6 years ago

Google does not currently provide any way to do this. I have requested this feature but it has not been done yet.

JanVoracek commented 6 years ago

It's probably a stupid question, but why isn't it possible, given that gcloud compute instances add-labels can do it?

dwradcliffe commented 6 years ago

Since the instances are managed by the node pool (and the underlying managed instance groups), you can't make changes directly to the instances (via the method you mention). Instead we must modify the node pool itself, and there's no API for setting resource labels on the node pool. 😞

paddycarver commented 6 years ago

Not a stupid question! As @dwradcliffe mentions, Terraform operates at the level of the node pool and doesn't see, think about, or control the individual nodes in the pool; that's GKE's area of responsibility. Node pools can also scale at any time, which means the set of nodes may change without Terraform running, and any new nodes wouldn't have the labels on them.

ianrose14 commented 5 years ago

Although the docs don't mention it, setting labels in node_config seems to work for me in practice. I'm using:

Terraform v0.11.8
+ provider.google v1.19.0

It appears to support taints as well.

joostvdg commented 5 years ago

@ianrose14 can you share the code you used to set labels and taints that end up on the node instances?

ianrose14 commented 5 years ago

Here's one example

resource "google_container_node_pool" "name" {
  name       = "my-pool"
  cluster    = "${google_container_cluster.primary.name}"
  zone       = "${var.primary_zone}"
  node_count = 1

  autoscaling {
    min_node_count = 1
    max_node_count = 4
  }

  node_config {
    labels = {
      "cluster"                     = "${var.cluster_node_label}"
      "k8s.fullstory.com/node-task" = "my-value"
    }

    machine_type = "n1-standard-4"
    oauth_scopes = ["https://www.googleapis.com/auth/cloud-platform"]
    preemptible  = "true"

    taint = {
      effect = "NO_SCHEDULE"
      key    = "fs-node-use"
      value  = "my-value"
    }
  }
}

martinasia commented 5 years ago

If the goal is to label all the nodes/VMs for a particular cluster, I normally set it up under the "google_container_cluster" resource by defining the resource_labels argument. Not sure if you want to take this approach, but it works just as I wanted (custom labels on all my cluster nodes). Here's an example of the code I normally use:

resource "google_container_cluster" "primary" {
  name                          = "k8s-${terraform.workspace}-cluster"
  zone                          = "${var.region}-a"
  enable_legacy_abac            = false
  remove_default_node_pool      = true
  node_pool {
    name = "default-pool"
  }
  resource_labels               = "${var.resource_labels}"
}

# Labeling for billing purposes #
variable "resource_labels" {
  default = {
    environment = "development"
    maintainer  = "iamfiverrguy@gmail.com"
  }
  description = "Kubernetes cluster-wide resource labels"
}

red8888 commented 5 years ago

> Here's one example
>
> [@ianrose14's node pool config, quoted verbatim above]

This can probably be closed, right? Setting labels under node_config works. Applying resource labels at google_container_cluster does NOT work; it will not apply those labels to the GCE instances.

martinasia commented 5 years ago

Well... I tested this just a while ago, and I must say that adding the resource_labels argument to google_container_cluster DOES work for me. I was also skeptical at first and hesitated when I tried this method last year, even though the TF docs state it clearly here: https://www.terraform.io/docs/providers/google/r/container_cluster.html#resource_labels. However, I went ahead anyway and gave it a test. Long story short, I'm glad I took that route, since I eventually learned something about how google_container_cluster and that particular argument (resource_labels) work.

The very thing I learned was that the (cluster-wide) label changes are not instantaneous! It took roughly 30-45 minutes for the changes to propagate and be reflected on ALL our GCE instances (nodes). If you have several node pools and want to apply labels to all of them in one shot, this makes things easier (assuming you have the patience required).

So... perhaps you could give this another try (sparing some extra time for the changes to be fully reflected across all the nodes) and share the result here, @red8888? I'm sure your experience testing this technique will benefit us all.

I'm also posting some of the CLI output from the test, where I added 'maintainer=martin' as a resource label in my TF vars file:

$ terraform -v
Terraform v0.12.1
+ provider.google v2.7.0
$ terraform plan
Terraform will perform the following actions:

  # module.gke.google_container_cluster.primary will be updated in-place
  ~ resource "google_container_cluster" "primary" {
        additional_zones         = []
        cluster_autoscaling      = []
        cluster_ipv4_cidr        = "10.20.0.0/14"
        ...
        master_version           = "1.12.7-gke.17"
        min_master_version       = "1.12.7-gke.17"
        monitoring_service       = "monitoring.googleapis.com"
        name                     = "k8s-dev-cluster"
        node_locations           = []
        node_version             = "1.12.7-gke.17"
        project                  = "xxx"
        remove_default_node_pool = true
      ~ resource_labels          = {
            "env"        = "staging"
          + "maintainer" = "martin"
            "resource"   = "gke"
        }
        ...

After roughly an hour, I checked one of the nodes (GCE) for that cluster by issuing the following command:

$ gcloud compute instances describe <one of your nodes> --zone=<the gke zone> | head --lines=50

canIpForward: true
cpuPlatform: Intel Broadwell
creationTimestamp: '2019-06-11T05:45:50.194-07:00'
deletionProtection: false
disks:
...
id: 'xxx'
kind: compute#instance
labelFingerprint: ptUBrXARDBY=
labels:
  env: staging
  goog-gke-node: ''
  maintainer: martin    <<<<<<<<<<
  resource: gke
...

Lastly, I've tested this, and the resource_labels argument worked as it should on both preemptible and non-preemptible instance types.

talonx commented 4 years ago

I'm curious to know why this is not supported yet with google_container_node_pool when the similar module https://registry.terraform.io/modules/terraform-google-modules/kubernetes-engine/google/9.1.0 supports it.

E.g. see the example at https://github.com/terraform-google-modules/terraform-google-kubernetes-engine/tree/v9.1.0/modules/beta-public-cluster

I would rather use the google_container_node_pool resource directly than the other module, because it is much more flexible.

rileykarson commented 4 years ago

@talonx: Those are Kubernetes labels, as specified with node_config.labels, not GCP labels, which currently cannot be set on node pools.
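
To make the distinction concrete, here is a minimal sketch (resource names and label values are illustrative, not from this thread):

resource "google_container_cluster" "primary" {
  name               = "example-cluster"
  location           = "us-central1"
  initial_node_count = 1

  # GCP resource labels: propagated to the GCE instances of every node
  # pool in the cluster (cluster-wide only, at the time of this comment).
  resource_labels = {
    env = "staging"
  }
}

resource "google_container_node_pool" "example" {
  name       = "example-pool"
  cluster    = google_container_cluster.primary.id
  node_count = 1

  node_config {
    # Kubernetes node labels: visible to the Kubernetes scheduler and
    # kubectl, but NOT applied to the underlying GCE instances.
    labels = {
      "node-task" = "my-value"
    }
  }
}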

stangles commented 3 years ago

Chiming in to confirm that @martinasia's suggestion does in fact work. I conducted essentially the same experiment by applying resource_labels to my GKE cluster, and after almost exactly one hour, the GCE VM now carries the label that I added:

stangles@stangles~$ gcloud compute instances describe <instanceId> | grep labels -A 5
labels:
  env: dev
  goog-gke-node: ''

I realize this is an old thread, however, I felt it was important to confirm the behavior described since it isn't officially documented.

red8888 commented 2 years ago

After years I rediscovered this thread. Is resource_labels supposed to propagate labels to GCE instances or not? Because, again, after applying I'm not seeing those labels on the GCE instances.

Does it take "one hour" for the labels to propagate OR does it only apply them to new nodes as they are created or something?

I'd like to know exactly how this is supposed to work.

rileykarson commented 2 years ago

resource_labels, the cluster-level field, applies its labels to the GCE resources managed by the cluster. At the very least, it does for the nodes (i.e. GCE VMs) and I'm not sure offhand if it does for Kubernetes-managed resources like load balancers.

The ask here, as we've interpreted it, is for that kind of management at the node pool level rather than across the whole cluster; that way, two different pools could apply different sets of labels to their nodes. That would make a config like the following possible:

resource "google_container_cluster" "primary" {
  name     = "my-gke-cluster"
  location = "us-central1"

  resource_labels = {
    cluster-label = "some-value"
  }

  remove_default_node_pool = true
  initial_node_count       = 1
}

resource "google_container_node_pool" "preemptible_nodes" {
  name       = "my-node-pool-preemptible"
  cluster    = google_container_cluster.primary.id
  node_count = 1

  node_config {
    preemptible  = true
    machine_type = "e2-medium"
    resource_labels = {
      preemptible-node-label = "some-other-value"
    }
  }
}

resource "google_container_node_pool" "primary_nodes" {
  name       = "my-node-pool"
  cluster    = google_container_cluster.primary.id
  node_count = 1

  node_config {
    machine_type = "e2-large"
    resource_labels = {
      primary-node-label = "some-distinct-value"
    }
  }
}

carlos4ndre commented 2 years ago

@rileykarson It would be great to have support for resource_labels at the node pool level, as you highlighted in your example.

Looking at the REST API and SDK, this feature doesn't seem to be supported yet, so at the moment the default behaviour is to set the cluster-wide resource_labels on all instance templates used by the node pools.

For reference, I believe this feature is supported in EKS and AKS.

carlos4ndre commented 2 years ago

Having support for resource_labels at the node pool level is particularly useful for billing, as we could do cost allocation per team by setting a team label.

NA2047 commented 2 years ago

Support was just added so that changes to tags in the node_pool no longer cause a complete recreation, only an in-place update:

https://github.com/GoogleCloudPlatform/magic-modules/pull/6599

Hope this helps somewhat.
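
For context, a minimal sketch of the node_config network tags that PR is about (the pool name and tag value are illustrative):

resource "google_container_node_pool" "tagged" {
  name       = "tagged-pool"
  cluster    = google_container_cluster.primary.id
  node_count = 1

  node_config {
    machine_type = "e2-medium"

    # GCE network tags; per the linked PR, changing these now results in
    # an in-place update instead of recreating the whole node pool.
    tags = ["allow-health-checks"]
  }
}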

carlos4ndre commented 2 years ago

@NA2047 that unfortunately doesn't solve the issue; we still need support for labels on node pool instances, as illustrated in the example above.

korenlev commented 1 year ago

If the node pool is configured with autoscaling enabled, then updating any metadata in the node pool will FAIL with the following Google error:

"Updates for 'labels' are not supported in node pools with autoscaling enabled (as a workaround, consider temporarily disabling autoscaling or recreating the node pool with the updated values.)"

You could surface this error back to the user, so they can decide whether or not to use this temporary workaround from Google. You can validate the relationship with the autoscaling property against upstream.
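
A minimal sketch of the workaround the error message suggests, using a hypothetical pool (disable autoscaling, update the labels, then re-enable):

resource "google_container_node_pool" "labeled" {
  name       = "labeled-pool"
  cluster    = google_container_cluster.primary.id
  node_count = 1

  # Step 1: temporarily comment out the autoscaling block and apply.
  # autoscaling {
  #   min_node_count = 1
  #   max_node_count = 4
  # }

  node_config {
    # Step 2: change the labels and apply.
    labels = {
      role = "batch"
    }
  }

  # Step 3: restore the autoscaling block and apply again.
}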

hudac commented 9 months ago

Looking into the REST API, applying labels to a GCP node pool resource seems to be supported now:

  "resourceLabels": {
    string: string,
    ...
  },

Tried with the hashicorp/google v5.15.0 provider, adding node_config.resource_labels works:

$ terraform apply
...

  ~ resource "google_container_node_pool" "np" {

      ~ node_config {
          ~ resource_labels   = {
              + "team" = "dev"
              + "unit" = "rnd"
            }
            # (14 unchanged attributes hidden)

            # (2 unchanged blocks hidden)
        }
    }

rileykarson commented 9 months ago

Ah, yep, seems this was resolved in https://github.com/GoogleCloudPlatform/magic-modules/pull/6842 and released in 4.45.0
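
For readers landing here later, a minimal sketch of the now-working shape, mirroring the plan output above (the pool name and machine type are illustrative; requires provider 4.45.0 or later):

resource "google_container_node_pool" "np" {
  name       = "labeled-pool"
  cluster    = google_container_cluster.primary.id
  node_count = 1

  node_config {
    machine_type = "e2-medium"

    # GCP resource labels applied to the pool's underlying GCE instances.
    resource_labels = {
      team = "dev"
      unit = "rnd"
    }
  }
}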

github-actions[bot] commented 8 months ago

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.