hashicorp / terraform-provider-google

Terraform Provider for Google Cloud Platform
https://registry.terraform.io/providers/hashicorp/google/latest/docs
Mozilla Public License 2.0

google backend_service to container_node_pool and named ports #1480

Closed jpza closed 4 years ago

jpza commented 6 years ago

Hi there,

My issue is that there is no named_port param for google_container_node_pool and when I hook up a backend service to the node_pool (which is brand new) named_port has to be set manually from the UI. Any suggestions on how to do this from terraform?

Would exposing named_port on the google_container_node_pool resource and using that to set the google_compute_instance_group resources that are created be feasible?
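
For illustration only, here is a sketch of what such a param might look like if it existed. This is hypothetical: named_port is not an actual argument of google_container_node_pool, and the port values are made up.

resource "google_container_node_pool" "redacted-nodepool" {
  name       = "node-pool-redacted"
  zone       = "us-central1-f"
  cluster    = "${google_container_cluster.redacted.name}"
  node_count = 2

  # Hypothetical block: the provider would propagate this to the
  # instance groups that GKE creates for this node pool.
  named_port {
    name = "redacted"
    port = 30080
  }
}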

Terraform Version

$ terraform -v
Terraform v0.11.7

Affected Resource(s)

google_container_node_pool
google_compute_backend_service

Terraform Configuration Files

resource "google_compute_backend_service" "redacted-service-backend" {
  name        = "redacted-backend-${var.environment}"
  port_name   = "redacted"
  protocol    = "HTTP"
  timeout_sec = 3

  backend {
    // group - (Required) The name or URI of a Compute Engine instance group
    // because of https://github.com/terraform-providers/terraform-provider-google/pull/1207
    // we can now use node-pools (k8s) with backends
    group = "${google_container_node_pool.redacted.instance_group_urls}"
  }

  health_checks = ["${google_compute_http_health_check.redacted-healthcheck.self_link}"]
}

and

resource "google_container_node_pool" "redacted-nodepool" {
  name       = "node-pool-redacted"
  zone       = "us-central1-f"
  cluster    = "${google_container_cluster.redacted.name}"
  node_count = 2

  node_config {
    preemptible  = false
    machine_type = "n1-standard-4"
    disk_size_gb = 10

    labels {
      app   = "redacted"
      env   = "${var.environment}"
    }
  }
}

Expected Behavior

I would expect google_container_node_pool to set named_port, given that we can attach node pools (which have instance_group URLs) to a backend service from both the UI and now Terraform, but I checked the API (nodePools) and its definition of node pools does not take a named port.

Actual Behavior

No ability to set named_port on google_container_node_pool

References

PR allowing backends to refer to node_pools:

Issue regarding portName in backend_services:

emilymye commented 6 years ago

In general, if named_port isn't enabled in the API on the GKE resource, I'm not sure it makes sense to have it saved as a separate attribute on node pools in Terraform, or that it's the right way. I'm not the most up-to-date on GKE, so I'll need some clarification - where are you setting the named port in the UI (i.e. on what resource)?

Do you have an ideal example config with barebones resources? Mostly, I want to see where you are actually defining your ports and where you want to reference the values.

jpza commented 6 years ago

@emilymye Thanks for responding,

named_port is generally set on google_compute_instance_group resources and utilized by the google_compute_backend_service via the port_name variable. It is now the case that you can associate a google_container_node_pool with a google_compute_backend_service resource, but there is no way to set up the port_name mapping. You can successfully associate a k8s-managed set of instance groups and a backend service via the GCP UI. Another common approach is utilizing a k8s-created ingress controller, which builds these GLB/k8s IG associations for you.
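
For context, a minimal sketch of how that mapping normally looks when the instance group itself is managed in Terraform (resource names, ports, and the health check here are illustrative):

resource "google_compute_instance_group" "example" {
  name = "example-group"
  zone = "us-central1-f"

  # This is the piece that cannot currently be set on GKE-created groups.
  named_port {
    name = "http"
    port = 30080
  }
}

resource "google_compute_http_health_check" "example" {
  name         = "example-healthcheck"
  request_path = "/healthz"
}

resource "google_compute_backend_service" "example" {
  name      = "example-backend"
  port_name = "http" # must match a named_port on every backend group
  protocol  = "HTTP"

  backend {
    group = "${google_compute_instance_group.example.self_link}"
  }

  health_checks = ["${google_compute_http_health_check.example.self_link}"]
}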

emilymye commented 6 years ago

Ah, I see. I'm not sure there's a good way to essentially import the created instance group to use as a resource currently. We could iterate through the node_pool.instance_group_urls and set a named port on each group, but we'd probably have to discuss how we could do that in the Terraform workflow.

jpza commented 6 years ago

@emilymye I think this is a common use-case for people using GCP. Since the Terraform policy is not to implement things in beta (which for Google means critical components), we have no ability to create Ingress resources (kind: Ingress) via Terraform other than by using forked versions of Terraform, which can be problematic.

Since everything I'm discussing here is considered stable (v1) and allows people to finally start automating k8s clusters and exposing them via https/http with terraform I think this feature should be considered high priority.

danawillow commented 6 years ago

I don't have anything new to add about named ports at the moment, but I wanted to pop in here and say that we don't have any policy not to implement things that are in beta; we have plenty of beta features in the GCP terraform provider! If there are any missing, feel free to file a feature request :)

jpza commented 6 years ago

@danawillow My mistake - I was referring to https://github.com/terraform-providers/terraform-provider-kubernetes/issues/14 which reflects the policy of the Kubernetes provider not GCP.

jpza commented 6 years ago

I'm now attempting to link a backend service to a node pool and am receiving this error:

Error: Error applying plan:

1 error(s) occurred:

* module.dev-google-infra.google_compute_backend_service.preview-service-back: 1 error(s) occurred:

* google_compute_backend_service.preview-service-back: Error creating backend service: googleapi: Error 400: Invalid value for field 'resource.backends[0].group': 'https://www.googleapis.com/compute/v1/projects/faketv-dev/zones/us-central1-f/instanceGroupManagers/gke-projectxyz-clust-node-pool-golddi-956ed4db-grp'. Unexpected resource collection 'instanceGroupManagers'., invalid

Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.

FAIL: 1

justinpotts@cathas: ~/go/src/github.com/faketv/projectxyz/terraform new_fastly_endpoints ⚑
$ terraform state show google_container_node_pool.projectxyz-nodepool                                      [12:48:27]
id                                       = us-central1-f/projectxyz-cluster/node-pool-projectxyz
cluster                                  = projectxyz-cluster
initial_node_count                       = 5
instance_group_urls.#                    = 1
instance_group_urls.0                    = https://www.googleapis.com/compute/v1/projects/faketv-dev/zones/us-central1-f/instanceGroupManagers/gke-projectxyz-clust-node-pool-golddi-956ed4db-grp
management.#                             = 1
management.0.auto_repair                 = false
management.0.auto_upgrade                = false
name                                     = node-pool-projectxyz
name_prefix                              =
node_config.#                            = 1
node_config.0.disk_size_gb               = 10
node_config.0.guest_accelerator.#        = 0
node_config.0.image_type                 = COS
node_config.0.labels.%                   = 3
node_config.0.labels.app                 = projectxyz
node_config.0.labels.env                 = dev
node_config.0.labels.group               = ri
node_config.0.local_ssd_count            = 0
node_config.0.machine_type               = n1-standard-4
node_config.0.metadata.%                 = 0
node_config.0.min_cpu_platform           =
node_config.0.oauth_scopes.#             = 7
node_config.0.oauth_scopes.1277378754    = https://www.googleapis.com/auth/monitoring
node_config.0.oauth_scopes.1632638332    = https://www.googleapis.com/auth/devstorage.read_only
node_config.0.oauth_scopes.172152165     = https://www.googleapis.com/auth/logging.write
node_config.0.oauth_scopes.2401844655    = https://www.googleapis.com/auth/bigquery
node_config.0.oauth_scopes.299921284     = https://www.googleapis.com/auth/bigtable.data
node_config.0.oauth_scopes.299962681     = https://www.googleapis.com/auth/compute
node_config.0.oauth_scopes.3067898161    = https://www.googleapis.com/auth/bigtable.admin
node_config.0.preemptible                = false
node_config.0.service_account            = default
node_config.0.tags.#                     = 0
node_config.0.taint.#                    = 0
node_config.0.workload_metadata_config.# = 0
node_count                               = 5
project                                  = faketv-dev
version                                  = 1.8.10-gke.0
zone                                     = us-central1-f

the resource config is:

resource "google_compute_backend_service" "preview-service-back" {
  name        = "projectxyz-backend-${var.environment}"
  port_name   = "projectxyz-preview"
  protocol    = "HTTP"
  timeout_sec = 10
  project     = "${var.gcp_proj}"

  backend {
    group = "${google_container_node_pool.projectxyz-nodepool.instance_group_urls.0}"
  }

  health_checks = ["${google_compute_http_health_check.preview-service-healthcheck.self_link}"]
}

The node pool projectxyz-nodepool is created perfectly and should be able to be integrated with the backend service (since this can be done via the UI).

jpza commented 6 years ago

The above was fixed by applying the replacement mentioned in https://github.com/hashicorp/terraform/issues/4336, but it should be noted here as well.
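
For readers hitting the same 400 error, the replacement referenced above is typically done with Terraform's replace() interpolation so the backend points at the instance group rather than its manager. A sketch reusing the config above (the single-zone, index-0 assumption is mine):

resource "google_compute_backend_service" "preview-service-back" {
  name        = "projectxyz-backend-${var.environment}"
  port_name   = "projectxyz-preview"
  protocol    = "HTTP"
  timeout_sec = 10
  project     = "${var.gcp_proj}"

  backend {
    # Rewrite .../instanceGroupManagers/... to .../instanceGroups/...
    group = "${replace(google_container_node_pool.projectxyz-nodepool.instance_group_urls.0, "instanceGroupManagers", "instanceGroups")}"
  }

  health_checks = ["${google_compute_http_health_check.preview-service-healthcheck.self_link}"]
}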

danawillow commented 6 years ago

@jpza mind filing that as a separate issue since it seems to be a different problem?

jpza commented 6 years ago

Bumping this. I can now successfully create a backend service linked to a google_container_node_pool; however, I still must set named-port mappings manually in the GCP UI. Until the named-port mapping is set, the GLB will emit 502s because it cannot connect to the backend.

danawillow commented 6 years ago

I'd rather not add fields to the node_pool resource that aren't in the API, but I could see this potentially being implemented as a separate resource, which would be easier to add warnings about in the documentation. I haven't super thought this through yet, but that might be the best approach. Unfortunately, the terraform-provider-google maintainers have our hands very very full right now so none of us are going to be able to work on it in the near future, but if anyone external wants to pick this up, go for it :)

jsiebens commented 6 years ago

When searching for a solution for this issue, I also found this: https://github.com/bpineau/kube-named-ports. It's another approach and I haven't tried it yet, but I think it can be useful.

dinvlad commented 6 years ago

FWIW, there's an API call to set the port programmatically: https://cloud.google.com/compute/docs/reference/rest/v1/instanceGroups/setNamedPorts . Not sure about Terraform, but I've used it from Deployment Manager successfully.

zheli commented 6 years ago

@dinvlad nice! There is a gcloud command for it also 😛
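
For reference, the command in question looks like this (the group name, ports, and zone are placeholders):

gcloud compute instance-groups set-named-ports gke-example-group \
    --named-ports http:30080,https:30443 \
    --zone us-central1-f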

emilymye commented 6 years ago

@dinvlad Yup! The issue here is less about being able to actually set named ports on instance groups and more about finding the instance groups created for the GKE node pool by GCP (i.e. the node pool is managed by Terraform but the group is not) and calling that method on those instance groups with the correct named port.

vranystepan commented 5 years ago

It seems they're using some shell-based module in the GCP examples. Well, it works, but it's definitely a less-than-desirable state 😀 https://github.com/GoogleCloudPlatform/terraform-google-lb-http/blob/master/examples/https-gke/gke-node-port/main.tf#L64

vranystepan commented 5 years ago

Also, it is possible to create a similar workaround with

data "google_client_config" "default" {}

and the access_token attribute, which is a Bearer token. However, it's still a bit messy.
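
A rough sketch of that approach, reusing the data source above. This is hypothetical wiring: the project, zone, group name, and port are placeholders, and the real setNamedPorts call may also want the group's current fingerprint for optimistic locking.

resource "null_resource" "set_named_ports" {
  provisioner "local-exec" {
    # Calls the Compute API's instanceGroups.setNamedPorts method with the
    # provider's own access token; all identifiers below are placeholders.
    command = <<EOT
curl -X POST \
  -H "Authorization: Bearer ${data.google_client_config.default.access_token}" \
  -H "Content-Type: application/json" \
  -d '{"namedPorts": [{"name": "http", "port": 30080}]}' \
  "https://www.googleapis.com/compute/v1/projects/my-project/zones/us-central1-f/instanceGroups/gke-example-group/setNamedPorts"
EOT
  }
}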

mkozjak commented 5 years ago

Managed to hit this case after trying to prepare GKE nodes to be able to receive traffic from a GCP Global HTTPS Load Balancer. Need to define named ports (or Port name mapping, as shown in GCP UI) after/at the time the google_container_cluster resource creates those nodes. There's a workaround solution here: https://github.com/danisla/terraform-google-named-ports, but I'd really love to see a proper solution for this. #110

rileykarson commented 5 years ago

Is this feature still needed by anyone? This isn't a feature of the API / of GKE node pools, and I think it'd be an anti-pattern to manage nodes directly instead of working through node pools or K8S. Terraform would only be able to update nodes on each Terraform run, and out-of-band scaling events in K8S would cause nodes to lose their special config.

I'm not sure that accessing nodes directly should be encouraged, but I'm curious what the use case for that is. Wouldn't you expect to access the IP of a K8S Pod or Service instead of the node?

In addition, not only is https://www.terraform.io/docs/providers/kubernetes/r/ingress.html surfaced now, but you can also use the helm provider to apply K8S resources.

I haven't read it yet, but https://github.com/kubernetes/ingress-gce/issues/33 could be related to some use cases as well.
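
For anyone who wants to go the Kubernetes-provider route mentioned above, a minimal sketch of the ingress resource (the service name and port are placeholders):

resource "kubernetes_ingress" "example" {
  metadata {
    name = "example-ingress"
  }

  spec {
    # Default backend: send all traffic to a placeholder Service.
    backend {
      service_name = "example-service"
      service_port = 80
    }
  }
}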

jianglai commented 5 years ago

The feature is necessary if one relies on L3/L4 TCP load balancing to expose services to the Internet. (I'm not sure if L7 HTTP load balancing + ingress has the same limitation.) The way our service is set up, we have deployments controlling pod replicas and expose the deployments with NodePort services. We then configure the GCP TCP proxy load balancer to forward inbound traffic from the Internet to the named ports on the GKE cluster. See here for details.

The way I see it, this is not a problem for k8s itself. k8s can only manage traffic that has hit its network, and what our team is trying to do is to have the GCLB forward outside traffic to the k8s cluster, and the way GCLB works requires a named port to be specified on the backend services.

So it would be very helpful for us to be able to load the clusters as Terraform resources and add the named ports, instead of using a bash script to do that after Terraform runs.

cristiklein commented 5 years ago

It's been a few months since I last worked with Terraform and GKE, but as far as I remember L7 HTTP load-balancing + ingress still has some limitations that require setting up a named_port. For example, you might want to configure a load-balancer that routes /api/v1 to one GKE cluster and /api/v2beta to another GKE cluster. IIRC, setting up an ingress essentially monopolizes the load-balancer to be used exclusively by a single cluster.

It would be great if named_port were a first-class citizen of Terraform, so as to avoid using the bash script that everyone is mentioning.
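
To make the multi-cluster routing example concrete, a sketch of the url_map side, assuming one backend service per cluster. The backend service names and host are placeholders, and each backend service's port_name would still need a matching named port on that cluster's instance groups.

resource "google_compute_url_map" "api" {
  name            = "api-url-map"
  # cluster-v1 / cluster-v2beta backend services are placeholders, one per cluster.
  default_service = "${google_compute_backend_service.cluster-v1.self_link}"

  host_rule {
    hosts        = ["api.example.com"]
    path_matcher = "api"
  }

  path_matcher {
    name            = "api"
    default_service = "${google_compute_backend_service.cluster-v1.self_link}"

    path_rule {
      paths   = ["/api/v1", "/api/v1/*"]
      service = "${google_compute_backend_service.cluster-v1.self_link}"
    }

    path_rule {
      paths   = ["/api/v2beta", "/api/v2beta/*"]
      service = "${google_compute_backend_service.cluster-v2beta.self_link}"
    }
  }
}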

jaceq commented 4 years ago

Currently the way forward would be to use standalone NEGs as a direct replacement. My solution, on the other hand, is this: after cluster creation I pull the instances out of it, create 'LB' instance groups and add the existing hosts to them, add as many named ports as I need, and use those as the LB backend.

This is done directly in Terraform, with no extra scripting. The downside is that a host can fall out of the instance group after an auto-upgrade or similar event (then a terraform apply fixes it, but still).
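
A sketch of that setup in Terraform 0.12 syntax; the node self_links variable, names, and ports below are placeholders I've introduced for illustration.

variable "gke_node_self_links" {
  type        = list(string)
  description = "Self links of the GKE node instances, pulled from the cluster after creation"
}

# Unmanaged instance group that wraps the GKE nodes and carries the named ports.
resource "google_compute_instance_group" "lb_group" {
  name      = "lb-group"
  zone      = "us-central1-f"
  instances = var.gke_node_self_links

  named_port {
    name = "http"
    port = 30080
  }

  named_port {
    name = "https"
    port = 30443
  }
}

resource "google_compute_health_check" "lb_hc" {
  name = "lb-hc"

  http_health_check {
    port = 30080
  }
}

resource "google_compute_backend_service" "lb_backend" {
  name          = "lb-backend"
  port_name     = "http"
  protocol      = "HTTP"
  health_checks = [google_compute_health_check.lb_hc.self_link]

  backend {
    group = google_compute_instance_group.lb_group.self_link
  }
}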

jianglai commented 4 years ago

@jaceq Can you explain what NEGs are? Too many acronyms to keep track of...

jaceq commented 4 years ago

@jianglai NEGs are Network Endpoint Groups; you can use them directly as a Load Balancer backend. It is generally described here: https://cloud.google.com/blog/products/containers-kubernetes/container-native-load-balancing-on-gke-now-generally-available

and here: https://cloud.google.com/load-balancing/docs/negs/

jianglai commented 4 years ago

That’s interesting news! Thanks for sharing. And Terraform can manage NEGs directly I assume?

rileykarson commented 4 years ago

I'm triaging this as a new resource, adding a resource like google_compute_instance_group_named_port (and google_compute_region_instance_group_named_port I'd imagine) that can expose a named port on the underlying instance groups managed by GKE's node pools. It'll fall under our team's regular triage process, so 👍 reactions to the parent post will help get it picked up sooner.

jacobstr commented 4 years ago

How does this interact with google_container_node_pool? The underlying instance groups created by google_container_node_pool seem intentionally opaque, though tricks like the string substitution mentioned in https://github.com/hashicorp/terraform/issues/4336 seem to offer some form of workaround.

I'd love to see an example of the proposed resources + a google_container_node_pool resource.


Looking at the documented API for the NodePool's NodeConfig: v1 as well as v1beta - there is no way to pass named ports to container node pools.

You can set it on the InstanceGroupManager, but that's part of the Compute Engine API.

hawksight commented 4 years ago

I think I've had this issue before as well. I build our load balancer with terraform as it allows me to use the same load balancer for all ingress, and even potentially across clusters if needed.

I have to remember, for each new cluster, to go and set the named ports and port number values on each instance group to match the k8s NodePort service I create inside the cluster. Otherwise the google_compute_backend_services just show up as not ready until I remember to go run a script.

I raised an issue with Google previously here, as I assumed that if you create a LoadBalancer service in GKE it surely goes and sets the named_ports on the underlying instance groups for you (untested). I thought the ports should also propagate when a NodePort service was defined.

I also have not tried NEGs, as they weren't around when I was looking at the issue.

Would love to see something in google_container_node_pool that would allow named_ports to be configured across all the google_compute_instance_groups that it creates under the hood.

rileykarson commented 4 years ago

@jacobstr: I see this as being similar to google_compute_network_peering_routes_config

# This resource would manage the _single_ port, in case GKE adds others out-of-band
resource "google_compute_instance_group_manager_named_port" "igm_named_ports" {
  instance_group_manager = google_container_node_pool.primary_preemptible_nodes.instance_group_urls[0]

  named_port {
    name = "gameServer"
    port = 25565
  }
}

resource "google_container_cluster" "primary" {
  name     = "my-gke-cluster"
  location = "us-central1"

  # We can't create a cluster with no node pool defined, but we want to only use
  # separately managed node pools. So we create the smallest possible default
  # node pool and immediately delete it.
  remove_default_node_pool = true
  initial_node_count       = 1
}

resource "google_container_node_pool" "primary_preemptible_nodes" {
  name       = "my-node-pool"
  location   = "us-central1"
  cluster    = google_container_cluster.primary.name
  node_count = 1

  node_config {
    preemptible  = true
    machine_type = "n1-standard-1"

    oauth_scopes = [
      "https://www.googleapis.com/auth/logging.write",
      "https://www.googleapis.com/auth/monitoring",
    ]
  }
}

With the caveats that it's a Terraform-applied value on a GKE-managed resource, and that it's entirely possible (but implausible) that GKE will recreate the manager, or modify the value.

@hawksight: It's been a while since I had complete context on what GKE does, but I believe GKE will create them for GKE managed services, particularly Ingress. Connecting GKE services to existing LBs remains a pain point to my knowledge. NEGs (Container-native load balancing) were added as an annotation & help some mixed GKE / external LB use cases, but since the NEG names are randomly generated, they're hard to use with Terraform in practice.

Alfablos commented 4 years ago

Hi guys, I've got the same problem. This is so frustrating; we're not able to fully automate the deployment of a multi-source (meaning different backend types) load balancer. If we had the chance to set the named_port property on the node pool, OR to edit the instance group accordingly, we could automate an entire deployment process using the google provider along with the kubernetes one.

Any effort in this respect would be soooooo much appreciated. Thank you Antonio

Alfablos commented 4 years ago

So, here's a dirty little trick I came up with as a workaround. I'm posting it here in case anyone needs to automate this process and has hit this problem.

Say you have a variables.tf entry like this:

variable "loadbalancers" {
  default = {
    lb1 = {
      backends = {
        test1 = {
          description                     = "service1"
          protocol                        = "HTTP"
          port                            = 30001
          port_name                       = "port1"
            ...
          url_map                         = "config"
          health_check = {
            check_interval_sec  = 70
                ...
            request_path        = "/health"
          }
          balancing_mode = "UTILIZATION"
          capacity_scaler = 100
          max_utilization = 80
        }
        test2 = {
          description                     = "service2"
          protocol                        = "HTTP"
          port                            = 32120
          port_name                       = "port2"
            ...
          url_map                         = "config"
          health_check = {
            check_interval_sec  = 10
            request_path        = "/healthz"
          }
          balancing_mode = "UTILIZATION"
          capacity_scaler = 100
          max_utilization = 80
        }
      }
    }
  }
}

Say you have a node pool called "main_nodepool". We export the node pool instance group data:

data "google_compute_instance_group" "main_nodepool" {
  self_link = google_container_node_pool.main_nodepool.instance_group_urls[0]
}

Now we can access the instance group name using:

data.google_compute_instance_group.main_nodepool.name

We're going to use a local-exec provisioner to invoke gcloud to set the named ports. But first we need to list them! The gcloud command uses comma-separated values like this:

--named-ports port1:30001,port2:32120,port3:30557

So we need to wrap that argument in a variable:

locals {
  ig_named_ports = join(",", [ for backend in var.loadbalancers.lb1.backends : "${backend.port_name}:${backend.port}" ])
}

This iterates over the backends and populates a list with strings like "port1:30001", then joins all the elements of the list into a single string.

Now, let's configure the first backend:

resource "google_compute_backend_service" "backend1" {
  name                  = var.loadbalancers.lb1.backends.test1.description
  health_checks         = [google_compute_health_check.healthcheck1.self_link]
  port_name             = var.loadbalancers.lb1.backends.test1.port_name
  ...
  backend {
    balancing_mode = var.loadbalancers.lb1.backends.test1.balancing_mode
    group          = data.google_compute_instance_group.main_nodepool.self_link
  }
  provisioner "local-exec" {
    command = "gcloud compute instance-groups set-named-ports ${data.google_compute_instance_group.main_nodepool.name} --named-ports ${local.ig_named_ports} --zone ${var.zone} --project ${var.project}"
  }
}

During the deploy phase the provisioner runs the gcloud command to set the named ports after the first backend is created. Note that you cannot use the gcloud command to set an instance group's named ports more than once with different --named-ports arguments: each successive command is not additive, it replaces the previously specified ports. You need to group all the ports you need first (using the join function, as shown above)!

Warning: I chose to manage single backends, so I included the provisioner in the first one. Another way of managing things would be using for_each to create the backends and a "null_resource" to run the provisioner.

After that we can go on with the other backends (if you don't choose the for_each way), like:

resource "google_compute_backend_service" "backend2" {
  name                  = var.loadbalancers.lb1.backends.test2.description
  health_checks         = [google_compute_health_check.healthcheck2.self_link]
  port_name             = var.loadbalancers.lb1.backends.test2.port_name
  ...
  backend {
    balancing_mode = var.loadbalancers.lb1.backends.test2.balancing_mode
    group          = data.google_compute_instance_group.main_nodepool.self_link
  }
}

It works fine but it's dirty, although suitable for CD. Please, give us something more "native". Thanks Antonio

Alfablos commented 4 years ago

I've come up with another solution, which is in fact more general. What happens if you need to deploy a REGIONAL cluster? You'll end up dealing with two problems:

  1. google_container_node_pool.main_nodepool.instance_group_urls will be a list with more than one string which, again, contains the instanceGroupManagers URLs, not the instanceGroup URLs. It'll contain as many strings as there are selected zones within the region. Using [0] in the data block will let you manage only a single instance group.
  2. Say you have your zone set to europe-west1-b but you're managing a multi-zone cluster within the region. You actually want to write something like:
    data "google_compute_instance_group" "google_compute_instance_group_nodepool" {
    for_each = toset(google_container_node_pool.nodepool.instance_group_urls)
    self_link = each.value.self_link
    }

    You'll find out that you'll get just one self_link. And if you try to debug with:

    output "debug" {
    value = data.google_compute_instance_group.google_compute_instance_group_nodepool
    }

    You'll get a map containing one key with a non-empty object (relating to the instance group in europe-west1-b) and one or more keys containing empty objects, depending on which other zones you chose. I'm not sure why this happens; I have "location = var.region" set on both the cluster and the node pool.

One thing you could be tempted to do is to manipulate the strings containing the instanceGroupManagers with replace(google_container_node_pool.main_nodepool.instance_group_urls, "instanceGroupManagers", "instanceGroup") or trim(google_container_node_pool.main_nodepool.instance_group_urls, "Manager") in the self_link within the data block. Don't do it: you'll end up with a 400 error code. Terraform (Google) will tell you that you've called the instanceGroupManager like an instanceGroup and so it is "not found". The worst thing about it is that it's going to be 400 until you delete the cluster; you'll be stuck. I actually don't understand why, since the self_link in the result of the data block using my automated solution is the same as the one you'd get by applying this trick. I can't investigate further, and I won't, since I think my method is quicker.

So, enough warnings and problems; let's head to the solution. We won't mess with URLs; we'll provide a name and a zone for each instance group (which has the same name as its manager). Our data block will be:

data "google_compute_instance_group" "google_compute_instance_group_nodepool" {
  for_each = toset(google_container_node_pool.nodepool.instance_group_urls)
  name = regex("gke.+", "${ each.value }")
  zone = regex("[a-z]+-[a-z]{4}[0-9]-[a-z]", "${ each.value }" )
}

It's a little dirty, but it basically uses for_each to create as many data blocks as there are instance groups created by the node pool. You don't need to know in advance how many node_locations your region has, and you don't need to specify them unless you want to restrict the locations compared to the full region (like using only two locations in a region which has 3).

Providing a name and a zone won't make the data block point to a potentially "wrong" URL.

Now, the backend part:

resource "google_compute_backend_service" "backend1" {
  name                  = var.loadbalancers.lb1.backends.test1.description
  health_checks         = [google_compute_health_check.healthcheck1.self_link]
  load_balancing_scheme = "EXTERNAL"
  enable_cdn            = false
  port_name             = var.loadbalancers.lb1.backends.test1.port_name

  dynamic "backend" {
    for_each = data.google_compute_instance_group.google_compute_instance_group_nodepool
    content {
      balancing_mode = var.loadbalancers.lb1.backends.test1.balancing_mode
      group = backend.value.self_link
    }
  }
}

This will create a single backend service which includes as many backends as we have instance groups (of course, for a single NodePort service on GKE). It uses dynamic blocks, so be sure to use "backend" instead of "each" within the for_each.

I had set up a gcloud command which updated an instance group with the appropriate named_ports, but now we have multiple instance groups, so first let's concatenate the commands using " && " (the shell will appreciate that). I chose to set it as a variable, but there's no need to do that.

locals {
  ig_named_ports = join(",", [for backend in var.loadbalancers.test.backends : "${backend.port_name}:${backend.port}"])
  commands = tostring(join(" && ", [ for ig in data.google_compute_instance_group.google_compute_instance_group_nodepool : "gcloud compute instance-groups set-named-ports ${ig.name} --named-ports ${local.ig_named_ports} --zone ${ig.zone} --project ${var.project}"]))
}

resource "null_resource" "named_ports" {
  provisioner "local-exec" {
    command = local.commands #gcloud compute instance-groups set-named-ports ${data.google_compute_instance_group.main_nodepool.name} --named-ports ${local.ig_named_ports} --zone ${var.zone} --project ${var.project}"
  }
}

or:

resource "null_resource" "named_ports" {
  provisioner "local-exec" {
    #tostring is optional
    command = tostring(join(" && ", [ for ig in data.google_compute_instance_group.google_compute_instance_group_nodepool : "gcloud compute instance-groups set-named-ports ${ig.name} --named-ports ${local.ig_named_ports} --zone ${ig.zone} --project ${var.project}"]))
  }
}

I chose to take the data (zone and name) from the data block, given what happened when I tried to bypass it. This is redundant, but it's a precaution.

Guys, I hope my experience will help you until the issue makes its way to Hashicorp developers :)

Alfablos commented 4 years ago

Hi guys, it's me again. I've found out an interesting fact. In the google_container_cluster Terraform documentation we can read that instance_group_urls are exported, BUT when I read it a big "what if" popped up in my mind.

Let's say we want to deploy a big application with maybe two kinds of backends: one is a light and not very CPU-intensive service, and the other is a heavy-duty monster. It would make sense to create 2 node pools: one will host the first backend and the other will host the second. Let's also say that the first backend is set to receive calls on port XXXXX (which we'll later call "light" as a named port), while the other will listen on ports YYYYY and ZZZZZ (which we'll call "heavy"). We're still talking about a regional cluster with replicas in 2 zones (say we don't need the third zone of the region because it's too expensive).

Now, let's investigate the output of the cluster as far as instance groups are concerned:

output "google_compute_instance_group_cluster" {
  value = google_container_cluster.cluster.instance_group_urls
}

which returns:

google_compute_instance_group_cluster = [
  "https://www.googleapis.com/compute/beta/projects/project/zones/zone1/**instanceGroups**/np1_ig1_name",
  "https://www.googleapis.com/compute/beta/projects/project/zones/zone2/**instanceGroups**/np1_ig2_name",
  "https://www.googleapis.com/compute/beta/projects/project/zones/zone1/**instanceGroups**/np2_ig1_name",
  "https://www.googleapis.com/compute/beta/projects/project/zones/zone2/**instanceGroups**/np2_ig2_name",
]
# ig = instance group
# np = node pool

As you can see ALL URLs of the instance groups related to the CLUSTER are exported.

Let's investigate the node pools:

output "google_compute_instance_group_nodepools_data" {
  value = data.google_compute_instance_group.google_compute_instance_group_nodepool
}
output "google_compute_instance_group_nodepool2_data" {
  value = data.google_compute_instance_group.google_compute_instance_group_nodepool2
}

We'll see that we'll get:

google_compute_instance_group_nodepools_data = {
  "https://www.googleapis.com/compute/v1/projects/project/zones/zone1/**instanceGroupManagers**/np1_igM1_name" = {
    ...
    "self_link" = "https://www.googleapis.com/compute/v1/projects/project/zones/zone1/**instanceGroups**/np1_ig1_name"
    ...
  }
  "https://www.googleapis.com/compute/v1/projects/project/zones/zone2/**instanceGroupManagers**/np1_igM2_name" = {
    ...
    "self_link" = "https://www.googleapis.com/compute/v1/projects/project/zones/zone2/**instanceGroups**/np1_ig2_name"
    ...
  }
}

google_compute_instance_group_nodepool2_data = {
  "https://www.googleapis.com/compute/v1/projects/project/zones/zone1/**instanceGroupManagers**/np2_igM1_name" = {
    ...
    "self_link" = "https://www.googleapis.com/compute/v1/projects/project/zones/zone1/**instanceGroups**/np2_ig1_name"
    ...
  }
  "https://www.googleapis.com/compute/v1/projects/project/zones/zone2/**instanceGroupManagers**/np2_igM2_name" = {
    ...
    "self_link" = "https://www.googleapis.com/compute/v1/projects/project/zones/zone2/**instanceGroups**/np2_ig2_name"
    ...
  }
}

So, instead of using the ugly:

data "google_compute_instance_group" "google_compute_instance_group_nodepool" {
  for_each = toset(google_container_node_pool.nodepool.instance_group_urls)
  name = regex("gke.+", "${ each.value }")
  zone = regex("[a-z]+-[a-z]{4,5}[0-9]-[a-z]", "${ each.value }" )
}

we could just:

data "google_compute_instance_group" "google_compute_instance_group_nodepool" {
  for_each = toset(google_container_node_pool.nodepool.instance_group_urls)
  self_link = each.value
}

So, what I wrote in my previous post is inaccurate. BUT here's the deal:

Sorry for not realizing that a .self_link could be called on the instanceGroupManagers. I hope this will help someone :) Bye

rileykarson commented 4 years ago

Alright, this resource is expected for our March 30th release. Config looks like this (for_each / count can be used to provide the group param):

resource "google_compute_instance_group_named_port" "my_port" {
  group = google_container_cluster.my_cluster.instance_group_urls[0]
  zone = "us-central1-a"
  name = "http"
  port = 8080
}

resource "google_compute_instance_group_named_port" "my_ports" {
  group = google_container_cluster.my_cluster.instance_group_urls[0]
  zone = "us-central1-a"
  name = "https"
  port = 4443
}

resource "google_container_cluster" "my_cluster" {
  name               = "my-cluster"
  location           = "us-central1-a"
  initial_node_count = 1
  network    = google_compute_network.container_network.name
  subnetwork = google_compute_subnetwork.container_subnetwork.name
  ip_allocation_policy {
    cluster_ipv4_cidr_block  = "/19"
    services_ipv4_cidr_block = "/22"
  }
}

jianglai commented 4 years ago

This is great! Thank you for finally making this possible!

quentinleclerc commented 4 years ago

Hello,

Thanks to you guys for the work and solutions.

About @Alfablos' workaround and @rileykarson's solution: in both cases you use for_each (or suggest using for_each), but it's impossible to use for_each correctly with these instance groups. Indeed, we get the instance group URLs from a data block or from the output of the node pool, and it's impossible to do a for_each on "non-known values" with Terraform (meaning we get the list of URLs AFTER the apply, which is incompatible with for_each).

Does anyone have a workaround for this, or maybe I'm missing something?

rileykarson commented 4 years ago

I wasn't actually aware of that restriction; that's frustrating. There isn't anything we can do in the provider about it, unfortunately - the restriction comes from Terraform Core.

You could do a partial apply with -target, creating the GKE cluster first, or split your configs and get the instance group url values off of a cluster datasource (which I believe will count as a known value).

hawksight commented 4 years ago

@quentinleclerc - I have had this issue when trying to pass the instance group URLs from the output of my cluster module into my load balancer module. Terraform didn't know the contents of the output variable, so it could not be planned properly in my load balancer module.

In my case, as described here, I removed the output from my cluster module, which was based on a data lookup anyway, and put the data lookup inside my load balancer module. Terraform seemed to be able to plan and execute that properly, as it must have known the URLs at that point.

Not sure I have a good answer here, but assuming your cluster build is inside a module: could you run it as normal from your main.tf, then look up the cluster instance groups with a data lookup and do the for_each on the google_compute_instance_group_named_port resource? That is, don't include it inside your module, but run it as some appended tasks once the module completes.

quentinleclerc commented 4 years ago

@hawksight I'm not sure I understand your solution. Wherever you have a for_each based on a "dynamic" variable, the plan won't work, whether it's in a module or not. In your case, do you suggest using a data block like this?

data "google_compute_instance_group" "gke_instance_groups" {
  for_each = toset(module.gke.instance_group_urls)
  self_link = each.value
}

If so, it'll still fail (module.gke.instance_group_urls being the output of the node pool, not known before apply). Or do you use a "trick" with regex etc. to get the instance groups?

hawksight commented 4 years ago

I may not have had the exact same issue, as my cluster was actually built, and in my case I just needed to get the up to date instance groups to the load balancer module. Have not tried a full stack rebuild with my changes just yet.

But here is my example code from the top of my load balancer module, which builds after my cluster module.

data "google_container_cluster" "hack" {
  name    = var.cluster
  zone    = var.cluster_zone
  project = var.project
}

locals {
  active_backends = setsubtract(
    toset(data.google_container_cluster.hack.instance_group_urls),
    toset(var.exclude_group)
  )
}

In your case, after your cluster is built, something like the following might work (untested, off the top of my head; it may need edits):

data "google_container_cluster" "groups" {
  name    = var.cluster
  zone    = var.cluster_zone
  project = var.project
}

locals {
  instance_groups = toset(data.google_container_cluster.groups.instance_group_urls)
}

resource "google_compute_instance_group_named_port" "my_ports" {
  for_each = local.instance_groups
  group    = each.value
  zone     = "us-central1-a"
  name     = "https"
  port     = 4443
}

Obviously you may want to look up the zone (if needed) and port values from somewhere else. I think it will work better if this is executed after your cluster build code, maybe as a separate module; that's what worked for the load balancer config that depended on the full list of instance groups.

(I have not used this resource yet, so this is my best guess and hope it helps)

rileykarson commented 4 years ago

you may want to lookup the zone (if needed)

The zone will be inferred off group if possible. Precedence should go zone field > group field > provider, I believe.

ghost commented 4 years ago

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉, please reach out to my human friends 👉 hashibot-feedback@hashicorp.com. Thanks!