hashicorp / terraform-provider-google

Terraform Provider for Google Cloud Platform
https://registry.terraform.io/providers/hashicorp/google/latest/docs
Mozilla Public License 2.0

GKE Node Pool and Compute Backend Service association #9708

Open brianmori opened 3 years ago

brianmori commented 3 years ago

Community Note

Terraform Version

Affected Resource(s)

Terraform Configuration Files


resource "google_container_node_pool" "web-test" {
  name       = "web"
  cluster    = module.gke.cluster_id
  node_count = 0

  node_locations = [
    "europe-west3-c",
  ]

  node_config {
    machine_type    = "e2-micro"
    local_ssd_count = 0
    disk_size_gb    = 40
    disk_type       = "pd-standard"
    image_type      = "COS_CONTAINERD"
    # Google recommends custom service accounts that have cloud-platform scope and permissions granted via IAM Roles.
    service_account = data.google_service_account.gke.email
    oauth_scopes = [
      "https://www.googleapis.com/auth/cloud-platform"
    ]

  }
}

output "web-test-url" {
  value = google_container_node_pool.web-test.instance_group_urls[0]
}

resource "google_compute_instance_group_named_port" "web_eu_we3_ext_http" {
  group = google_container_node_pool.web-test.instance_group_urls[0]
  zone  = "europe-west3-c"
  name  = "http-ext"
  port  = 30080
}

module "lb-ext-http" {
  source  = "GoogleCloudPlatform/lb-http/google"
  version = "6.0.1"

  ...

  backends = {
    default = {
      description = "default goes to api"
      protocol    = "HTTP"

      groups = [
        {
          # Each node pool instance group should be added to the backend.
          group                        = google_container_node_pool.web-test.instance_group_urls[0]
          balancing_mode               = null
          capacity_scaler              = null
          description                  = null
          max_connections              = null
          max_connections_per_instance = null
          max_connections_per_endpoint = null
          max_rate                     = null
          max_rate_per_instance        = null
          max_rate_per_endpoint        = null
          max_utilization              = null

        },

      ]

    },
  }

}

Expected Behavior

I worked around it by using replace() to strip "Manager" from the URL, changing it from

https://www.googleapis.com/compute/v1/projects/XXXXX/zones/europe-west3-c/instanceGroupManagers/gke-XXXXXX-75d39808-grp

to

https://www.googleapis.com/compute/v1/projects/XXXXX/zones/europe-west3-c/instanceGroups/gke-XXXXXX-75d39808-grp


        {
          group                        = replace(google_container_node_pool.web-a-prod.instance_group_urls[0], "Managers", "s")
          balancing_mode               = null
          capacity_scaler              = null
          description                  = null
          max_connections              = null
          max_connections_per_instance = null
          max_connections_per_endpoint = null
          max_rate                     = null
          max_rate_per_instance        = null
          max_rate_per_endpoint        = null
          max_utilization              = null
        },

I think the backend should either accept instanceGroupManagers URLs, or the node pool should have an output with the instanceGroups URLs.
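The single-URL workaround can be generalized to every instance group of the node pool. This is a minimal sketch, not a provider feature: the local name `web_instance_group_urls` and the output name are hypothetical, and it assumes the two URL forms differ only in the `instanceGroupManagers` path segment, as observed in this issue.

```hcl
locals {
  # Hypothetical helper: rewrite each instanceGroupManagers URL returned by
  # the node pool into the instanceGroups form that the backend accepts.
  web_instance_group_urls = [
    for url in google_container_node_pool.web-test.instance_group_urls :
    replace(url, "instanceGroupManagers", "instanceGroups")
  ]
}

# Hypothetical output exposing the rewritten URLs, one per zone.
output "web-test-instance-group-urls" {
  value = local.web_instance_group_urls
}
```

The search string is deliberately a plain substring. Terraform's replace() treats a search string wrapped in forward slashes as a regular expression, so writing "/instanceGroupManagers/" would not behave as a literal path match.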

Actual Behavior

      + backend {
          + balancing_mode  = "UTILIZATION"
          + capacity_scaler = 1
          + group           = "https://www.googleapis.com/compute/v1/projects/XXXXXX/zones/europe-west3-c/instanceGroups/gke-XXXXXX-75d39808-grp"
          + max_utilization = 0.8
        }

│ Error: Error updating BackendService "projects/XXXXXX/global/backendServices/lb-prod-ext-http-backend-default": googleapi: Error 400: Invalid value for field 'resource.backends[0].group': 'https://www.googleapis.com/compute/v1/projects/XXXXX/zones/europe-west3-c/instanceGroupManagers/gke-XXXXXX-75d39808-grp'. Unexpected resource collection 'instanceGroupManagers'., invalid

Steps to Reproduce

  1. terraform apply

Important Factoids

References

amaioli commented 3 years ago

👍

ScottSuarez commented 3 years ago

Could you clarify the configuration, specifically what you are trying to provide another option for? I'm not very familiar with this resource, so it's hard to understand the ask from the configuration and the follow-up.

brianmori commented 3 years ago

> Could you clarify the configuration as to specifically what you are trying to provide another option to. I'm not very familiar with this resource so it's hard to understand the ask from the configuration and the followup.

Hi @ScottSuarez

I am not sure I understood :) Let me explain the objective of this issue.

I wish to deploy GKE and the Google Load Balancer with Terraform. The goal is to expose web applications to the internet or internal network.

One approach is to use an Ingress, which provisions a load balancer.

I prefer to provision the Load Balancer with Terraform. In order to do that, I need to

  1. Create the Load Balancer
  2. Create the GKE cluster
  3. Create the instance group named port to allow the registration of the instance group to the load balancer

This expression

google_container_node_pool.web-test.instance_group_urls[0]

takes the first of the node pool's instance group URLs (a node pool can create several instance groups, one per zone) and assigns it as a backend of the load balancer.

The operation fails because a URL containing "instanceGroupManagers" is not accepted as a backend, although it is accepted for the named port. A URL containing "instanceGroups" works as a backend.
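For reference, the same association can be written without the lb-http module. This is only a sketch under assumptions: the resource names `web_http` and `web` and the `name` values are hypothetical, the health check on port 30080 is assumed to match the NodePort used for the named port, and the replace() call is the workaround from this issue rather than anything the provider documents.

```hcl
# Hypothetical health check against the NodePort exposed on the nodes.
resource "google_compute_health_check" "web_http" {
  name = "web-http-hc"

  http_health_check {
    port = 30080
  }
}

# Backend service that targets the named port "http-ext" on the instance group.
resource "google_compute_backend_service" "web" {
  name          = "web-backend"
  protocol      = "HTTP"
  port_name     = "http-ext"
  health_checks = [google_compute_health_check.web_http.id]

  backend {
    # The backend rejects the instanceGroupManagers URL, so rewrite it.
    group = replace(
      google_container_node_pool.web-test.instance_group_urls[0],
      "instanceGroupManagers",
      "instanceGroups",
    )
  }
}
```

With a multi-zone node pool, each instance group would need its own backend block and named port.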

Let me know if I answered your question

ScottSuarez commented 3 years ago

Cool! I think I get the topic now.

google_container_node_pool exposes node pool URLs, but these are the managed instances. You want to expose the underlying instance beneath that, correct? Or allow the named port to accept either value?

For the first option, we won't support it since [the API](https://cloud.google.com/kubernetes-engine/docs/reference/rest/v1/projects.zones.clusters.nodePools) doesn't expose this. We normally go for parity with the API and need a very good reason to deviate.

For the second option it is something we might consider. I've relabeled this as an enhancement request. We will talk about this in our triage and decide how to prioritize it. Given there is a workaround it probably will not take priority.

As a secondary solution to this you could ask the upstream api owners if they would accept a Managed reference within the set named ports call. This might be a cleaner approach. https://cloud.google.com/compute/docs/reference/rest/v1/instanceGroups/setNamedPorts

brianmori commented 3 years ago

I do not want to expose single instances because these can be created and destroyed.

What I need to do is to "link" the GKE Node Pool's Managed Instance Groups to the Load Balancer Backend.

This will also allow internet traffic to reach the GKE Node Pools through the Load Balancer by exposing the Kubernetes NodePort and the MIG's port.

I think there is a bug because the output of this

output "web-test-url" {
  value = google_container_node_pool.web-test.instance_group_urls[0]
}

does not return an instance group URL like

https://www.googleapis.com/compute/v1/projects/XXXXX/zones/europe-west3-c/instanceGroups/gke-XXXXXX-75d39808-grp

rather it returns

https://www.googleapis.com/compute/v1/projects/XXXXX/zones/europe-west3-c/instanceGroupManagers/gke-XXXXXX-75d39808-grp

The "Managers" part is what is causing the issue: if I remove it with replace, it works well. Should I get in touch with the Google team to clarify whether this is an API issue or a bug in the output of the Terraform Google provider?

ScottSuarez commented 3 years ago

So Terraform is actually just an interface for the API. The API field itself is surfacing the managed instance links.

https://cloud.google.com/kubernetes-engine/docs/reference/rest/v1/projects.zones.clusters.nodePools

ScottSuarez commented 3 years ago

We'll talk about this issue in our triage.

upodroid commented 3 years ago

A similar thing was asked a while back in #2373

Also, you should use a Kubernetes Service resource to create these load balancers instead of manipulating the load balancer components manually. Remember, the GKE nodes are managed instances that are black boxes to end users, and you may have a poor experience with your approach.

You can create a Service of type LoadBalancer in Kubernetes, and Google will provision a load balancer to expose that app properly to the internet, or as an ILB.

Have a look at this article https://cloud.google.com/kubernetes-engine/docs/concepts/service#services_of_type_loadbalancer
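Such a Service can still be declared from Terraform. This is a rough sketch that assumes the hashicorp/kubernetes provider is already configured against the cluster. The name `web`, the `app = "web"` selector and both port numbers are hypothetical placeholders.

```hcl
# Hypothetical Service of type LoadBalancer; GKE provisions the load balancer.
resource "kubernetes_service" "web" {
  metadata {
    name = "web"
  }

  spec {
    type = "LoadBalancer"

    # Assumed pod label; it must match the labels on the application's pods.
    selector = {
      app = "web"
    }

    port {
      port        = 80   # port exposed by the load balancer
      target_port = 8080 # assumed container port
    }
  }
}
```

With this approach the load balancer itself is created by GKE, so its components do not appear as individual Terraform resources.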

brianmori commented 3 years ago

Apologies for the late reply and thank you for your proposal.

Let me explain why we prefer to do this with Terraform instead of GKE.

We provision our cloud infrastructure with Terraform Enterprise. Terraform Enterprise has a component called "Sentinel" which checks what services are provisioned and how.

Examples:

  1. Enforce Cloud Armor is active for all External LB
  2. Enforce SSL Policy to allow only "Modern" and "Restricted"

While it is possible to implement reactive controls after the deployment has been done to detect deviations, proactive controls at provisioning time are more efficient to ensure rules are respected. Also I am afraid to let the developers have too much access to GKE underlying services and delete the load balancer by mistake.

upodroid commented 3 years ago

You can achieve that by using OPA Gatekeeper to reject Ingresses/Services that don't have the right set of annotations.

https://kubernetes.io/blog/2019/08/06/opa-gatekeeper-policy-and-governance-for-kubernetes/

There is an example from Google that uses Gatekeeper to enforce PSPs here. You can write a constraint that rejects Ingresses/Services missing these annotations.

https://cloud.google.com/kubernetes-engine/docs/how-to/ingress-features#associating_frontendconfig_with_your_ingress

> Also I am afraid to let the developers have too much access to GKE underlying services and delete the load balancer by mistake.

This risk exists and there is no meaningful way to avoid it. You are expected to use Kubernetes resources to deploy infrastructure.

brianmori commented 3 years ago

I was not aware of OPA Gatekeeper, thank you for sharing!

Some teams prefer to deploy and maintain Kubernetes themselves, so to my knowledge it is not possible to put this "guardrail" in place. This is the reason I preferred an approach equivalent to the AWS one, where Terraform deploys the Load Balancer, Target Groups and Launch Template/Auto Scaling Groups.

> This risk exists and there is no meaningful way to avoid it. You are expected to use Kubernetes resources to deploy infrastructure.

I understand your point, and I agree to the extent that the impact is limited to that specific GKE cluster, for example when an engineer makes an operational mistake and the load balancer is lost. If there is a security issue because the WAF was not applied and it impacts other workloads, that is a risk that must be treated, with the control enforced before the workload is even deployed.

I also saw this possibility: https://cloud.google.com/kubernetes-engine/docs/how-to/container-native-load-balancing. I will take a look.