terraform-google-modules / terraform-google-kubernetes-engine

Configures opinionated GKE clusters
https://registry.terraform.io/modules/terraform-google-modules/kubernetes-engine/google
Apache License 2.0

Replace hub module with native resources #860

Open bharathkkb opened 3 years ago

bharathkkb commented 3 years ago

We should replace the provisioners in our hub module with native resources, now that the provider supports GKE Hub natively: https://github.com/GoogleCloudPlatform/magic-modules/pull/4600

ferrarimarco commented 3 years ago

Semi-related issue: the registration scripts depend on bash (probably easy to remove) and gcloud (hard to remove).

These dependencies make it harder to run Terraform in a container (i.e., you cannot use the hashicorp/terraform image as is).

bharathkkb commented 3 years ago

@ferrarimarco Happy to review a PR that replaces the provisioners with native resources.

ferrarimarco commented 3 years ago

Sure. I may be able to help, but I think this needs some design first: it looks wider in scope than my first PR (#886). 😄

Here's an example that uses the native google_gke_hub_membership resource (note that module.gke.cluster_id requires version 14.3.0 or later of the kubernetes-engine module):

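```hcl
# Registers the cluster created by module.gke with GKE Hub using the
# native google_gke_hub_membership resource from the google-beta provider.
resource "google_gke_hub_membership" "cluster-hub-membership" {
  # Membership ID built from the cluster location and name.
  membership_id = "${module.gke.location}-${module.gke.name}"
  project       = google_project.tutorial_project.project_id
  provider      = google-beta

  # OIDC issuer of the cluster; cluster_id has the form
  # projects/<project>/locations/<location>/clusters/<name>.
  authority {
    issuer = "https://container.googleapis.com/v1/${module.gke.cluster_id}"
  }

  # Full resource link of the GKE cluster being registered.
  endpoint {
    gke_cluster {
      resource_link = "//container.googleapis.com/${module.gke.cluster_id}"
    }
  }
}
```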
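The membership example relies on two things from its surroundings: the google-beta provider being declared, and module.gke coming from kubernetes-engine 14.3.0 or later so that cluster_id is exposed. A minimal sketch of that surrounding configuration might look like this; the cluster name, region, network, subnetwork, and secondary range names are hypothetical placeholders, and the list of module inputs is an assumption based on the root module's usual required variables, so check them against the version you use.

```hcl
terraform {
  required_providers {
    # The membership resource sets provider = google-beta, so declare it here.
    google-beta = {
      source = "hashicorp/google-beta"
    }
  }
}

module "gke" {
  source = "terraform-google-modules/kubernetes-engine/google"
  # cluster_id is only exposed from 14.3.0 onward.
  version = ">= 14.3.0"

  # Hypothetical placeholder values (assumed required inputs); replace with your own.
  project_id        = google_project.tutorial_project.project_id
  name              = "my-cluster"
  region            = "us-central1"
  network           = "my-network"
  subnetwork        = "my-subnetwork"
  ip_range_pods     = "my-pods-range"
  ip_range_services = "my-services-range"
}
```

Because only provider plugins are involved, `terraform init` downloads everything this configuration needs, with no bash or gcloud on the machine running Terraform.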

If I'm not mistaken, the native resource relies on Workload Identity rather than on a downloaded service account key. This might introduce a breaking change for users of the existing module. Should we aim for backward compatibility? From a semantic versioning standpoint, users might expect it: the hub module carries a release tag (the same one as the kubernetes-engine module), which implies some stability.
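On the semantic versioning point, users who consume the hub module from the registry can shield themselves from a breaking change by pinning it with a pessimistic version constraint. A minimal sketch, assuming the hub submodule is consumed through the `//modules/hub` path of this repository and that `project_id`, `cluster_name`, `location`, and `cluster_endpoint` are its required inputs (an assumption; check the submodule's variables for your version):

```hcl
module "hub" {
  # The hub submodule lives in the kubernetes-engine repository and shares its release tags.
  source = "terraform-google-modules/kubernetes-engine/google//modules/hub"
  # "~> 14.3" accepts any 14.x release from 14.3 onward but never 15.0,
  # so a breaking change shipped as a new major version is not picked up automatically.
  version = "~> 14.3"

  # Assumed inputs, wired to the module.gke cluster from the example above.
  project_id       = google_project.tutorial_project.project_id
  cluster_name     = module.gke.name
  location         = module.gke.location
  cluster_endpoint = module.gke.endpoint
}
```

With a pin like this, shipping the native-resource rewrite as a new major version would leave pinned users on the provisioner-based 14.x line until they deliberately bump the constraint.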

walkernobrien commented 3 years ago

I'm having trouble getting this native Terraform resource to work. I had been using the Google CFT module for Anthos registration without problems, but I've been trying to switch over to remove the gcloud dependency.

With the native resource, Terraform runs successfully with no errors, but the gke-connect-agent deployment does not exist on the GKE cluster, and neither does the gke-connect namespace. As a result, the Anthos dashboard shows the error "Unable to reach Connect Agent".

Questions:
1. Should this native Terraform resource deploy the Connect Agent the same way the CFT module does?
2. Do I need to create any Workload Identity mappings for this to work?
3. Where, if anywhere, can I see logs or code describing which Kubernetes resources the Terraform resource applies (or fails to apply) to the cluster?

morgante commented 3 years ago

@walkernobrien The native resource works differently: it registers the cluster without the Connect Agent.

If you're getting an error, I suggest opening a support case with Google.

bharathkkb commented 3 years ago

/cc @r4hulgupta, who was also investigating the native resource