goobysnack closed this issue 9 months ago.
@goobysnack could you share a minimal configuration (no modules or variables) that reproduces this failure?
This should be sanitized
resource "google_container_cluster" "cluster" {
name = clustername
project = myprojectid
location = myregion
network = default
subnetwork = default
initial_node_count = 1
min_master_version = 1.26.6-gke.1700
master_auth {
client_certificate_config {
issue_client_certificate = false
}
}
release_channel {
channel = stable
}
enable_legacy_abac = true
remove_default_node_pool = true
addons_config {
gcp_filestore_csi_driver_config {
enabled = true
}
}
ip_allocation_policy {
// Choose the range, but let GCP pick the IPs within the range
cluster_ipv4_cidr_block = "/14"
services_ipv4_cidr_block = "/20"
}
private_cluster_config {
enable_private_endpoint = false
enable_private_nodes = true
master_ipv4_cidr_block = join("",["172.16.",local.octet,".32/28"])
}
workload_identity_config {
workload_pool = "myprojectid.svc.id.goog"
}
gateway_api_config {
channel = "CHANNEL_STANDARD"
}
fleet {
project = myprojectid
}
// GKE clusters are critical objects and should not be destroyed.
lifecycle {
prevent_destroy = true
ignore_changes = [node_locations,min_master_version,resource_labels]
}
}
/* I've tried both commented and uncommented. When uncommented, this is in conflict with the fleet block in the cluster. When commented, I get the errors described below for the membership_id and membership[0].id */
resource "google_gke_hub_membership" "membership" {
project = myprojectid
membership_id = google_container_cluster.cluster.name
endpoint {
gke_cluster {
resource_link = google_container_cluster.cluster.id
}
}
authority {
issuer = "https://container.googleapis.com/v1/${google_container_cluster.cluster.id}"
}
}
resource "google_gke_hub_feature" "multiclusteringress" {
name = "multiclusteringress"
project = myprojectid
location = "global"
spec {
multiclusteringress {
config_membership = join("",["projects/",myprojectid,"/locations/",myregion,"/memberships/",google_container_cluster.cluster.name]) //google_gke_hub_membership.membership[0].id
}
}
}
resource "google_gke_hub_feature_membership" "feature_member" {
project = myprojectid
location = "global"
feature = "configmanagement"
membership = join("",["projects/myprojectid/locations/myregion/memberships/",google_container_cluster.cluster.name]) //google_gke_hub_membership.membership[0].id
configmanagement {
config_sync {
source_format = "unstructured"
git {
sync_repo = "https://source.developers.google.com/p/myprojectid/r/acm/fleet-config"
sync_branch = load
secret_type = "gcpserviceaccount"
gcp_service_account_email = "svc-repo@${myprojectid}.iam.gserviceaccount.com"
sync_rev = "HEAD"
sync_wait_secs = "60"
}
}
policy_controller {
enabled = false
exemptable_namespaces = []
log_denies_enabled = false
mutation_enabled = false
referential_rules_enabled = false
template_library_installed = false
monitoring {
backends = [
"PROMETHEUS",
"CLOUD_MONITORING",
]
}
}
}
}
resource "google_gke_hub_feature_membership" "mesh" {
project = myprojectid
location = "global"
feature = "servicemesh"
membership = google_container_cluster.cluster.name //google_gke_hub_membership.membership[0].membership_id
mesh {
management = "MANAGEMENT_AUTOMATIC"
}
}
@goobysnack is there any way to get more minimal & make sure it's runnable? For example, by removing fields that aren't required to reproduce the error. I've tried to reproduce using this example and ended up needing to tweak it significantly to add quotes in various places. You can replace names with something like "tf-test-1234567" (where the last part is a random string of characters) or similar to make something that will work while still being safe to share.
I was able to create the cluster on its own with no problems, but when I tried to add in the gke hub resources I eventually ended up getting blocked by the error
Error 404: Resource 'projects/analog-ace-309318/locations/global/features/servicemesh'
from the API, so it seems like something is missing.
Exactly my point. I don't know the correct resource paths for membership, for example.
Error: Error creating FeatureMembership: googleapi: Error 400: InvalidValueError for field membership_specs["projects/<project_number>/locations/global/memberships/<cluster_name>"].feature_spec: does not match a current membership in this project. Keys should be in the form: projects/<project_number>/locations/{l}/memberships/{m}
with module.gcp_gkeclusters["cluster-1"].google_gke_hub_feature_membership.feature_member[0],
on .terraform/modules/gcp_gkeclusters/main.tf line 157, in resource "google_gke_hub_feature_membership" "feature_member":
157: resource "google_gke_hub_feature_membership" "feature_member" {
Error: Error creating FeatureMembership: googleapi: Error 400: InvalidValueError for field membership_specs["projects/<project_number>/locations/global/memberships/<cluster_name>"].feature_spec: does not match a current membership in this project. Keys should be in the form: projects/<project_number>/locations/{l}/memberships/{m}
with module.gcp_gkeclusters["cluster-1"].google_gke_hub_feature_membership.mesh[0],
on .terraform/modules/gcp_gkeclusters/main.tf line 193, in resource "google_gke_hub_feature_membership" "mesh":
193: resource "google_gke_hub_feature_membership" "mesh" {
gotcha - I dug into this some more and it looks like there are a couple of issues with your configuration.
First, you need to make sure you have a google_gke_hub_feature block for every membership (currently missing for several of them) - see https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/gke_hub_feature_membership#example-usage---service-mesh for an example of how to do this.
Second, it seems like there's a problem with how google_container_cluster.fleet plays with gke_hub resources - I don't know whether it's on the container_cluster side or the gke_hub side. google_container_cluster.fleet sets up an implicit fleet membership which you can access as fleet.0.membership. For example:
resource "google_container_cluster" "cluster" {
// The rest of the configuration
fleet {
project = "my-project-id"
}
}
resource "google_gke_hub_feature" "feature" {
name = "servicemesh"
location = "global"
}
resource "google_gke_hub_feature_membership" "mesh" {
project = "my-project-id"
location = "global"
feature = google_gke_hub_feature.feature.name
membership = google_container_cluster.cluster.fleet.0.membership
mesh {
management = "MANAGEMENT_AUTOMATIC"
}
}
But when I try to do that I end up getting the following error:
InvalidValueError for field membership_specs[\"projects/1234567890/locations/global/memberships/tf-test-qwoieeruoqiwu\"].feature_spec: does not match a current membership in this project. Keys should be in the form: projects/1234567890/locations/{l}/memberships/{m}
I am able to configure things correctly if I only include the gke_hub resources and don't set fleet, in which case I end up with the following in state for google_container_cluster:
{
  "membership": "//gkehub.googleapis.com/projects/1234567890/locations/global/memberships/tf-test-qwoieeruoqiwu",
  "pre_registered": true,
  "project": "my-project-id"
}
and the following for google_gke_hub_feature_membership.membership_id: tf-test-qwoieeruoqiwu.
but that doesn't match what was being sent to the API.
In conclusion I think that the best thing to do is to not set the fleet field, and create the membership explicitly instead (which looks like it still sets up the fleet field). I'll forward this to the service team to see if there are any documentation improvements we can make here to clarify how this is all supposed to play together.
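For anyone following along, a minimal sketch of that recommendation, reusing the resource shapes already shown in this thread; the project ID is a placeholder and the cluster resource deliberately has no fleet block:
# Explicit registration instead of the cluster's fleet block.
resource "google_gke_hub_membership" "membership" {
  project       = "my-project-id"
  membership_id = google_container_cluster.cluster.name
  endpoint {
    gke_cluster {
      resource_link = google_container_cluster.cluster.id
    }
  }
}

# Feature memberships can then reference the explicit membership's short ID.
resource "google_gke_hub_feature_membership" "mesh" {
  project    = "my-project-id"
  location   = "global"
  feature    = "servicemesh"
  membership = google_gke_hub_membership.membership.membership_id
  mesh {
    management = "MANAGEMENT_AUTOMATIC"
  }
}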
The issue with "do not set the fleet field" is that then the cluster doesn't fully register correctly. I think you should restore the bug label. JMO.
@goobysnack could you elaborate on what "doesn't fully register correctly" means? How can you tell whether it's fully registered correctly?
Tentatively restoring the bug label in the meantime.
Fair question - it's how we discovered this issue. In the GUI, the cluster doesn't show as registered. Under "Fleet" it shows the word (with link) "Register". I started looking at Google docs on how to register a fleet to see what I was missing, and found the registration moved from the hub resource to the cluster resource. When I added it to the cluster resource, this all happened. As a temp work around, I removed the fleet from the cluster resource, AND added a lifecycle rule for ignore changes for fleet...and clicked the "button". The Fleet API completed the registration. So, we have a functional work around, but it's not really automation if you have to clicky click. :D
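For reference, a rough sketch of that temporary workaround as I understand it, assuming the provider accepts the computed fleet block in ignore_changes (the other cluster arguments are elided):
resource "google_container_cluster" "cluster" {
  # ... cluster configuration, with the fleet block removed ...

  lifecycle {
    prevent_destroy = true
    # Ignore the fleet registration completed out-of-band via the console's
    # "Register" button, so Terraform doesn't try to undo it.
    ignore_changes = [node_locations, min_master_version, resource_labels, fleet]
  }
}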
FYI per google support in the ticket I created:
The product team responded stating that it looks like the Membership is stuck in a bad state in the database (not sure why, but it could be due to a TF conflict, for example enabling GKE Enterprise vs the google_gke_hub_membership module).
BTW, I think you hit the actual issue. It's not the fleet registration. That works. It's everything else that needs to reference the membership. The membership.id or membership_id doesn't match what's expected now that this isn't done via the hubAPI.
Another datapoint. It looks like when the cluster is registered in the fleet block in the cluster config, it's being registered in the region, not global. when we click on the register in the UI, or register it via the hub resource, it gets registered globally. And K8s/Fleet doesn't see them as the same:
gcloud container fleet memberships list --project myproject
NAME EXTERNAL_ID LOCATION
use1-cluster-1 us-east1
use1-cluster-1 <numerical ID removed> global
This is what's in the state file for the cluster:
fleet {
  membership     = "//gkehub.googleapis.com/projects/myproject/locations/us-east1/memberships/use1-cluster-1"
  pre_registered = false
  project        = "myproject"
}
I think that's the crux of the issue. Fleet memberships are supposed to be global. The fleet itself is a global construct (even if the primary cluster is regional). So, all the Feature Memberships expect to find the registration in the global path of the API, not the regional one.
@goobysnack Just created and registered a cluster to a fleet using the gcloud documentation. The membership that they create automatically is regional. It's strange, but it is working fine too. I haven't been able to make TF work with ASM; clusters always break and take hours to recover.
Reading the documentation, I think I know what is going on. The fleet block is equivalent to the following flags under the hood:
https://cloud.google.com/sdk/gcloud/reference/container/clusters/create#--fleet-project And https://cloud.google.com/sdk/gcloud/reference/container/clusters/create#--enable-fleet
When --enable-fleet is used, the membership is created automatically. This is called "registration during creation" and it is used to automatically register the cluster to the fleet during creation. It is also used to inherit the default configurations of the fleet in case the cluster is GKE Enterprise. So with this fleet block, you're telling GCP to create a fleet membership automatically for you during the creation of the cluster, and to potentially configure its features to match the Fleet's default if you have GKE Enterprise.
The membership that is created internally has the format "projects/<project_number>/...". You can see that the --fleet-project can be the project_number or the project_id. Internally GCP always creates the membership with the project_number. If you pass the project_id instead, GCP will use the project_id to get the project_number.
But there is a bug in google_container_cluster.fleet.0.membership. It is NOT returning the "projects/<project_number>" pattern. My guess is that they are not really querying the fleet membership associated with the cluster but instead just templating "projects/${var.project}". By "var.project" I'm talking about the project attribute that is passed to the fleet block. So, if you pass "project = project_id" to the fleet block, you get the membership path with the project_id instead of the project_number. I'd try doing something like:
fleet {
  project = local.project_number
}
This should result in a proper membership automatically generated by the --fleet and --fleet-project flags and fleet.0.membership returning the proper path.
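If hard-coding the number is undesirable, a small sketch of looking it up with the google_project data source instead (the project ID is a placeholder):
data "google_project" "project" {
  project_id = "my-project-id"
}

resource "google_container_cluster" "cluster" {
  # ... rest of the cluster configuration ...

  fleet {
    # Pass the project number rather than the project ID so that
    # fleet.0.membership matches the path the Fleet API actually uses.
    project = data.google_project.project.number
  }
}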
I'll give that a try @AndresPinerosZen - just the project number in the cluster resource AND the project number in the feature member configs. I have a feeling the other problem is that the API path for the membership is regional, but the feature itself is global.
I think this is what's expected and not a Terraform issue. If you create a cluster manually or with gcloud, GCP will create a membership that has the region in the path (locations/<region>), not global.
This is resolved. @AndresPinerosZen @melinath
I was able to restructure the terraform to conform to the current methodology. It should be clear that the changes Google made here were disruptive and not at all well socialized. Here's what I had to do:
Moved the Service Mesh and Configuration Management specifics out of their per-cluster hub_feature_member resource blocks and into a global hub_feature resource block for each, taking advantage of the new "fleet_default_member_config" section, so each new cluster should inherit the config.
For multiclusteringress, which does have to be assigned to ONE of the clusters to host the CRDs (aka the primary cluster), I had to use the project number in the member resource path as stated earlier in this issue:
config_membership = join("", ["projects/", data.google_project.project.number, "/locations/", var.region, "/memberships/", google_container_cluster.cluster.name])
I was able to update all the clusters and even build brand new without issue. All clusters are registered and working as of this writing.
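For future readers, a rough sketch of the fleet_default_member_config approach described above, assuming a provider version whose google_gke_hub_feature supports that block (names are placeholders):
resource "google_gke_hub_feature" "servicemesh" {
  name     = "servicemesh"
  project  = "my-project-id"
  location = "global"

  # Clusters registered to the fleet should inherit this default mesh config.
  fleet_default_member_config {
    mesh {
      management = "MANAGEMENT_AUTOMATIC"
    }
  }
}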
The bug in google_container_cluster.fleet.0.membership still persists though. It shouldn't return an output with the "projects/<project_id>" format if the project ID is given to the fleet block. It should always output the "projects/<project_number>" format, whether the fleet block is given the project_id or the project_number.
Okay, so it turns out that google_container_cluster.fleet implicitly creates a regional membership with the full path to the membership, but google_gke_hub_feature_membership expects the short membership ID and the region to be passed separately via membership and membership_location.
resource "google_gke_hub_feature_membership" "mesh" {
project = "my-project-id"
location = "global"
membership_location = "us-central1" <== need to add membership location here, otherwise it points to global by default.
feature = google_gke_hub_feature.feature.name
membership = google_container_cluster.cluster.fleet.0.membership
mesh {
management = "MANAGEMENT_AUTOMATIC"
}
}
It should be possible to extract the values with a regex like:
locals {
  membership_re = "//gkehub.googleapis.com/projects/([^/]*)/locations/([^/]*)/memberships/([^/]*)$"
}

resource "google_gke_hub_feature_membership" "my_feature_membership" {
  // ...
  membership          = regex(local.membership_re, google_container_cluster.my_cluster.fleet.0.membership)[2]
  membership_location = regex(local.membership_re, google_container_cluster.my_cluster.fleet.0.membership)[1]
}
The regional memberships seem to be working, but I'm not sure how this is interacting with the "Clusters in sync with fleet" in the UI. After a terraform apply, my cluster shows as "Cluster not in sync with fleet". If I click in the UI "sync to fleet settings", the UI then shows my cluster to be in sync with fleet. My gke hub feature (configmanagement) appears to still be functional, but now my terraform wants to reapply the google_gke_hub_feature_membership config_sync stanza. I'm not sure how to get "Cluster in sync with fleet" status to display properly in the UI via terraform.
edit: moving the configmanagement configuration block from the google_gke_hub_feature_membership resource to the google_gke_hub_feature solved this issue for me.
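In case it helps others, a rough sketch of what that move can look like, assuming a provider version whose google_gke_hub_feature supports configmanagement under fleet_default_member_config (the repo URL and names are placeholders):
resource "google_gke_hub_feature" "configmanagement" {
  name     = "configmanagement"
  project  = "my-project-id"
  location = "global"

  # Config Sync settings live on the feature, not on each feature membership.
  fleet_default_member_config {
    configmanagement {
      config_sync {
        source_format = "unstructured"
        git {
          sync_repo   = "https://example.com/my-org/fleet-config.git"
          sync_branch = "main"
          secret_type = "none"
        }
      }
    }
  }
}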
I'll chime in having tackled this recently, for any future wanderers..
First, you need to make sure you have a google_gke_hub_feature block for every membership (currently missing for several of them) - see https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/gke_hub_feature_membership#example-usage---service-mesh for an example of how to do this.
I found this pattern of having a google_gke_hub_feature block for every membership to be unhelpful. The servicemesh feature being enabled is kind of a per-project thing afaik, same as gcloud container fleet mesh enable --project <PROJECT_ID>. If I did have multiples of this google_gke_hub_feature, it would throw errors like ".../servicemesh feature already enabled", or something..
This is also kind of an anti-pattern to the guidance here, specifically the part about "google_gke_hub_membership is no longer required - membership can be referenced via cluster."
Second, it seems like there's a problem with how google_container_cluster.fleet plays with gke_hub resources
Yea, this was pretty painful, specifically what goes here:
resource "google_gke_hub_feature_membership" "member" {
...
membership_location = ?
membership = ?
...
I tried all the things..
google_container_cluster.test_cluster.fleet.0.membership
regex(local.membership_regex, google_container_cluster.test_cluster.fleet.0.membership)[2]
join("",["projects/",var.project_id,"/locations/",var.region,"/memberships/",google_container_cluster.test_cluster.name])
I finally came up with something just using simple string interpolation, but it works for me..
resource "google_gke_hub_feature_membership" "argo_member" {
project = var.project_id
location = "global"
feature = "servicemesh"
membership = "projects/${var.project_id}/locations/${var.region}/memberships/${local.cluster_name}"
membership_location = google_container_cluster.cluster.location # var.region works here also
mesh {
management = "MANAGEMENT_AUTOMATIC"
}
depends_on = [ google_gke_hub_feature.servicemesh ]
}
A few folks noted that moving things out of google_gke_hub_feature_membership and into google_gke_hub_feature helps. I think it depends on the feature; I just needed servicemesh, so I found keeping this block here:
resource "google_gke_hub_feature_membership" "mesh" {
...
mesh {
management = "MANAGEMENT_AUTOMATIC"
}
to be most helpful for this feature, I can see moving the specs out might help in some cases..
Finally, my use case was supporting multiple private Autopilot clusters across multiple projects or envs; the pattern I found most useful was here, plus the notes on this current issue.
Ultimately what I found works is..
- Use the fleet flag in each google_container_cluster definition.
- Skip google_gke_hub_membership, as it's no longer required.
- Create one google_gke_hub_feature per project; all the google_gke_hub_feature_membership resources in that project can reference this shared hub_feature.
- Each google_gke_hub_feature_membership needs depends_on = [ google_gke_hub_feature.servicemesh ], or the feature being enabled..
- The google_gke_hub_feature needs depends_on = [ some_var.mesh_api ], because the API is needed to enable the feature, etc. (see the sketch below)
Just thought I'd leave a few notes, thanks for all the helpful responses above!
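For reference, a rough sketch of the dependency wiring in the last two bullets, with google_project_service standing in for "some_var.mesh_api" (the exact set of APIs you need may differ):
# Enable the Fleet and Mesh APIs; the feature can't be turned on without them.
resource "google_project_service" "gkehub_api" {
  project = var.project_id
  service = "gkehub.googleapis.com"
}

resource "google_project_service" "mesh_api" {
  project = var.project_id
  service = "mesh.googleapis.com"
}

# One shared feature per project, created only after the APIs are enabled.
resource "google_gke_hub_feature" "servicemesh" {
  name       = "servicemesh"
  project    = var.project_id
  location   = "global"
  depends_on = [google_project_service.gkehub_api, google_project_service.mesh_api]
}
Each google_gke_hub_feature_membership (like the argo_member example above) then keeps its depends_on = [google_gke_hub_feature.servicemesh].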
Hey folks - I'm marking this as resolved by https://github.com/GoogleCloudPlatform/magic-modules/pull/9974, which added two output-only fields to google_container_cluster.fleet that extract the membership_id and membership_location appropriately from the membership field and make them more easily available to users.
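For example, a rough sketch of how the new fields could be used, assuming a provider version that includes them:
resource "google_gke_hub_feature_membership" "mesh" {
  project  = "my-project-id"
  location = "global"
  feature  = google_gke_hub_feature.feature.name

  # Output-only helpers added by the PR referenced above.
  membership          = google_container_cluster.cluster.fleet.0.membership_id
  membership_location = google_container_cluster.cluster.fleet.0.membership_location

  mesh {
    management = "MANAGEMENT_AUTOMATIC"
  }
}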
If you have any issues with these new fields or find they don't resolve your issue, please open a new ticket with details. Thanks!
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
Breaking change from
I believe this is a breaking change from: https://github.com/hashicorp/terraform-provider-google/issues/14761
Terraform Version
Terraform v1.5.5
Affected Resource(s)
google_container_cluster
google_gke_hub_membership
google_gke_hub_feature / multiclusteringress
google_gke_hub_feature_membership / feature = "configmanagement"
google_gke_hub_feature_membership / feature = "servicemesh"
Terraform Configuration Files
Expected Behavior
Before adding the fleet block, fleet registration was not completing correctly. After adding the fleet block and commenting out the google_gke_hub_membership block, the rest of the membership/feature resource blocks were unable to find the registration.
Actual Behavior
See above
Steps to Reproduce
terraform apply
References
https://github.com/hashicorp/terraform-provider-google/issues/14761
b/316686427 b/316687134