hashicorp / terraform-provider-google

Terraform Provider for Google Cloud Platform
https://registry.terraform.io/providers/hashicorp/google/latest/docs
Mozilla Public License 2.0

Use of private_endpoint_subnetwork in private GKE stops deployment with error #20429

Open juangascon opened 21 hours ago

juangascon commented 21 hours ago


Terraform Version & Provider Version(s)

Terraform v1.9.8 on linux_amd64

This also happens with provider versions 5.44.2 and 6.1.0. The issue has probably existed since v5.18, when the private_endpoint_subnetwork attribute became Optional instead of Read-Only. I do not know whether this is related to issue #15422.

Affected Resource(s)

This happens in the google_container_cluster resource when it is configured as private and the IP range of the control-plane subnetwork is set with the private_endpoint_subnetwork attribute instead of master_ipv4_cidr_block.

There are two ways to provide the CIDR range for the control plane endpoint (both are sketched just after this list):

  1. master_ipv4_cidr_block
  2. private_endpoint_subnetwork
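
A minimal sketch of the two alternatives (cluster names and the CIDR value are illustrative only, not taken from my configuration):

# Option 1: give GKE a /28 CIDR range and let it manage the control-plane subnetwork.
resource "google_container_cluster" "example_cidr" {
  # ... other configurations ...
  private_cluster_config {
    enable_private_nodes   = true
    master_ipv4_cidr_block = "172.16.0.32/28"
  }
}

# Option 2: create the subnetwork yourself and reference it.
resource "google_container_cluster" "example_subnet" {
  # ... other configurations ...
  private_cluster_config {
    enable_private_nodes        = true
    private_endpoint_subnetwork = google_compute_subnetwork.cluster_control_plane.name
  }
}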

The GCP documentation "Create a cluster and select the control plane IP address range" says:

So, if your organization's security constraints force you to enable VPC Flow Logs on all subnetworks, you, and not GCP, have to create the subnet in order to turn that feature on with Terraform. Terraform cannot easily modify the parameters of a resource created outside its scope. However, if I create a subnet and set its name as the value of the private_endpoint_subnetwork attribute, I get the following error:

│ Error: Provider produced inconsistent final plan
│ 
│ When expanding the plan for module.gke.google_container_cluster.prototype to include new values learned so far during apply, provider "registry.terraform.io/hashicorp/google"
│ produced an invalid new value for .private_cluster_config[0].enable_private_endpoint: was null, but now cty.False.
│ 
│ This is a bug in the provider, which should be reported in the provider's own issue tracker.

Gemini 1.5 Pro explains the error: The error message indicates that the google_container_cluster resource's private_cluster_config.enable_private_endpoint attribute is unexpectedly changing from null to false during the apply phase, even though it's not explicitly defined in your configuration.

Claude Sonnet 3.5 gives more detail: The problem seems to be in how the provider handles the private_cluster_config state during the plan and apply phases, specifically around PSC (Private Service Connect) clusters.

Until the bug is fixed in the provider, a quick workaround proposed by Gemini 1.5 Pro is to explicitly set enable_private_endpoint to null in your configuration:

resource "google_container_cluster" "prototype" {
  # ... other configurations ...
  private_cluster_config {
    enable_private_nodes        = true
    enable_private_endpoint     = null # Explicitly set to null
    private_endpoint_subnetwork = google_compute_subnetwork.cluster_control_plane.name
    master_global_access_config {
      enabled = false
    }
  }
  # ... rest of your configuration ...
}

Terraform Configuration

resource "google_compute_subnetwork" "cluster_control_plane" {
  name                     = local.control_plane_private_endpoint_subnet_name
  region                   = var.region
  network                  = google_compute_network.prototype.name
  private_ip_google_access = true
  ip_cidr_range            = var.private_control_plane_subnetwork_ip_cidr_range

  stack_type                 = "IPV4_IPV6"
  private_ipv6_google_access = "ENABLE_OUTBOUND_VM_ACCESS_TO_GOOGLE"
  ipv6_access_type           = "INTERNAL"

  log_config {
    aggregation_interval = "INTERVAL_10_MIN"
    flow_sampling        = 0.5
    metadata             = "INCLUDE_ALL_METADATA"
  }
}

resource "google_container_cluster" "prototype" {
        // ... other GKE configurations ...
  private_cluster_config {
    enable_private_nodes    = true
    enable_private_endpoint = false
    private_endpoint_subnetwork = google_compute_subnetwork.cluster_control_plane.name
    master_global_access_config {
      enabled = false
    }
  }
        // ... other GKE configurations ...
}

Debug Output

No response

Expected Behavior

The GKE deployment should complete successfully, with the custom user-created subnetwork associated with the private control plane.

Actual Behavior

The deployment stops with the following error:

│ Error: Provider produced inconsistent final plan
│ 
│ When expanding the plan for module.gke.google_container_cluster.prototype to include new values learned so far during apply, provider "registry.terraform.io/hashicorp/google"
│ produced an invalid new value for .private_cluster_config[0].enable_private_endpoint: was null, but now cty.False.
│ 
│ This is a bug in the provider, which should be reported in the provider's own issue tracker.

Steps to reproduce

  1. Configure the google_container_cluster resource as a private cluster using the private_cluster_config block, declaring the subnet for the private control plane with the private_endpoint_subnetwork attribute instead of declaring a CIDR IP range with master_ipv4_cidr_block (the relevant block is condensed below).
  2. Run terraform apply.
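
The relevant block, condensed from the full configuration above:

  private_cluster_config {
    enable_private_nodes        = true
    enable_private_endpoint     = false
    private_endpoint_subnetwork = google_compute_subnetwork.cluster_control_plane.name
    master_global_access_config {
      enabled = false
    }
  }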

Important Factoids

It seems that the file where the bug lives is resource_container_cluster.go.

I am not a Go developer, so I do not know whether the solution proposed by Claude Sonnet 3.5 via GitHub Copilot is correct, nor can I submit a pull request.

Claude Sonnet 3.5 says:

Here's what needs to be fixed:

  1. In the containerClusterPrivateClusterConfigSuppress function, modify how suppression is handled for PSC clusters.
  2. Ensure the enable_private_endpoint field is properly handled when private_endpoint_subnetwork is specified.

Here's the fix:

// Modify the containerClusterPrivateClusterConfigSuppress function
func containerClusterPrivateClusterConfigSuppress(k, old, new string, d *schema.ResourceData) bool {
    o, n := d.GetChange("private_cluster_config.0.enable_private_endpoint")
    suppressEndpoint := !o.(bool) && !n.(bool)

    o, n = d.GetChange("private_cluster_config.0.enable_private_nodes")
    suppressNodes := !o.(bool) && !n.(bool)

    // Do not suppress diffs when private_endpoint_subnetwork is configured
    _, hasSubnet := d.GetOk("private_cluster_config.0.private_endpoint_subnetwork")

    // Do not suppress diffs when master_global_access_config is configured
    _, hasGlobalAccessConfig := d.GetOk("private_cluster_config.0.master_global_access_config")

    // Do not suppress diffs when master_ipv4_cidr_block is configured
    _, hasMasterCidr := d.GetOk("private_cluster_config.0.master_ipv4_cidr_block")

    if k == "private_cluster_config.0.enable_private_endpoint" {
        // Never suppress enable_private_endpoint when private_endpoint_subnetwork is set
        if hasSubnet {
            return false
        }
        return suppressEndpoint && !hasMasterCidr
    } else if k == "private_cluster_config.0.enable_private_nodes" {
        return suppressNodes && !hasSubnet && !hasMasterCidr
    } else if k == "private_cluster_config.#" {
        return suppressEndpoint && suppressNodes && !hasSubnet && !hasGlobalAccessConfig && !hasMasterCidr
    } else if k == "private_cluster_config.0.private_endpoint_subnetwork" {
        return tpgresource.CompareSelfLinkOrResourceName(k, old, new, d)
    }
    return false
}

And modify how we handle the private_cluster_config in the expansion:

// Modify how private_cluster_config is expanded.
// Note: d is added as a parameter so the subnetwork lookup below can read the configuration.
func expandPrivateClusterConfigMasterIpv4CidrBlock(configured interface{}, c *container.Cluster, d *schema.ResourceData) *container.PrivateClusterConfig {
    v := configured.(string)

    config := &container.PrivateClusterConfig{
        EnablePrivateNodes: c.NetworkConfig.DefaultEnablePrivateNodes,
        ForceSendFields:    []string{"EnablePrivateNodes"},
    }

    // If master_ipv4_cidr_block is set
    if v != "" {
        config.MasterIpv4CidrBlock = v
        config.ForceSendFields = append(config.ForceSendFields, "MasterIpv4CidrBlock")
    }

    // If private_endpoint_subnetwork is set, ensure enable_private_endpoint is explicitly sent
    if _, ok := d.GetOk("private_cluster_config.0.private_endpoint_subnetwork"); ok {
        config.EnablePrivateEndpoint = false
        config.ForceSendFields = append(config.ForceSendFields, "EnablePrivateEndpoint")
    }

    return config
}

These changes should:

  1. Prevent suppression of diffs when private_endpoint_subnetwork is specified
  2. Ensure enable_private_endpoint is explicitly set when using PSC clusters
  3. Better handle the state transitions between the plan and apply phases

The error you're seeing occurs because the provider is trying to be smart about when to send the enable_private_endpoint field to the API, but this causes inconsistencies in the state. The fix ensures that when using private_endpoint_subnetwork, the enable_private_endpoint value is explicit.

After applying these changes, you should be able to use either master_ipv4_cidr_block or private_endpoint_subnetwork without encountering the inconsistent plan error.

Remember that you can't use both master_ipv4_cidr_block and private_endpoint_subnetwork at the same time: they're mutually exclusive, as they represent different types of private cluster configurations (traditional private GKE vs PSC-based private GKE).
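
For illustration only, here is a sketch of how a wrapper module could make that mutual exclusivity explicit with a lifecycle precondition; the two variables are my assumption, not part of the provider or of my configuration:

resource "google_container_cluster" "prototype" {
  # ... other configurations ...

  lifecycle {
    precondition {
      # Assumed module inputs: at most one of the two may be non-empty.
      condition     = !(var.master_ipv4_cidr_block != "" && var.private_endpoint_subnetwork != "")
      error_message = "Set either master_ipv4_cidr_block or private_endpoint_subnetwork, not both."
    }
  }
}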

References

No response