hashicorp / terraform-provider-google

Terraform Provider for Google Cloud Platform
https://registry.terraform.io/providers/hashicorp/google/latest/docs
Mozilla Public License 2.0

google_container_cluster: node_pool_defaults.node_config_defaults.insecure_kubelet_readonly_port_enabled not working #19520


MaulinS-Pangea commented 1 week ago


Terraform Version & Provider Version(s)

Terraform v1.8.5 on darwin_arm64

Affected Resource(s)

google_container_cluster

Reference: https://cloud.google.com/kubernetes-engine/docs/how-to/disable-kubelet-readonly-port#check-port-standard

As per the GCP docs, to disable the kubelet read-only port at the cluster level, the hierarchy is nodePoolDefaults.nodeConfigDefaults.nodeKubeletConfig. The Terraform equivalent of this would be node_pool_defaults.node_config_defaults.insecure_kubelet_readonly_port_enabled in a google_container_cluster resource. The apply completes successfully, but then if I run

gcloud container clusters describe cluster \
    --location=region \
    --flatten=nodePoolDefaults.nodeConfigDefaults 

I get

loggingConfig:
  variantConfig:
    variant: DEFAULT

The expected output should contain insecureKubeletReadonlyPortEnabled: false if the apply was successful and the port was disabled.
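
That is, something like this (based on the output shape shown later in this thread):

loggingConfig:
  variantConfig:
    variant: DEFAULT
nodeKubeletConfig:
  insecureKubeletReadonlyPortEnabled: false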

I believe the setting is applicable to new clusters only. When I do this at the google_container_node_pool level, it works. Either way, it would be nice to have clearer documentation.

gcloud container node-pools describe node-pool-name \
    --cluster=cluster \
    --location=region \
    --flatten=config \
    --format="value(kubeletConfig)"
cpuCfsQuota=False;cpuManagerPolicy=none;insecureKubeletReadonlyPortEnabled=False
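
The node pool configuration behind that output looks roughly like this (a sketch; names are placeholders matching the gcloud commands above):

resource "google_container_node_pool" "node_pool" {
  name    = "node-pool-name"
  cluster = "cluster"
  ...
  node_config {
    kubelet_config {
      cpu_cfs_quota                          = false
      cpu_manager_policy                     = "none"
      insecure_kubelet_readonly_port_enabled = "FALSE"
    }
  }
}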

Terraform Configuration

resource "google_container_cluster" "cluster" {
  ...
  node_pool_defaults {
    node_config_defaults {
      insecure_kubelet_readonly_port_enabled = "FALSE"
    }
  }
}


Steps to reproduce

  1. terraform apply


b/369904303

rd-michel commented 1 week ago

Hi @MaulinS-Pangea, your Terraform configuration only secured the "default pool" kubelet port. In order to disable the insecure kubelet port for all node pools, you have to add the following snippet to every node config in every node pool:

    kubelet_config {
      cpu_manager_policy                     = "none"
      insecure_kubelet_readonly_port_enabled = "FALSE"
    }

cpu_manager_policy is a required field ... I've set it to the default ("none").

ggtisc commented 4 days ago

Hi @MaulinS-Pangea, as @rd-michel mentions, you should try the suggested configuration. If you continue to have issues after this, share your complete google_container_cluster code with us so we can see what is happening.

You may check the google_container_cluster documentation in the Terraform Registry.

megamih commented 3 days ago

Hi @ggtisc @rd-michel, I have the same issue with node_pool_defaults.node_config_defaults.insecure_kubelet_readonly_port_enabled.

megamih commented 2 days ago

OK, it looks like I've narrowed the issue down :) to incorrect handling of default values for insecure_kubelet_readonly_port_enabled.

Existing cluster state:

gcloud container clusters describe cluster \
    --project=project \
    --location=location \
    --flatten=nodePoolDefaults

---
nodeConfigDefaults:
  gcfsConfig:
    enabled: true

Then I try to disable the insecure kubelet readonly port with the following Terraform configuration

resource "google_container_cluster" "cluster" {
  ...
  node_pool_defaults {
    node_config_defaults {
      insecure_kubelet_readonly_port_enabled = "FALSE"
    }
  }
}

and terraform shows no changes

terraform apply

No changes. Your infrastructure matches the configuration.

But if I explicitly enable the insecure kubelet readonly port first

resource "google_container_cluster" "cluster" {
  ...
  node_pool_defaults {
    node_config_defaults {
      insecure_kubelet_readonly_port_enabled = "TRUE"
    }
  }
}

terraform detects the change

  ~ resource "google_container_cluster" "cluster" {
...
      ~ node_pool_defaults {
          ~ node_config_defaults {
              ~ insecure_kubelet_readonly_port_enabled = "FALSE" -> "TRUE"
                # (1 unchanged attribute hidden)

                # (1 unchanged block hidden)
            }
        }

and adds the nodePoolDefaults.nodeConfigDefaults.nodeKubeletConfig block after apply

gcloud container clusters describe cluster \
    --project=project \
    --location=location \
    --flatten=nodePoolDefaults

---
gcfsConfig:
  enabled: true
nodeKubeletConfig:
  insecureKubeletReadonlyPortEnabled: true

And now I am able to actually disable the insecure kubelet readonly port for new (not existing) node pools with

resource "google_container_cluster" "cluster" {
  ...
  node_pool_defaults {
    node_config_defaults {
      insecure_kubelet_readonly_port_enabled = "FALSE"
    }
  }
}

terraform detects the change

  ~ resource "google_container_cluster" "cluster" {
...
      ~ node_pool_defaults {
          ~ node_config_defaults {
              ~ insecure_kubelet_readonly_port_enabled = "TRUE" -> "FALSE"
                # (1 unchanged attribute hidden)

                # (1 unchanged block hidden)
            }
        }

and updates the nodePoolDefaults.nodeConfigDefaults.nodeKubeletConfig block after apply

gcloud container clusters describe cluster \
    --project=project \
    --location=location \
    --flatten=nodePoolDefaults

---
gcfsConfig:
  enabled: true
nodeKubeletConfig:
  insecureKubeletReadonlyPortEnabled: false

megamih commented 2 days ago

And just in case, I confirmed my understanding of the node_pool_defaults.node_config_defaults block in the google_container_cluster resource: it defines the default settings for new node pools across the entire cluster (which can be overridden if explicitly specified at the google_container_node_pool level), not the settings for the default node pool.
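
A rough sketch of that relationship (resource names are placeholders; on pre-6.4.0 providers kubelet_config also requires cpu_manager_policy):

resource "google_container_cluster" "cluster" {
  ...
  node_pool_defaults {
    node_config_defaults {
      # default inherited by node pools created after this is applied
      insecure_kubelet_readonly_port_enabled = "FALSE"
    }
  }
}

resource "google_container_node_pool" "pool" {
  cluster = google_container_cluster.cluster.id
  ...
  node_config {
    kubelet_config {
      # explicitly set per pool; overrides the cluster default
      insecure_kubelet_readonly_port_enabled = "TRUE"
    }
  }
}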

# I disabled readOnlyPort at cluster level via terraform with node_pool_defaults.node_config_defaults.insecure_kubelet_readonly_port_enabled: false
gcloud container clusters describe cluster \
    --project=project \
    --location=location \
    --flatten=nodePoolDefaults
---
nodeConfigDefaults:
  gcfsConfig:
    enabled: true
  nodeKubeletConfig:
    insecureKubeletReadonlyPortEnabled: false

# this did not affect the existing node pool
gcloud container node-pools describe existing-pool \
    --project=project \
    --cluster=cluster \
    --location=location \
    --flatten=config \
    --format="value(kubeletConfig)"

nothing

# kubelet-config was not changed on the existing nodes either
gcloud compute instances describe existing-pool-gke-vm --zone zone  --project project | grep readOnlyPort
      readOnlyPort: 10255

# I forced existing nodes to restart via node pool upgrade
gcloud container clusters upgrade cluster \
  --node-pool=existing-pool \
  --project=project \
  --location=location

All nodes in node pool [existing-pool] of cluster [cluster] will be upgraded from version [1.29.6-gke.1326000] to version [1.29.8-gke.1031000].

# readOnlyPort config is still missing in the node pool config after upgrade
gcloud container node-pools describe existing-pool \
    --project=project \
    --cluster=cluster \
    --location=location \
    --flatten=config \
    --format="value(kubeletConfig)"

nothing

# kubelet-config was not changed on the nodes either
gcloud compute instances describe existing-pool-gke-vm --zone zone  --project project | grep readOnlyPort
      readOnlyPort: 10255

# I added the new test pool without explicitly disabling readOnlyPort
gcloud container node-pools create kubelet-test \
    --project=project \
    --cluster=cluster \
    --location=location \
    --num-nodes=1 \
    --service-account 'cluster-gke-nodes@project.iam.gserviceaccount.com'

# and newly added node pool inherited cluster node_pool_defaults.node_config_defaults settings as expected
gcloud container node-pools describe kubelet-test \
    --project=project \
    --cluster=cluster \
    --location=location \
    --flatten=config \
    --format="value(kubeletConfig)"

insecureKubeletReadonlyPortEnabled=False

# readOnlyPort is disabled on the new node
gcloud compute instances describe kubelet-test-pool-gke-vm --project=project --zone=zone | grep readOnlyPort
      readOnlyPort: 0

# I enabled readOnlyPort via terraform with node_pool_defaults.node_config_defaults.insecure_kubelet_readonly_port_enabled: true
gcloud container clusters describe cluster \
    --project=project \
    --location=location \
    --flatten=nodePoolDefaults
---
nodeConfigDefaults:
  gcfsConfig:
    enabled: true
  nodeKubeletConfig:
    insecureKubeletReadonlyPortEnabled: true

# this did not affect the existing node pool, so I deleted it
gcloud container node-pools delete kubelet-test \
    --project=project \
    --cluster=cluster \
    --location=location

# and added it again
gcloud container node-pools create kubelet-test \
    --project=project \
    --cluster=cluster \
    --location=location \
    --num-nodes=1 \
    --service-account 'cluster-gke-nodes@project.iam.gserviceaccount.com'

# and newly added node pool inherited cluster node_pool_defaults.node_config_defaults settings as expected again
gcloud container node-pools describe kubelet-test \
    --project=project \
    --cluster=cluster \
    --location=location \
    --flatten=config \
    --format="value(kubeletConfig)"
insecureKubeletReadonlyPortEnabled=True

# readOnlyPort is enabled on the new node
gcloud compute instances describe new-kubelet-test-pool-gke-vm --project=project --zone=zone | grep readOnlyPort
      readOnlyPort: 10255

ggtisc commented 2 days ago

Documentation inconsistency

It is not at all clear, and this causes confusion among users, which is the correct way to set insecure_kubelet_readonly_port_enabled = "FALSE".

wyardley commented 29 minutes ago

  • OK, it looks like I've narrowed the issue down :) to incorrect handling of default values for insecure_kubelet_readonly_port_enabled.

Curious whether https://github.com/GoogleCloudPlatform/magic-modules/pull/11688 (which came out in 6.4.0) will make a difference, though it may not (since it should only affect the behavior at resource creation).

It's difficult to handle perfectly because the API doesn't send a value back when the value is suppressed. Either way, you may want to see if updating to 6.4.0 changes or fixes the behavior for you.
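
For reference, a minimal sketch of pinning the provider to at least that version (standard required_providers syntax):

terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = ">= 6.4.0"
    }
  }
}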

  • it defines default settings for the new node pool for the entire cluster (can be overridden if explicitly specified at google_container_node_pool level

Yes, in my understanding, it affects the default behavior of newly created node pools that don't have the setting explicitly set. There's also the nested node_config.kubelet_config in the google_container_cluster resource, which ideally shouldn't be used, but which affects the default node pool that's created if remove_default_node_pool is not set. This parameter is also valid (for Autopilot clusters, anyway) in node_pool_auto_config.node_kubelet_config. A rough sketch of the three places is below.
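
Not meant to be applied as-is (node_config applies to the default node pool on Standard clusters, node_pool_auto_config to Autopilot), just showing where each block sits:

resource "google_container_cluster" "cluster" {
  ...
  # Default for node pools created after this is applied:
  node_pool_defaults {
    node_config_defaults {
      insecure_kubelet_readonly_port_enabled = "FALSE"
    }
  }

  # Affects the default node pool created when remove_default_node_pool
  # is not set; ideally shouldn't be used:
  node_config {
    kubelet_config {
      insecure_kubelet_readonly_port_enabled = "FALSE"
    }
  }

  # Autopilot clusters:
  node_pool_auto_config {
    node_kubelet_config {
      insecure_kubelet_readonly_port_enabled = "FALSE"
    }
  }
}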

As @ggtisc says, the docs could probably be a little clearer in various spots. But it is also possible that there's an actual corner case somewhere with the behavior of node_pool_defaults.node_config_defaults.

Reading the top level Google docs on the various places this value can be set is probably a good way to double-check. For the most part, the attribute naming in the provider tracks with the Google APIs.

The good thing is that the default will be changing soon for newly created clusters, and at that point, hopefully people won't need to set this setting.

wyardley commented 27 minutes ago

@rd-michel:

  • cpu_manager_policy is a required field ... I've set it to the default ("none").

Side note: it's no longer required as of 6.4.0 (https://github.com/hashicorp/terraform-provider-google/pull/19464). Note that in earlier versions of the provider, it's possible to set it to "" (IIRC) to get the default behavior, which doesn't totally line up with the docs.
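
To illustrate, a minimal sketch assuming provider 6.4.0+ (per the PR above, cpu_manager_policy can now be omitted):

    kubelet_config {
      insecure_kubelet_readonly_port_enabled = "FALSE"
    }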