Closed: wyardley closed this 1 month ago
/gcbrun
Hmm, looks like we are getting lower-case in module.example.module.gke.google_container_node_pool.pools["pool-03"]:
Step #70 - "converge node-pool-local": + kubelet_config {
Step #70 - "converge node-pool-local": + cpu_cfs_quota = true
Step #70 - "converge node-pool-local": + cpu_manager_policy = "static"
Step #70 - "converge node-pool-local": + insecure_kubelet_readonly_port_enabled = "false"
Step #70 - "converge node-pool-local": + pod_pids_limit = 4096
Step #70 - "converge node-pool-local": }
Step #70 - "converge node-pool-local": Error: expected node_config.0.kubelet_config.0.insecure_kubelet_readonly_port_enabled to be one of ["FALSE" "TRUE"], got false
Step #70 - "converge node-pool-local":
Step #70 - "converge node-pool-local": with module.example.module.gke.google_container_node_pool.pools["pool-03"],
Step #70 - "converge node-pool-local": on ../../../modules/beta-public-cluster/cluster.tf line 656, in resource "google_container_node_pool" "pools":
Step #70 - "converge node-pool-local": 656: node_config {
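For reference, the provider argument here validates against the string enum "TRUE" / "FALSE" rather than a boolean, so a bool-typed module input has to be converted before it reaches the provider. A minimal sketch of that conversion, using the values from the plan above (the variable name is an assumption, not the module's actual input):

```hcl
# Minimal sketch only -- var.insecure_kubelet_readonly_port_enabled is an
# illustrative name, not necessarily the module's real input.
# upper(tostring(true)) => "TRUE", upper(tostring(false)) => "FALSE".
kubelet_config {
  cpu_cfs_quota      = true
  cpu_manager_policy = "static"
  pod_pids_limit     = 4096

  insecure_kubelet_readonly_port_enabled = (
    var.insecure_kubelet_readonly_port_enabled != null
    ? upper(tostring(var.insecure_kubelet_readonly_port_enabled))
    : null
  )
}
```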
> Hmm, looks like we are getting lower-case in
I'll take a look. I thought I checked for any spots I'd missed, but apparently not.
Thanks @wyardley!
Probably needs a make build
> Probably needs a make build
Got to take a look. Doesn't look like it will work exactly as-is -- will make the adjustment that I think it needs and will build and push the results.
edit: See if what's there now looks good to you
/gcbrun
> Probably needs a make build
> Got to take a look. Doesn't look like it will work exactly as-is -- will make the adjustment that I think it needs and will build and push the results.
> edit: See if what's there now looks good to you
Thanks @wyardley - Yes, this looks good, and less complicated than what I suggested.
I was expecting we might need to update the test asset; however, this is unexpected.
The node_pool_defaults plan now looks good:
TestPrivateZonalWithNetworking 2024-10-17T20:54:28Z command.go:185: + node_pool_defaults {
TestPrivateZonalWithNetworking 2024-10-17T20:54:28Z command.go:185: + node_config_defaults {
TestPrivateZonalWithNetworking 2024-10-17T20:54:28Z command.go:185: + insecure_kubelet_readonly_port_enabled = "FALSE"
TestPrivateZonalWithNetworking 2024-10-17T20:54:28Z command.go:185: + logging_variant = (known after apply)
TestPrivateZonalWithNetworking 2024-10-17T20:54:28Z command.go:185:
TestPrivateZonalWithNetworking 2024-10-17T20:54:28Z command.go:185: + gcfs_config (known after apply)
TestPrivateZonalWithNetworking 2024-10-17T20:54:28Z command.go:185: }
TestPrivateZonalWithNetworking 2024-10-17T20:54:28Z command.go:185: }
However the default-node-pool is showing true?
private_zonal_with_networking_test.go:81:
Error Trace: /workspace/test/integration/private_zonal_with_networking/private_zonal_with_networking_test.go:81
/builder/home/go/pkg/mod/github.com/!google!cloud!platform/cloud-foundation-toolkit/infra/blueprint-test@v0.16.2/pkg/tft/terraform.go:638
/builder/home/go/pkg/mod/github.com/!google!cloud!platform/cloud-foundation-toolkit/infra/blueprint-test@v0.16.2/pkg/tft/terraform.go:670
/builder/home/go/pkg/mod/github.com/!google!cloud!platform/cloud-foundation-toolkit/infra/blueprint-test@v0.16.2/pkg/utils/stages.go:31
/builder/home/go/pkg/mod/github.com/!google!cloud!platform/cloud-foundation-toolkit/infra/blueprint-test@v0.16.2/pkg/tft/terraform.go:670
Error: Not equal:
expected: map[string]interface {}{"diskSizeGb":100, "diskType":"pd-standard", "effectiveCgroupMode":"EFFECTIVE_CGROUP_MODE_V2", "imageType":"COS_CONTAINERD", "labels":map[string]interface {}{"cluster_name":"CLUSTER_NAME", "node_pool":"default-node-pool"}, "loggingConfig":map[string]interface {}{"variantConfig":map[string]interface {}{"variant":"DEFAULT"}}, "machineType":"e2-medium", "metadata":map[string]interface {}{"cluster_name":"CLUSTER_NAME", "disable-legacy-endpoints":"true", "node_pool":"default-node-pool"}, "oauthScopes":[]interface {}{"https://www.googleapis.com/auth/cloud-platform"}, "serviceAccount":"SERVICE_ACCOUNT", "shieldedInstanceConfig":map[string]interface {}{"enableIntegrityMonitoring":true}, "tags":[]interface {}{"gke-CLUSTER_NAME", "gke-CLUSTER_NAME-default-node-pool"}, "windowsNodeConfig":map[string]interface {}{}, "workloadMetadataConfig":map[string]interface {}{"mode":"GKE_METADATA"}}
actual : map[string]interface {}{"diskSizeGb":100, "diskType":"pd-standard", "effectiveCgroupMode":"EFFECTIVE_CGROUP_MODE_V2", "imageType":"COS_CONTAINERD", "kubeletConfig":map[string]interface {}{"insecureKubeletReadonlyPortEnabled":true}, "labels":map[string]interface {}{"cluster_name":"CLUSTER_NAME", "node_pool":"default-node-pool"}, "loggingConfig":map[string]interface {}{"variantConfig":map[string]interface {}{"variant":"DEFAULT"}}, "machineType":"e2-medium", "metadata":map[string]interface {}{"cluster_name":"CLUSTER_NAME", "disable-legacy-endpoints":"true", "node_pool":"default-node-pool"}, "oauthScopes":[]interface {}{"https://www.googleapis.com/auth/cloud-platform"}, "serviceAccount":"SERVICE_ACCOUNT", "shieldedInstanceConfig":map[string]interface {}{"enableIntegrityMonitoring":true}, "tags":[]interface {}{"gke-CLUSTER_NAME", "gke-CLUSTER_NAME-default-node-pool"}, "windowsNodeConfig":map[string]interface {}{}, "workloadMetadataConfig":map[string]interface {}{"mode":"GKE_METADATA"}}
Diff:
--- Expected
+++ Actual
@@ -1,2 +1,2 @@
-(map[string]interface {}) (len=14) {
+(map[string]interface {}) (len=15) {
(string) (len=10) "diskSizeGb": (float64) 100,
@@ -5,2 +5,5 @@
(string) (len=9) "imageType": (string) (len=14) "COS_CONTAINERD",
+ (string) (len=13) "kubeletConfig": (map[string]interface {}) (len=1) {
+ (string) (len=34) "insecureKubeletReadonlyPortEnabled": (bool) true
+ },
(string) (len=6) "labels": (map[string]interface {}) (len=2) {
Test: TestPrivateZonalWithNetworking
It’s because at the API level it’s a bool, and this is the API response vs. the Terraform state, so I think this is expected, if unintuitive. I can update that value shortly.
> It’s because at the API level it’s a bool, and this is the API response vs. the Terraform state, so I think this is expected, if unintuitive. I can update that value shortly.
Hey @wyardley! It's not the string vs bool types I'm thinking of; rather, we set the cluster default to false, but the default node pool is showing true? Perhaps there is something special about the default node pool.
Let me look. If we only set node_pool_defaults, we’ll probably need to set it in node_config also for the default pool, I think.
> Let me look. If we only set node_pool_defaults, we’ll probably need to set it in node_config also for the default pool, I think.
Yeah, that is my guess as well.
So a couple questions:
1) For the default pool, should we allow setting this explicitly in var.node_pools[0] as well as taking the default cluster setting if not?
2) In the case where remove_default_node_pool is set to false, do we currently have google_container_cluster.node_config.kubelet_config defined in the config at all? For sure we'll need to add it here, I think, but if we need to support the case where we have remove_default_node_pool set to false, we may also need to be able to define it in the top level node_config? I may be missing something, but couldn't find that the module is using anything in that section now?
Also, should I try to squeeze this in the same PR?
Thanks for digging into this @wyardley!
> So a couple questions:
> 1) For the default pool, should we allow setting this explicitly in var.node_pools[0] as well as taking the default cluster setting if not?
I think that would be the ideal situation for the default node pool - do an explicit lookup, failing that use the cluster default, failing that then set null
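A rough sketch of that precedence (the local and variable names here are assumptions, not the module's actual identifiers):

```hcl
locals {
  # 1) explicit value on the default pool entry,
  # 2) otherwise the cluster-level default, which may itself be null.
  default_pool_insecure_kubelet_readonly_port_enabled = lookup(
    var.node_pools[0],
    "insecure_kubelet_readonly_port_enabled",
    var.insecure_kubelet_readonly_port_enabled
  )
}
```

The resulting value would then go through the same bool-to-"TRUE"/"FALSE" conversion as in the earlier sketch before being set on the default pool's kubelet_config.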
> 2) In the case where remove_default_node_pool is set to false, do we currently have google_container_cluster.node_config.kubelet_config defined in the config at all? For sure we'll need to add it here, I think, but if we need to support the case where we have remove_default_node_pool set to false, we may also need to be able to define it in the top level node_config? I may be missing something, but couldn't find that the module is using anything in that section now?
Yeah, the best practice recommendation is to remove the default node pool, but it's not required. Due to its typical removal, I suspect it was an accidental oversight that these weren't added, and so they should be updated to match.
> Also, should I try to squeeze this in the same PR?
I'm comfortable if everything is in a single PR; however, I can certainly open a separate PR to add kubelet_config to the default node pool to keep the PRs more focused. Let me know your preference.
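For what it's worth, a rough sketch (not the module's actual code) of where that would land for the default node pool, i.e. the cluster-level node_config block, with the value coming from the precedence lookup sketched above:

```hcl
resource "google_container_cluster" "primary" {
  # ... existing cluster arguments ...

  node_config {
    # ... existing default-pool settings ...

    kubelet_config {
      # Value from the precedence lookup above, converted to "TRUE"/"FALSE",
      # or left null when nothing was set at either level.
      insecure_kubelet_readonly_port_enabled = (
        local.default_pool_insecure_kubelet_readonly_port_enabled != null
        ? upper(tostring(local.default_pool_insecure_kubelet_readonly_port_enabled))
        : null
      )
    }
  }
}
```

One caveat that comes up later in this thread: at least with TPG 5.x the provider also treats cpu_manager_policy as required inside kubelet_config, so a real change along these lines has to account for that as well (which is what the follow-up PR for the default node pool ended up resolving).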
I pushed up an update of what I have so far (only updated one of the tests).
I'm going to do what I probably should have done sooner, and actually run this locally, so I can try a few things a bit faster.
/gcbrun
Fixing one minor issue with that, and will push again.
> however I can certainly open a separate PR to add kubelet_config to the default node pool to keep the PRs more focused
Yes, I think that would be a good idea, especially since it may need to include some additional items, and you probably have a better idea of where exactly it should go.
/gcbrun
The argument "cpu_manager_policy" is required, but no definition was found.
~@apeabody I can add a dumb fix for this in the short term, but this can probably be removed when support is dropped for TPG 5.x.~
Thanks @wyardley - I have https://github.com/terraform-google-modules/terraform-google-kubernetes-engine/pull/2147 to add the kubelet_config to the default-node-pool. That should resolve the missing cpu_manager_policy.
> I have a PR to add the kubelet_config to the default-node-pool. That should resolve the missing cpu_manager_policy.
Yeah, actually, that's probably the needed / correct fix here. Do you want to send that through first and I'll rebase off of it?
> I have a PR to add the kubelet_config to the default-node-pool. That should resolve the missing cpu_manager_policy.
> Yeah, actually, that's probably the needed / correct fix here. Do you want to send that through first and I'll rebase off of it?
Yes, that is the plan; hopefully I'll get it merged in by tomorrow.
Hi @wyardley - https://github.com/terraform-google-modules/terraform-google-kubernetes-engine/pull/2147 is merged, can you rebase this as needed? Thanks!
> Hi @wyardley - https://github.com/terraform-google-modules/terraform-google-kubernetes-engine/pull/2147 is merged, can you rebase this as needed? Thanks!
Yup, updated now.
Let me know if the logic looks correct, and if the adjusted title makes sense (I think you've already added it to node_config in #2147?).
Guessing we might still need to adjust one or more of the tests?
/gcbrun
FYI: Looks like the Windows node pool test has recently started timing out (tests are failing around 59-61 minutes); we may need to adjust that test.
Add insecureKubeletReadonlyPortEnabled to node_config.kubelet_config for the default node-pool and for additional pools. It may also be necessary to define the top level node_config more broadly for the case where remove_default_node_pool is set to false, which should probably be handled separately.

Also, the upstream provider (intentionally) uses an enum of "TRUE" / "FALSE" vs. a boolean. Update the code to follow this, and add a test case (also suggested by @apeabody) that covers the cluster level setting vs the node pool one.

Fixes #2013
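For illustration, a test fixture exercising the cluster-level setting vs a per-pool override might look roughly like the following; the module input names here are assumptions, not the module's confirmed interface:

```hcl
module "gke" {
  source = "terraform-google-modules/kubernetes-engine/google//modules/beta-public-cluster"

  # ... project_id, name, region, network, subnetwork, etc. ...

  # Cluster-level default, expected to surface as "FALSE" in node_pool_defaults.
  insecure_kubelet_readonly_port_enabled = false

  node_pools = [
    {
      name = "pool-01"
      # No per-pool value: should inherit the cluster-level default.
    },
    {
      name = "pool-02"
      # Per-pool value: should take precedence over the cluster-level default.
      insecure_kubelet_readonly_port_enabled = true
    },
  ]
}
```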