Open hoskeri opened 1 year ago
Also, the debug log had the following warnings...
2023-09-07T20:47:04.246Z [WARN] Provider "provider[\"registry.terraform.io/hashicorp/google\"]" produced an unexpected new value for google_container_cluster.primary, but we are tolerating it because it is using the legacy plugin SDK.
The following problems may be the cause of any confusing errors from downstream operations:
- .node_config[0].boot_disk_kms_key: was null, but now cty.StringVal("")
- .node_config[0].node_group: was null, but now cty.StringVal("")
- .node_config[0].kubelet_config[0].cpu_cfs_quota: was null, but now cty.False
- .node_config[0].kubelet_config[0].cpu_cfs_quota_period: was null, but now cty.StringVal("")
/cc
Wasn't immediately obvious to me since cpu_manager_policy
is required if node_config.kubelet_config
is set, and the docs only mention two available options, but updating the value doesn't actually make any changes on the default node pool, and I actually get a permadiff with it set to either "none"
or "static"
. Setting the value to the undocumented (but allowed by the validation) available empty string (""
).
node_config {
kubelet_config {
cpu_manager_policy = "none"
Yeah, came across this again when working on https://github.com/GoogleCloudPlatform/magic-modules/pull/11572
Basically, I think the issue is that the API default is "true", and doesn't send a response when the value is the default value: https://pkg.go.dev/google.golang.org/api/container/v1#NodeKubeletConfig
So, to really fix this, unless they want to do a breaking change and switch the attribute to string values (unlikely with 6.0 just having gone out), I'm guessing probably the provider needs to either disable force sending the field (and suppress a permadiff possibly?), and / or infer the default being true
when unset? I'm not sure what the side effects of disabling force send would be.
Is this showing up as a proposed change in the plan?
Also, the debug log had the following warnings...
2023-09-07T20:47:04.246Z [WARN] Provider "provider[\"registry.terraform.io/hashicorp/google\"]" produced an unexpected new value for google_container_cluster.primary, but we are tolerating it because it is using the legacy plugin SDK. The following problems may be the cause of any confusing errors from downstream operations: - .node_config[0].boot_disk_kms_key: was null, but now cty.StringVal("") - .node_config[0].node_group: was null, but now cty.StringVal("") - .node_config[0].kubelet_config[0].cpu_cfs_quota: was null, but now cty.False - .node_config[0].kubelet_config[0].cpu_cfs_quota_period: was null, but now cty.StringVal("")
This kind of warning doesn't usually indicate a real problem and is safe to ignore.
Is this showing up as a proposed change in the plan?
@melinath so I think one consequence of this is that if you set any (other) element in node_config.kubelet_config
, I believe this will force the value to false
vs. the API default of true
. Because in-place updates don't currently work (at least for the nested value), if the cluster is created without a kubelet_config and then one is added, there will be a permadiff where it will try to change the value. If we fix updates as has been discussed, this would make the situation worse.
Creating a cluster with the value explicitly set to true
or false
seems to work in my testing.
I haven't tested with a separate nodepool resource, but I would guess there's a similar (and maybe more risky) issue there?
Also, I think the provider is infering a value of false when the API doesn't return the value (because it's optional and default) when it's also true, but could be wrong there.
Edit: I'll try to post a practical example below that I think will help illustrate it.
Initial config:
resource "google_container_cluster" "test_cfs_quota" {
name = "test-cfs-quota"
location = "us-central1-f"
initial_node_count = 1
node_config {
kubelet_config {
# Other values not set here
cpu_manager_policy = "static"
}
}
deletion_protection = false
network = "default"
subnetwork = "default"
}
after applying:
% gcloud container clusters describe --location us-central1-f test-cfs-quota | yq .nodeConfig.kubeletConfig
cpuCfsQuota: false
cpuManagerPolicy: static
% gcloud container node-pools describe default-pool --cluster=test-cfs-quota --location=us-central1-f --flatten=config --format="value(kubeletConfig)"
cpuCfsQuota=False;cpuManagerPolicy=static
Note that cpuCfsQuota=False
despite it not being set in the config, and API default being True
. IMO, this is maybe the most problematic / unexpected behavior here. However, since we created the cluster this way, it will apply clean.
Now we have the permadiff from the update stuff not actually functioning
~ node_config {
tags = []
# (17 unchanged attributes hidden)
~ kubelet_config {
~ cpu_cfs_quota = false -> true
# (2 unchanged attributes hidden)
}
But now, let's destroy and recreate the cluster without anything in node_config.kubelet_config
so we get the default setting of unset (which is true
/ True
) for cpu_cfs_quota
:
resource "google_container_cluster" "test_cfs_quota" {
name = "test-cfs-quota"
location = "us-central1-f"
initial_node_count = 1
deletion_protection = false
network = "default"
subnetwork = "default"
}
Now we get:
% gcloud container clusters describe --location us-central1-f test-cfs-quota | yq .nodeConfig.kubeletConfig
null
% gcloud container node-pools describe default-pool --cluster=test-cfs-quota --location=us-central1-f --flatten=config --format="value(kubeletConfig)"
[no output]
I can't reproduce any other major issues there if other parameters are set and cpu_cfs_quota
is not, but if I add an explicit cpu_cfs_quota = true
:
node_config {
kubelet_config {
cpu_manager_policy = ""
cpu_cfs_quota = true
}
}
We'll end up with this permadiff, since the provider doesn't realize that unset is true
:
~ node_config {
tags = []
# (17 unchanged attributes hidden)
+ kubelet_config {
+ cpu_cfs_quota = true
}
# (1 unchanged block hidden)
It's fair that if #19225 were fixed, this situation would be somewhat better; I'm not sure if we'd end up with an implicit or explicit true value then.
Relevant contribution docs: https://googlecloudplatform.github.io/magic-modules/develop/permadiff/
Thanks, yes, have reviewed that semi-recently, but didn't think about it in relation to this
Are you suggesting to set the client-side default in this case as described here?:
"field": {
Default: "DEFAULT_VALUE",
}
to the field? And would this be breaking then?
or just to update the flattener like the example here
Maybe something like this (though I'm still seeing it false with this change) [edit: probably need to handle v
as well as c
here]:
diff --git a/mmv1/third_party/terraform/services/container/node_config.go.erb b/mmv1/third_party/terraform/services/container/node_config.go.erb
index 14ec2f469..0fe38be05 100644
--- a/mmv1/third_party/terraform/services/container/node_config.go.erb
+++ b/mmv1/third_party/terraform/services/container/node_config.go.erb
@@ -2,6 +2,7 @@
package container
import (
+ "reflect"
<% unless version == "ga" -%>
"strings"
<% end -%>
@@ -602,6 +603,7 @@ func schemaNodeConfig() *schema.Schema {
"cpu_cfs_quota": {
Type: schema.TypeBool,
Optional: true,
+ Computed: true,
Description: `Enable CPU CFS quota enforcement for containers that specify CPU limits.`,
},
"cpu_cfs_quota_period": {
@@ -1585,6 +1587,13 @@ func flattenSecondaryBootDisks(c []*container.SecondaryBootDisk) []map[string]in
return result
}
+func flattenCpuCfsQuota(c *container.NodeKubeletConfig) bool {
+ if c != nil && tpgresource.IsEmptyValue(reflect.ValueOf(c.CpuCfsQuota)) {
+ return true
+ }
+ return c.CpuCfsQuota
+}
+
func flattenInsecureKubeletReadonlyPortEnabled(c *container.NodeKubeletConfig) string {
// Convert bool from the API to the enum values used internally
if c != nil && c.InsecureKubeletReadonlyPortEnabled {
@@ -1742,7 +1751,7 @@ func flattenKubeletConfig(c *container.NodeKubeletConfig) []map[string]interface
result := []map[string]interface{}{}
if c != nil {
result = append(result, map[string]interface{}{
- "cpu_cfs_quota": c.CpuCfsQuota,
+ "cpu_cfs_quota": flattenCpuCfsQuota(c),
"cpu_cfs_quota_period": c.CpuCfsQuotaPeriod,
"cpu_manager_policy": c.CpuManagerPolicy,
"insecure_kubelet_readonly_port_enabled": flattenInsecureKubeletReadonlyPortEnabled(c),
Are you suggesting to set the client-side default to the field? And would this be breaking then?
I'm not making a specific recommendation, just noting that this seems to have a permadiff component. A change to the client-side default would be a breaking change that requires a major release. A change to Optional+Computed would not be breaking and might be appropriate in this case, though I haven't reviewed it closely.
If you're going with O+C, I think the custom flattener is unnecessary.
Ok, sounds good. I can try just the simplest way and see if it somehow magically Just Works.
diff --git a/mmv1/third_party/terraform/services/container/node_config.go.erb b/mmv1/third_party/terraform/services/container/node_config.go.erb
index 14ec2f469..cf9631eec 100644
--- a/mmv1/third_party/terraform/services/container/node_config.go.erb
+++ b/mmv1/third_party/terraform/services/container/node_config.go.erb
@@ -602,6 +602,7 @@ func schemaNodeConfig() *schema.Schema {
"cpu_cfs_quota": {
Type: schema.TypeBool,
Optional: true,
+ Computed: true,
Description: `Enable CPU CFS quota enforcement for containers that specify CPU limits.`,
},
"cpu_cfs_quota_period": {
@@ -1175,7 +1176,6 @@ func expandKubeletConfig(v interface{}) *container.NodeKubeletConfig {
}
if cpuCfsQuota, ok := cfg["cpu_cfs_quota"]; ok {
kConfig.CpuCfsQuota = cpuCfsQuota.(bool)
- kConfig.ForceSendFields = append(kConfig.ForceSendFields, "CpuCfsQuota")
}
if cpuCfsQuotaPeriod, ok := cfg["cpu_cfs_quota_period"]; ok {
kConfig.CpuCfsQuotaPeriod = cpuCfsQuotaPeriod.(string)
With ForceSendFields
it still sets it to the false
value, but with it removed, it seems to behave more closely to how I'd expect.
I'm not sure if there are cases where that change would cause an issue.
edit: It creates as I'd expect, but then it gets the values mixed up, as you also might expect.
For example, with the code above only, when the cluster was created with the default (true
) value implicitly, it shows up as a noop on subsequent plans if the value is set to false
in the config, and as a permadiff if it's set to true
.
Community Note
modular-magician
user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If an issue is assigned to a user, that user is claiming responsibility for the issue. If an issue is assigned tohashibot
, a community member has claimed the issue already.Terraform Version
cloudtop:~$ terraform version Terraform v1.5.6 on linux_amd64
Affected Resource(s)
google_container_node_pool google_container_cluster
Terraform Configuration Files
Debug Output
Panic Output
Expected Behavior
I would generally expect that any unset fields in configuration would not have a value in the cluster create request.
In this case, (refer to hcl above)
cluster.node_config.kubelet_config.cpu_cfs_quota
should not have resulted in that value not being set in the cluster create request, and resulting cluster should have had that value not set at all (set tonull
)Actual Behavior
However, the value of
cpu_cfs_quota
is set by the provider tofalse
in the cluster create request. The only way to avoid this outcome is to not setkubelet_config
at all (or forcecpu_cfs_quota
totrue
.Steps to Reproduce
terraform apply
Important Factoids
b/300067081