k3s-io / k3s

Lightweight Kubernetes
https://k3s.io
Apache License 2.0
26.62k stars 2.24k forks source link

Empty string as a parameter in etcd extra args #10012

Closed jordanorc closed 3 weeks ago

jordanorc commented 3 weeks ago

Environmental Info:

K3s Version: v1.29.3+k3s1

Node(s) CPU architecture, OS, and Version: Linux test4 5.15.0-92-generic 102-Ubuntu SMP Wed Jan 10 09:33:48 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux. (Ubuntu 22.04)

Cluster Configuration: 1 server

Describe the bug:

Pull request 4463 (https://github.com/k3s-io/k3s/pull/4463/) made it possible to use additional parameters for etcd. In scenarios where it is needed to override a parameter with an empty string, k3s understands that the parameter is an array and not an empty string. The problem happens on this line: https://github.com/k3s-io/k3s/blob/94e29e2ef5d79904f730e2024c8d1682b901b2d5/pkg/daemons/executor/executor.go#L106C5-L106C83.

Steps To Reproduce:

  1. Start a new K3s server with the command:

curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--disable=servicelb --cluster-init --etcd-arg=discovery=https://discovery.etcd.io/testing --etcd-arg=initial-cluster= sh -

As initial-cluster is an empty string, we expect it overrides the default value (if it exists), once the empty string will be ignored by etcd (https://github.com/etcd-io/etcd/blob/c2a3ca62c707f9c9d662ba94335fc019726c3bfd/server/embed/config.go#L916). But, instead of that, we have the following error:

starting kubernetes: preparing server: start managed database: multiple discovery or bootstrap flags are set. Choose one of "initial-cluster", "discovery" or "discovery-srv"

Expected behavior: It's expected the parameter initial-cluster will be an empty string.

Actual behavior: initial-cluster is an empty array

Additional context / logs: It happens because the ToConfigFile function (https://github.com/etcd-io/etcd/blob/c2a3ca62c707f9c9d662ba94335fc019726c3bfd/server/embed/config.go#L916) treats the value of the parameter as an array. So, if we set it as null, we have an empty array as return. If we set it as an empty string ("" or '') we have a '' or "" literally as a string.

brandond commented 3 weeks ago

--etcd-arg=discovery=https://discovery.etcd.io/testing --etcd-arg=initial-cluster=

Don't do that. We do not support external management of the etcd cluster creation or member discovery process. If you are using the embedded etcd, you must let k3s itself manage etcd. If you want to manage the cluster creation / membership discovery by hand, use standalone etcd, and point k3s at that cluster with the --datastore-endpoint option.

jordanorc commented 3 weeks ago

Hi @brandon, thank's for the quick response. Actually, we are having this issue using Rancher to automatically provisioning RKE2 clusters. After some hours of search, we discovered the problem is actually in k3s. We need to override some parameters of etcd (as pointed here: https://ranchermanager.docs.rancher.com/reference-guides/cluster-configuration/rancher-server-configuration/rke2-cluster-configuration#machineglobalconfig), and that was just a example of the problem. All parameters in etcd-arg that need to be override with an empty string suffer the same issue. There is any workaround for this?

brandond commented 3 weeks ago

What are the actual fields in the config that you're trying to override?

jordanorc commented 3 weeks ago

It will happen with any field that is an empty string by default, such as listen-metrics-urls, wal-dir or tls-max-version. In any of these cases, if you try to override the value with the default one (an empty string), the error will be the same: error unmarshaling JSON: while decoding JSON: json: cannot unmarshal array into Go struct field configYAML.listen-metrics-urls of type string, because the way k3s deal with empty strings here: https://github.com/etcd-io/etcd/blob/c2a3ca62c707f9c9d662ba94335fc019726c3bfd/server/embed/config.go#L916. In our case, we try to override listen-metrics-urls and initial-cluster. Any of the commands below will result in the same error:

k3s server --cluster-init --etcd-arg=listen-metrics-urls= k3s server --cluster-init --etcd-arg=tls-max-version= k3s server --cluster-init --etcd-arg=wal-dir=

brandond commented 3 weeks ago

I can take a look at improving handling of that, but can I ask why you're trying to set those to empty values? I could understand wanting to set no metrics URLs to disable serving metrics, but why would you set the WAL dir to an empty value? Similarly, tls-max-version wants TLS1.2 or, TLS1.3, not an empty string.

Are you intending to unset these flags by setting them to an empty value? Thats not how the flag parser works.

jordanorc commented 3 weeks ago

Some of these parameters are changed by Rancher during the process of provisioning a new cluster (we are using Harvester + Rancher). I'm testing some scenarios in our infrastructure where it's needed to keep the default etcd value (empty string). A simple workaround is change the https://github.com/k3s-io/k3s/blob/94e29e2ef5d79904f730e2024c8d1682b901b2d5/pkg/daemons/executor/executor.go#L106C5-L106C83 file, validating the size of the string, changing it from:

} else if err := yaml.Unmarshal([]byte(extraArg[1]), &stringArr); err == nil {

to:

} else if err := yaml.Unmarshal([]byte(extraArg[1]), &stringArr); err == nil && (len(extraArg[1]) > 0) {

I can make a pull request if needed. A better improvement would be to use the etcd lib to parse the parameters. 

jordanorc commented 3 weeks ago

@brandond see pull request: https://github.com/k3s-io/k3s/pull/10018

brandond commented 3 weeks ago

Some of these parameters are changed by Rancher during the process of provisioning a new cluster (we are using Harvester + Rancher). I'm testing some scenarios in our infrastructure where it's needed to keep the default etcd value (empty string).

I think that might be the source of some confusion - the default isn't an empty value, and for the other --COMPONENT-arg flags, passing through an arg with an empty value won't restore the default. What you're asking for is the ability to unset or mask a default value that is set internally by k3s. That isn't currently possible.

We can continue conversation on the PR.