k3s-io / k3s

Lightweight Kubernetes
https://k3s.io
Apache License 2.0

agent fails to join masters due to undefined option `-disable` which isn't given #6565

Closed: soletan closed this issue 1 year ago

soletan commented 1 year ago

Environmental Info:

K3s Version:

k3s version v1.25.4+k3s1 (0dc63334)
go version go1.19.3

Node(s) CPU architecture, OS, and Version:

Linux k1-w1 5.15.0-53-generic #59-Ubuntu SMP Mon Oct 17 18:53:30 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration:

Nodes are set up with a custom script:

  1. First master: ./k3s-install.sh 10.0.1.1
  2. Second master: ./k3s-install.sh 10.0.1.2 10.0.1.1 <token>
  3. Third master: ./k3s-install.sh 10.0.1.3 10.0.1.1 <token>

Result:

NAME    STATUS   ROLES                       AGE   VERSION
k1-m1   Ready    control-plane,etcd,master   30m   v1.25.4+k3s1
k1-m2   Ready    control-plane,etcd,master   27m   v1.25.4+k3s1
k1-m3   Ready    control-plane,etcd,master   28m   v1.25.4+k3s1

Pinging a master node from the worker node works:

root@k1-w1:~# ping 10.0.1.1
PING 10.0.1.1 (10.0.1.1) 56(84) bytes of data.
64 bytes from 10.0.1.1: icmp_seq=1 ttl=63 time=1.47 ms
64 bytes from 10.0.1.1: icmp_seq=2 ttl=63 time=0.471 ms

Describe the bug:

Setting up the agent node fails because k3s agent complains about the undefined option `-disable`, which was never given on the command line.

Steps To Reproduce:

The agent node is set up with

./k3s-install.sh 10.0.2.1 10.0.1.1 <token> $(pwgen 32 -1)

which eventually invokes

curl -sfL https://get.k3s.io | \
            K3S_TOKEN="<token>" \
            K3S_AGENT_TOKEN="<output-of-pwgen>" \
            sh -s - agent \
            --node-ip="10.0.2.1" \
            --flannel-iface "enp7s0" \
            --server "https://10.0.1.1:6443"

Expected behavior:

I'd expect the worker node to join the cluster.

Actual behavior:

k3s agent exits with status 1, complaining about the undefined option `-disable`, which isn't provided.

Additional context / logs:

Output of `journalctl -xeu k3s-agent`:

Nov 28 01:34:12 k1-w1 sh[4055]: + /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service
Nov 28 01:34:12 k1-w1 sh[4056]: Failed to get unit file state for nm-cloud-setup.service: No such file or directory
Nov 28 01:34:12 k1-w1 k3s[4059]: Incorrect Usage: flag provided but not defined: -disable
Nov 28 01:34:12 k1-w1 k3s[4059]: NAME:
Nov 28 01:34:12 k1-w1 k3s[4059]:    k3s agent - Run node agent
Nov 28 01:34:12 k1-w1 k3s[4059]: USAGE:
Nov 28 01:34:12 k1-w1 k3s[4059]:    k3s agent [OPTIONS]
Nov 28 01:34:12 k1-w1 k3s[4059]: OPTIONS:
Nov 28 01:34:12 k1-w1 k3s[4059]:    --config FILE, -c FILE                     (config) Load configuration from FILE (default: "/etc/rancher/k3s/config.yaml") [$K3S_CONFIG_FILE]
Nov 28 01:34:12 k1-w1 k3s[4059]:    --debug                                    (logging) Turn on debug logs [$K3S_DEBUG]
Nov 28 01:34:12 k1-w1 k3s[4059]:    -v value                                   (logging) Number for the log level verbosity (default: 0)
Nov 28 01:34:12 k1-w1 k3s[4059]:    --vmodule value                            (logging) Comma-separated list of FILE_PATTERN=LOG_LEVEL settings for file-filtered logging
Nov 28 01:34:12 k1-w1 k3s[4059]:    --log value, -l value                      (logging) Log to file
Nov 28 01:34:12 k1-w1 k3s[4059]:    --alsologtostderr                          (logging) Log to standard error as well as file (if set)
Nov 28 01:34:12 k1-w1 k3s[4059]:    --token value, -t value                    (cluster) Token to use for authentication [$K3S_TOKEN]
Nov 28 01:34:12 k1-w1 k3s[4059]:    --token-file value                         (cluster) Token file to use for authentication [$K3S_TOKEN_FILE]
Nov 28 01:34:12 k1-w1 k3s[4059]:    --server value, -s value                   (cluster) Server to connect to [$K3S_URL]
Nov 28 01:34:12 k1-w1 k3s[4059]:    --data-dir value, -d value                 (agent/data) Folder to hold state (default: "/var/lib/rancher/k3s")
Nov 28 01:34:12 k1-w1 k3s[4059]:    --node-name value                          (agent/node) Node name [$K3S_NODE_NAME]
Nov 28 01:34:12 k1-w1 k3s[4059]:    --with-node-id                             (agent/node) Append id to node name
Nov 28 01:34:12 k1-w1 k3s[4059]:    --node-label value                         (agent/node) Registering and starting kubelet with set of labels
Nov 28 01:34:12 k1-w1 k3s[4059]:    --node-taint value                         (agent/node) Registering kubelet with set of taints
Nov 28 01:34:12 k1-w1 k3s[4059]:    --image-credential-provider-bin-dir value  (agent/node) The path to the directory where credential provider plugin binaries are located (default: "/var/lib/rancher/credentialprovider/bin")
Nov 28 01:34:12 k1-w1 k3s[4059]:    --image-credential-provider-config value   (agent/node) The path to the credential provider plugin config file (default: "/var/lib/rancher/credentialprovider/config.yaml")
Nov 28 01:34:12 k1-w1 k3s[4059]:    --selinux                                  (agent/node) Enable SELinux in containerd [$K3S_SELINUX]
Nov 28 01:34:12 k1-w1 k3s[4059]:    --lb-server-port value                     (agent/node) Local port for supervisor client load-balancer. If the supervisor and apiserver are not colocated an additional port 1 less than this port will also be used for the apiserver client load-balancer. (default: 6444) [$K3S_LB_SERVER_PORT]
Nov 28 01:34:12 k1-w1 k3s[4059]:    --protect-kernel-defaults                  (agent/node) Kernel tuning behavior. If set, error if kernel tunables are different than kubelet defaults.
Nov 28 01:34:12 k1-w1 k3s[4059]:    --container-runtime-endpoint value         (agent/runtime) Disable embedded containerd and use the CRI socket at the given path; when used with --docker this sets the docker socket path
Nov 28 01:34:12 k1-w1 k3s[4059]:    --pause-image value                        (agent/runtime) Customized pause image for containerd or docker sandbox (default: "rancher/mirrored-pause:3.6")
Nov 28 01:34:12 k1-w1 k3s[4059]:    --snapshotter value                        (agent/runtime) Override default containerd snapshotter (default: "overlayfs")
Nov 28 01:34:12 k1-w1 k3s[4059]:    --private-registry value                   (agent/runtime) Private registry configuration file (default: "/etc/rancher/k3s/registries.yaml")
Nov 28 01:34:12 k1-w1 k3s[4059]:    --node-ip value, -i value                  (agent/networking) IPv4/IPv6 addresses to advertise for node (default: "10.0.2.1")
Nov 28 01:34:12 k1-w1 k3s[4059]:    --node-external-ip value                   (agent/networking) IPv4/IPv6 external IP addresses to advertise for node
Nov 28 01:34:12 k1-w1 k3s[4059]:    --resolv-conf value                        (agent/networking) Kubelet resolv.conf file [$K3S_RESOLV_CONF]
Nov 28 01:34:12 k1-w1 k3s[4059]:    --flannel-iface value                      (agent/networking) Override default flannel interface
Nov 28 01:34:12 k1-w1 k3s[4059]:    --flannel-conf value                       (agent/networking) Override default flannel config file
Nov 28 01:34:12 k1-w1 k3s[4059]:    --flannel-cni-conf value                   (agent/networking) Override default flannel cni config file
Nov 28 01:34:12 k1-w1 k3s[4059]:    --kubelet-arg value                        (agent/flags) Customized flag for kubelet process
Nov 28 01:34:12 k1-w1 k3s[4059]:    --kube-proxy-arg value                     (agent/flags) Customized flag for kube-proxy process
Nov 28 01:34:12 k1-w1 k3s[4059]:    --rootless                                 (experimental) Run rootless
Nov 28 01:34:12 k1-w1 k3s[4059]:    --docker                                   (agent/runtime) (experimental) Use cri-dockerd instead of containerd
Nov 28 01:34:12 k1-w1 k3s[4059]:
Nov 28 01:34:12 k1-w1 k3s[4059]: time="2022-11-28T01:34:12Z" level=fatal msg="flag provided but not defined: -disable"
Nov 28 01:34:12 k1-w1 systemd[1]: k3s-agent.service: Main process exited, code=exited, status=1/FAILURE

The systemd unit looks like this:

[Unit]
Description=Lightweight Kubernetes
Documentation=https://k3s.io
Wants=network-online.target
After=network-online.target

[Install]
WantedBy=multi-user.target

[Service]
Type=notify
EnvironmentFile=-/etc/default/%N
EnvironmentFile=-/etc/sysconfig/%N
EnvironmentFile=-/etc/systemd/system/k3s-agent.service.env
KillMode=process
Delegate=yes
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=1048576
LimitNPROC=infinity
LimitCORE=infinity
TasksMax=infinity
TimeoutStartSec=0
Restart=always
RestartSec=5s
ExecStartPre=/bin/sh -xc '! /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service'
ExecStartPre=-/sbin/modprobe br_netfilter
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/local/bin/k3s \
    agent \
        '--node-ip=10.0.2.1' \
        '--flannel-iface' \
        'enp7s0' \
        '--server' \
        'https://10.0.1.1:6443' \

Its env file looks like this:

K3S_AGENT_TOKEN='<output of pwgen>'
K3S_TOKEN='<token>'

soletan commented 1 year ago

I realized that the `disable` option had in fact been given, though this wasn't obvious. The agent node was loading the configuration file of the master nodes, which in my case contained options that don't work on an agent (`flannel-backend`, I guess, but maybe also `advertise-address`). After removing those two from the file, since CLI arguments are used for that instead, the setup succeeds.
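The fix described in this comment (dropping the server-only options from the config file the agent reads) can be sketched as a small filter. This is a minimal sketch, not part of k3s: `strip_config_keys` is a hypothetical helper, it assumes a flat config.yaml where each top-level key starts at column 1 and list items under a key are indented, and the key names used below (`disable`, `flannel-backend`, `advertise-address`) are only the ones mentioned in this thread, not an exhaustive list of server-only options.

```shell
# Hypothetical helper (not part of k3s): print a k3s config.yaml with the given
# top-level keys removed, including indented lines (e.g. list items) under them.
# Assumes a flat layout: "key: value" or "key:" followed by indented "- item" lines.
strip_config_keys() {
  config_file=$1
  shift
  awk -v drop="$*" '
    # Build a set of keys to drop from the space-separated argument list.
    BEGIN { n = split(drop, keys, " "); for (i = 1; i <= n; i++) skip[keys[i]] = 1 }
    # A top-level key starts at column 1 as "name:"; decide whether to drop it.
    /^[A-Za-z][A-Za-z0-9-]*:/ {
      key = $0
      sub(/:.*/, "", key)
      dropping = (key in skip)
    }
    # Any other line (indented item, blank, comment) inherits the current state.
    !dropping { print }
  ' "$config_file"
}
```

For example, `strip_config_keys /etc/rancher/k3s/config.yaml disable flannel-backend advertise-address > /tmp/agent-config.yaml` writes the filtered copy to a separate file for review; redirecting straight back into the source file would truncate it before awk reads it.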

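It may be worth noting that the error message is misleading, though: it reports `-disable` as if it had been passed on the command line, while the option actually came from the configuration file, where it is valid on the masters.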
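Because the error only names the flag, one way to trace it back to the config file is to compare the file's top-level keys against the flags that `k3s agent --help` lists (the same option list shown in the log above). This is a minimal heuristic sketch, not part of k3s: `unsupported_agent_keys` is a hypothetical helper, and it assumes a flat config.yaml with one top-level `key:` per line and help text that prints each flag as `--name value` or `--name,`.

```shell
# Hypothetical helper (not part of k3s): list top-level keys of a k3s config
# file that do not appear as a "--key" flag in saved `k3s agent --help` output.
unsupported_agent_keys() {
  config_file=$1   # e.g. /etc/rancher/k3s/config.yaml
  help_file=$2     # file containing the output of `k3s agent --help`
  # Extract top-level keys ("name:" at column 1), deduplicated.
  sed -n 's/^\([A-Za-z][A-Za-z0-9-]*\):.*/\1/p' "$config_file" | sort -u |
  while read -r key; do
    # Accepted if the help text lists "--key" followed by a space, comma or line end.
    if ! grep -Eq -- "--${key}([ ,]|\$)" "$help_file"; then
      echo "$key"
    fi
  done
}
```

Usage might look like `k3s agent --help > /tmp/agent-help.txt` followed by `unsupported_agent_keys /etc/rancher/k3s/config.yaml /tmp/agent-help.txt`; any key it prints is a candidate for the "flag provided but not defined" error. Since it only matches text, a flag mentioned inside another flag's description could hide a real problem, so treat the result as a hint.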