k3s-io / k3s

Lightweight Kubernetes
https://k3s.io
Apache License 2.0

Nodes Can't Be Named w/ Underscores #10794

Closed JackEstes06 closed 1 month ago

JackEstes06 commented 1 month ago

Environmental Info: K3s Version: 1.30.4+k3s1

I didn't run the version command before I fixed the issue :(

Node(s) CPU architecture, OS, and Version:

- Linux rpimaster 6.6.31+rpt-rpi-v8 #1 SMP PREEMPT Debian 1:6.6.31-1+rpt1 (2024-05-29) aarch64 GNU/Linux
- Linux rpinode1 6.6.31+rpt-rpi-v8 #1 SMP PREEMPT Debian 1:6.6.31-1+rpt1 (2024-05-29) aarch64 GNU/Linux
- Linux rpinode2 6.6.31+rpt-rpi-v8 #1 SMP PREEMPT Debian 1:6.6.31-1+rpt1 (2024-05-29) aarch64 GNU/Linux

Cluster Configuration:

1 server, 2 agents

Describe the bug:

When connecting a node/agent to the server/master, giving it a name containing an underscore (_) causes node registration to fail and the k3s-agent service to error out.

Steps To Reproduce:

- Install the k3s agent with K3S_NODE_NAME set to a name containing an underscore (e.g. rpi_node1).
- Start the k3s-agent service.

Expected behavior:

Normal install and k3s-agent launch

Actual behavior:

Starting the agent failed with:

Job for k3s-agent.service failed because the control process exited with error code. See "systemctl status k3s-agent.service" and "journalctl -xeu k3s-agent.service" for details.

The output of those commands is in the additional context below, but the crux of the issue was that an underscore in the node name prevented the node from registering with the server.

Additional context / logs:

ISSUE FIX: Change K3S_NODE_NAME to a name with no underscores (e.g. RPi_Node2 -> RPiNode2).
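For anyone landing here with the same error, a minimal sketch of the corrected agent install is below. It assumes the standard get.k3s.io installer; the server URL is taken from the logs above, and the token value is a placeholder. A lowercase name is the safe choice, since Kubernetes validates node names as lowercase RFC 1123 subdomains.

```bash
# Sketch: re-run the agent install with an underscore-free, lowercase node name.
# K3S_URL matches the server address seen in the logs above; <server-token> is a
# placeholder for the value in /var/lib/rancher/k3s/server/node-token on the server.
curl -sfL https://get.k3s.io | \
  K3S_URL=https://192.168.1.217:6443 \
  K3S_TOKEN=<server-token> \
  K3S_NODE_NAME=rpinode2 \
  sh -
```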

Logs below for clarification as to how I found the issue (it was inside journalctl):

root@rpinode1:~# sudo systemctl status k3s-agent
● k3s-agent.service - Lightweight Kubernetes
     Loaded: loaded (/etc/systemd/system/k3s-agent.service; enabled; preset: enabled)
     Active: activating (start) since Mon 2024-09-02 08:27:12 BST; 3s ago
       Docs: https://k3s.io
    Process: 11861 ExecStartPre=/bin/sh -xc ! /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service 2>/dev/null (code=exited, status=0/SUCCESS)
    Process: 11863 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/SUCCESS)
    Process: 11864 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
   Main PID: 11865 (k3s-agent)
      Tasks: 18
     Memory: 62.3M
        CPU: 1.064s
     CGroup: /system.slice/k3s-agent.service
             ├─11865 "/usr/local/bin/k3s agent"
             └─11879 "containerd "

Sep 02 08:27:13 rpinode1 k3s[11865]: time="2024-09-02T08:27:13+01:00" level=info msg="Starting k3s agent v1.30.4+k3s1 (98262b5d)"
Sep 02 08:27:13 rpinode1 k3s[11865]: time="2024-09-02T08:27:13+01:00" level=info msg="Adding server to load balancer k3s-agent-load-balancer: 192.168.1.217:6443"
Sep 02 08:27:13 rpinode1 k3s[11865]: time="2024-09-02T08:27:13+01:00" level=info msg="Running load balancer k3s-agent-load-balancer 127.0.0.1:6444 -> [192.168.1.217:6443] [default: 192.168.1.217:6443]"
Sep 02 08:27:15 rpinode1 k3s[11865]: time="2024-09-02T08:27:15+01:00" level=info msg="Module overlay was already loaded"
Sep 02 08:27:15 rpinode1 k3s[11865]: time="2024-09-02T08:27:15+01:00" level=info msg="Module nf_conntrack was already loaded"
Sep 02 08:27:15 rpinode1 k3s[11865]: time="2024-09-02T08:27:15+01:00" level=info msg="Module br_netfilter was already loaded"
Sep 02 08:27:15 rpinode1 k3s[11865]: time="2024-09-02T08:27:15+01:00" level=info msg="Module iptable_nat was already loaded"
Sep 02 08:27:15 rpinode1 k3s[11865]: time="2024-09-02T08:27:15+01:00" level=info msg="Module iptable_filter was already loaded"
Sep 02 08:27:15 rpinode1 k3s[11865]: time="2024-09-02T08:27:15+01:00" level=info msg="Logging containerd to /var/lib/rancher/k3s/agent/containerd/containerd.log"
Sep 02 08:27:15 rpinode1 k3s[11865]: time="2024-09-02T08:27:15+01:00" level=info msg="Running containerd -c /var/lib/rancher/k3s/agent/etc/containerd/config.toml -a /run/k3s/containerd/containerd.sock --state>

root@rpinode1:~# journalctl -xeu k3s-agent.service
Sep 02 08:27:53 rpinode1 k3s[11918]: I0902 08:27:53.622479 11918 kubelet.go:2356] "Starting kubelet main sync loop"
Sep 02 08:27:53 rpinode1 k3s[11918]: E0902 08:27:53.622644 11918 kubelet.go:2380] "Skipping pod synchronization" err="[container runtime status check may not have completed yet, PLEG is not healthy: pleg ha>
Sep 02 08:27:53 rpinode1 k3s[11918]: I0902 08:27:53.632457 11918 cpu_manager.go:214] "Starting CPU manager" policy="none"
Sep 02 08:27:53 rpinode1 k3s[11918]: I0902 08:27:53.632500 11918 cpu_manager.go:215] "Reconciling" reconcilePeriod="10s"
Sep 02 08:27:53 rpinode1 k3s[11918]: I0902 08:27:53.632627 11918 state_mem.go:36] "Initialized new in-memory state store"
Sep 02 08:27:53 rpinode1 k3s[11918]: I0902 08:27:53.633186 11918 state_mem.go:88] "Updated default CPUSet" cpuSet=""
Sep 02 08:27:53 rpinode1 k3s[11918]: I0902 08:27:53.633242 11918 state_mem.go:96] "Updated CPUSet assignments" assignments={}
Sep 02 08:27:53 rpinode1 k3s[11918]: I0902 08:27:53.633316 11918 policy_none.go:49] "None policy: Start"
Sep 02 08:27:53 rpinode1 k3s[11918]: I0902 08:27:53.635366 11918 memory_manager.go:170] "Starting memorymanager" policy="None"
Sep 02 08:27:53 rpinode1 k3s[11918]: I0902 08:27:53.635455 11918 state_mem.go:35] "Initializing new in-memory state store"
Sep 02 08:27:53 rpinode1 k3s[11918]: I0902 08:27:53.636127 11918 state_mem.go:75] "Updated machine memory state"
Sep 02 08:27:53 rpinode1 k3s[11918]: I0902 08:27:53.648562 11918 manager.go:479] "Failed to read data from checkpoint" checkpoint="kubelet_internal_checkpoint" err="checkpoint is not found"
Sep 02 08:27:53 rpinode1 k3s[11918]: I0902 08:27:53.649050 11918 container_log_manager.go:186] "Initializing container log rotate workers" workers=1 monitorPeriod="10s"
Sep 02 08:27:53 rpinode1 k3s[11918]: I0902 08:27:53.649426 11918 plugin_manager.go:118] "Starting Kubelet Plugin Manager"
Sep 02 08:27:53 rpinode1 k3s[11918]: E0902 08:27:53.652937 11918 eviction_manager.go:282] "Eviction manager: failed to get summary stats" err="failed to get node info: node \"rpi_node1\" not found"
Sep 02 08:27:53 rpinode1 k3s[11918]: I0902 08:27:53.692244 11918 kubelet_node_status.go:73] "Attempting to register node" node="rpi_node1"
Sep 02 08:27:53 rpinode1 k3s[11918]: E0902 08:27:53.704258 11918 kubelet_node_status.go:96] "Unable to register node with API server" err="Node \"rpi_node1\" is invalid: metadata.name: Invalid value: \"rpi>
Sep 02 08:27:53 rpinode1 k3s[11918]: I0902 08:27:53.723117 11918 pod_container_deletor.go:80] "Container not found in pod's containers" containerID="18792492fc1e036174a63fc8638b3079562a8865b3cb4adc924f3e481>
Sep 02 08:27:53 rpinode1 k3s[11918]: I0902 08:27:53.906053 11918 kubelet_node_status.go:73] "Attempting to register node" node="rpi_node1"
Sep 02 08:27:53 rpinode1 k3s[11918]: E0902 08:27:53.912892 11918 kubelet_node_status.go:96] "Unable to register node with API server" err="Node \"rpi_node1\" is invalid: metadata.name: Invalid value: \"rpi>
Sep 02 08:27:54 rpinode1 k3s[11918]: E0902 08:27:54.125558 11918 csi_plugin.go:312] Failed to initialize CSINode: error updating CSINode annotation: timed out waiting for the condition; caused by: nodes "rp>
Sep 02 08:27:54 rpinode1 k3s[11918]: time="2024-09-02T08:27:54+01:00" level=info msg="Running kube-proxy --cluster-cidr=10.42.0.0/16 --conntrack-max-per-core=0 --conntrack-tcp-timeout-close-wait=0s --conntrac>
Sep 02 08:27:54 rpinode1 k3s[11918]: E0902 08:27:54.159060 11918 server.go:1051] "Failed to retrieve node info" err="nodes \"rpi_node1\" not found"
Sep 02 08:27:54 rpinode1 k3s[11918]: I0902 08:27:54.314539 11918 kubelet_node_status.go:73] "Attempting to register node" node="rpi_node1"
Sep 02 08:27:54 rpinode1 k3s[11918]: E0902 08:27:54.320967 11918 kubelet_node_status.go:96] "Unable to register node with API server" err="Node \"rpi_node1\" is invalid: metadata.name: Invalid value: \"rpi>
Sep 02 08:27:54 rpinode1 k3s[11918]: E0902 08:27:54.529592 11918 csi_plugin.go:312] Failed to initialize CSINode: error updating CSINode annotation: timed out waiting for the condition; caused by: nodes "rp>
Sep 02 08:27:54 rpinode1 k3s[11918]: I0902 08:27:54.579067 11918 apiserver.go:52] "Watching apiserver"
Sep 02 08:27:54 rpinode1 k3s[11918]: I0902 08:27:54.592504 11918 desired_state_of_world_populator.go:157] "Finished populating initial desired state of world"
Sep 02 08:27:54 rpinode1 k3s[11918]: E0902 08:27:54.995199 11918 csi_plugin.go:312] Failed to initialize CSINode: error updating CSINode annotation: timed out waiting for the condition; caused by: nodes "rp>
Sep 02 08:27:55 rpinode1 k3s[11918]: I0902 08:27:55.124676 11918 kubelet_node_status.go:73] "Attempting to register node" node="rpi_node1"
Sep 02 08:27:55 rpinode1 k3s[11918]: E0902 08:27:55.131585 11918 kubelet_node_status.go:96] "Unable to register node with API server" err="Node \"rpi_node1\" is invalid: metadata.name: Invalid value: \"rpi>
Sep 02 08:27:55 rpinode1 k3s[11918]: E0902 08:27:55.212953 11918 server.go:1051] "Failed to retrieve node info" err="nodes \"rpi_node1\" not found"
Sep 02 08:27:55 rpinode1 k3s[11918]: E0902 08:27:55.917238 11918 csi_plugin.go:312] Failed to initialize CSINode: error updating CSINode annotation: timed out waiting for the condition; caused by: nodes "rp>
Sep 02 08:27:56 rpinode1 k3s[11918]: I0902 08:27:56.735320 11918 kubelet_node_status.go:73] "Attempting to register node" node="rpi_node1"
Sep 02 08:27:56 rpinode1 k3s[11918]: E0902 08:27:56.742300 11918 kubelet_node_status.go:96] "Unable to register node with API server" err="Node \"rpi_node1\" is invalid: metadata.name: Invalid value: \"rpi>
Sep 02 08:27:57 rpinode1 k3s[11918]: E0902 08:27:57.356881 11918 server.go:1051] "Failed to retrieve node info" err="nodes \"rpi_node1\" not found"
Sep 02 08:27:59 rpinode1 k3s[11918]: E0902 08:27:59.872930 11918 csi_plugin.go:312] Failed to initialize CSINode: error updating CSINode annotation: timed out waiting for the condition; caused by: nodes "rp>
Sep 02 08:27:59 rpinode1 k3s[11918]: I0902 08:27:59.946077 11918 kubelet_node_status.go:73] "Attempting to register node" node="rpi_node1"
Sep 02 08:27:59 rpinode1 k3s[11918]: E0902 08:27:59.957608 11918 kubelet_node_status.go:96] "Unable to register node with API server" err="Node \"rpi_node1\" is invalid: metadata.name: Invalid value: \"rpi>
Sep 02 08:28:01 rpinode1 k3s[11918]: E0902 08:28:01.539470 11918 server.go:1051] "Failed to retrieve node info" err="nodes \"rpi_node1\" not found"
Sep 02 08:28:03 rpinode1 k3s[11918]: E0902 08:28:03.444792 11918 resource_metrics.go:161] "Error getting summary for resourceMetric prometheus endpoint" err="failed to get node info: node \"rpi_node1\" not >
Sep 02 08:28:03 rpinode1 k3s[11918]: E0902 08:28:03.653735 11918 eviction_manager.go:282] "Eviction manager: failed to get summary stats" err="failed to get node info: node \"rpi_node1\" not found"
Sep 02 08:28:04 rpinode1 k3s[11918]: E0902 08:28:04.023110 11918 nodelease.go:49] "Failed to get node when trying to set owner ref to the node lease" err="nodes \"rpi_node1\" not found" node="rpi_node1"
Sep 02 08:28:06 rpinode1 k3s[11918]: I0902 08:28:06.361203 11918 kubelet_node_status.go:73] "Attempting to register node" node="rpi_node1"
Sep 02 08:28:06 rpinode1 k3s[11918]: E0902 08:28:06.368419 11918 kubelet_node_status.go:96] "Unable to register node with API server" err="Node \"rpi_node1\" is invalid: metadata.name: Invalid value: \"rpi>
Sep 02 08:28:10 rpinode1 k3s[11918]: E0902 08:28:10.684798 11918 server.go:1051] "Failed to retrieve node info" err="nodes \"rpi_node1\" not found"
Sep 02 08:28:13 rpinode1 k3s[11918]: I0902 08:28:13.371900 11918 kubelet_node_status.go:73] "Attempting to register node" node="rpi_node1"
Sep 02 08:28:13 rpinode1 k3s[11918]: E0902 08:28:13.380265 11918 kubelet_node_status.go:96] "Unable to register node with API server" err="Node \"rpi_node1\" is invalid: metadata.name: Invalid value: \"rpi>
Sep 02 08:28:13 rpinode1 k3s[11918]: E0902 08:28:13.653950 11918 eviction_manager.go:282] "Eviction manager: failed to get summary stats" err="failed to get node info: node \"rpi_node1\" not found"
Sep 02 08:28:14 rpinode1 k3s[11918]: E0902 08:28:14.342502 11918 nodelease.go:49] "Failed to get node when trying to set owner ref to the node lease" err="nodes \"rpi_node1\" not found" node="rpi_node1"

brandond commented 1 month ago

You're correct, they cannot. Don't do that.

https://kubernetes.io/docs/concepts/overview/working-with-objects/names/
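For context, the linked page boils down to this: most object names, including node names, must be valid lowercase RFC 1123 subdomains (lowercase alphanumerics, '-' and '.', starting and ending with an alphanumeric), which rules out underscores. The snippet below is an illustrative pre-check of the same rule, not part of k3s; the node names are just examples from this thread.

```bash
#!/usr/bin/env bash
# Illustrative check of a candidate node name against the RFC 1123 subdomain
# pattern the Kubernetes API server enforces for metadata.name.
# (The 253-character length limit is not checked here.)
name="${1:-rpi_node1}"
re='^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$'
if [[ "$name" =~ $re ]]; then
  echo "OK: '$name' is a valid node name"
else
  echo "Invalid: '$name' is not a lowercase RFC 1123 subdomain" >&2
  exit 1
fi
```

Run with rpi_node1 it reports the name as invalid, matching the API server error in the logs above; rpinode2 passes.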