k3s-io / k3s

Lightweight Kubernetes
https://k3s.io
Apache License 2.0

CGROUPS not found on Raspberry Pi #8890

Closed teq0 closed 10 months ago

teq0 commented 10 months ago

Environmental Info:

K3s Version: v1.27.3+k3s1

Node(s) CPU architecture, OS, and Version:

Linux k8s-pi-w08 6.1.0-rpi6-rpi-2712 #1 SMP PREEMPT Debian 1:6.1.58-1+rpt2 (2023-10-27) aarch64 GNU/Linux

Raspberry Pi 5, Raspberry Pi OS Lite 64-bit, latest version (Bookworm)

Cluster Configuration: Existing RPi cluster, 1 master 5 nodes

Describe the bug: Installing a new agent on RPi 5, it fails with

Failed to find memory cgroup, you may need to add "cgroup_memory=1 cgroup_enable=memory" to your linux cmdline (/boot/cmdline.txt on a Raspberry Pi)

However cmdline already has the right settings, and cgroups2 exists (see output below).

Steps To Reproduce:

curl -sfL http://get.k3s.io | K3S_URL=https://<masterip>:6443 INSTALL_K3S_VERSION=v1.27.3+k3s1 K3S_TOKEN=<join-token> sh -

Expected behavior:

K3s agent added to cluster. I'm using exactly the same install process as on the rest of the machines in the cluster; the only difference is that this is a Raspberry Pi 5 running the latest version of Raspberry Pi OS.

Actual behavior:

[INFO]  Using v1.27.3+k3s1 as release
[INFO]  Downloading hash https://github.com/k3s-io/k3s/releases/download/v1.27.3+k3s1/sha256sum-arm64.txt
[INFO]  Downloading binary https://github.com/k3s-io/k3s/releases/download/v1.27.3+k3s1/k3s-arm64
[INFO]  Verifying binary download
[INFO]  Installing k3s to /usr/local/bin/k3s
[INFO]  Skipping installation of SELinux RPM
[INFO]  Creating /usr/local/bin/kubectl symlink to k3s
[INFO]  Creating /usr/local/bin/crictl symlink to k3s
[INFO]  Creating /usr/local/bin/ctr symlink to k3s
[INFO]  Creating killall script /usr/local/bin/k3s-killall.sh
[INFO]  Creating uninstall script /usr/local/bin/k3s-agent-uninstall.sh
[INFO]  env: Creating environment file /etc/systemd/system/k3s-agent.service.env
[INFO]  systemd: Creating service file /etc/systemd/system/k3s-agent.service
[INFO]  Failed to find memory cgroup, you may need to add "cgroup_memory=1 cgroup_enable=memory" to your linux cmdline (/boot/cmdline.txt on a Raspberry Pi)
[INFO]  systemd: Enabling k3s-agent unit
Created symlink /etc/systemd/system/multi-user.target.wants/k3s-agent.service → /etc/systemd/system/k3s-agent.service.
[INFO]  Host iptables-save/iptables-restore tools not found
[INFO]  Host ip6tables-save/ip6tables-restore tools not found
[INFO]  systemd: Starting k3s-agent
Job for k3s-agent.service failed because the control process exited with error code.
See "systemctl status k3s-agent.service" and "journalctl -xeu k3s-agent.service" for details.

[teq0@k8s-pi-w08 ~ ]$ cat /boot/cmdline.txt
console=serial0,115200 console=tty1 root=PARTUUID=3b97987c-02 rootfstype=ext4 fsck.repair=yes rootwait cfg80211.ieee80211_regdom=AU cgroup_enable=cpuset cgroup_memory=1 cgroup_enable=memory%      

[teq0@k8s-pi-w08 ~ ]$ grep cgroup /proc/filesystems
nodev   cgroup
nodev   cgroup2

[teq0@k8s-pi-w08 ~ ]$ uname -a      
Linux k8s-pi-w08 6.1.0-rpi6-rpi-2712 #1 SMP PREEMPT Debian 1:6.1.58-1+rpt2 (2023-10-27) aarch64 GNU/Linux

And from journalctl

level=fatal msg="failed to find memory cgroup (v2)"

brandond commented 10 months ago

> console=serial0,115200 console=tty1 root=PARTUUID=3b97987c-02 rootfstype=ext4 fsck.repair=yes rootwait cfg80211.ieee80211_regdom=AU cgroup_enable=cpuset cgroup_memory=1 cgroup_enable=memory%

Do you really have cgroup_enable=memory% instead of cgroup_enable=memory?

Also it looks like your node is set up to use cgroup v2; if fixing that kernel arg doesn't change anything, you might need to check the Pi OS docs on how to enable cgroups under this configuration.

teq0 commented 10 months ago

The '%' is an artifact of using cat to print a file that doesn't end with a newline. If you edit the file there's no %.
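The missing trailing newline can be confirmed without relying on how the shell renders it. The following is a minimal sketch; the function name `ends_with_newline` is a hypothetical helper, not something from K3s or Raspberry Pi OS.

```shell
#!/bin/sh
# ends_with_newline FILE
# Succeeds (exit 0) when the last byte of FILE is a newline character.
# Hypothetical helper for illustration only.
ends_with_newline() {
    # tail -c 1 emits the final byte; wc -l counts newlines in it (0 or 1).
    # tr strips the padding some wc implementations add.
    [ "$(tail -c 1 "$1" | wc -l | tr -d ' ')" = "1" ]
}

# Example (path taken from the thread):
#   ends_with_newline /boot/cmdline.txt || echo "no trailing newline"
```

A file without a trailing newline is still a valid cmdline.txt, so this only explains the stray character in the output.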

I have 6 other machines set up exactly the same way. This has always been the way you enable cgroups on Raspberry Pis.

The error message says that cgroups aren't enabled, yet they clearly are.

brandond commented 10 months ago

> The error message says that cgroups aren't enabled, yet they clearly are.

If they are properly enabled and running under the same controller then it would find them. They're either not enabled, or still managed by the v1 controller on a v2 system - which is known as a hybrid setup and doesn't work well.
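As a rough way to tell these setups apart, the filesystem types listed in /proc/mounts can be inspected. This is a sketch under the assumption that any `cgroup` (v1) mount alongside a `cgroup2` mount indicates a hybrid layout; `cgroup_layout` is a hypothetical helper and is not how K3s itself performs the check.

```shell
#!/bin/sh
# cgroup_layout [MOUNTS_FILE]
# Prints "unified", "hybrid", "legacy", or "none" based on the filesystem
# type column (field 3) of a /proc/mounts style file.
# Hypothetical helper for illustration only.
cgroup_layout() {
    awk '
        $3 == "cgroup"  { v1 = 1 }   # at least one cgroup v1 hierarchy
        $3 == "cgroup2" { v2 = 1 }   # at least one cgroup v2 hierarchy
        END {
            if (v1 && v2) print "hybrid"
            else if (v2)  print "unified"
            else if (v1)  print "legacy"
            else          print "none"
        }
    ' "${1:-/proc/mounts}"
}

# Example:
#   cgroup_layout            # reads /proc/mounts on the current machine
```

A result of "unified" only shows that cgroup v2 is mounted; whether the memory controller is enabled is a separate question.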

What is the output of grep cgroup /proc/mounts and cat /proc/cgroups? Here's what I see on my Pi:

sysadm@pi02:~$ uname -a
Linux pi02.lan.khaus 6.5.0-1006-raspi #8-Ubuntu SMP PREEMPT_DYNAMIC Mon Oct 23 12:57:46 UTC 2023 aarch64 aarch64 aarch64 GNU/Linux

sysadm@pi02:~$ grep cgroup /proc/mounts
cgroup2 /sys/fs/cgroup cgroup2 rw,nosuid,nodev,noexec,relatime,nsdelegate,memory_recursiveprot 0 0

sysadm@pi02:~$ cat /proc/cgroups
#subsys_name    hierarchy   num_cgroups enabled
cpuset  0   223 1
cpu 0   223 1
cpuacct 0   223 1
blkio   0   223 1
memory  0   223 1
devices 0   223 1
freezer 0   223 1
net_cls 0   223 1
perf_event  0   223 1
net_prio    0   223 1
hugetlb 0   223 1
pids    0   223 1
rdma    0   223 1
misc    0   223 1

teq0 commented 10 months ago

[teq0@k8s-pi-w08 ~ ]$ uname -a
Linux k8s-pi-w08 6.1.0-rpi6-rpi-2712 #1 SMP PREEMPT Debian 1:6.1.58-1+rpt2 (2023-10-27) aarch64 GNU/Linux

[teq0@k8s-pi-w08 ~ ]$ grep cgroup /proc/mounts
cgroup2 /sys/fs/cgroup cgroup2 rw,nosuid,nodev,noexec,relatime,nsdelegate,memory_recursiveprot 0 0

[teq0@k8s-pi-w08 ~ ]$ cat /proc/cgroups
#subsys_name    hierarchy       num_cgroups     enabled
cpuset  0       41      1
cpu     0       41      1
cpuacct 0       41      1
blkio   0       41      1
memory  0       41      0
devices 0       41      1
freezer 0       41      1
net_cls 0       41      1
perf_event      0       41      1
net_prio        0       41      1
pids    0       41      1

Hmm, memory not enabled, apparently.
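The same conclusion can be reached mechanically by reading the `enabled` column (field 4) of /proc/cgroups, as in the output above. A minimal sketch follows; `controller_enabled` is a hypothetical helper name.

```shell
#!/bin/sh
# controller_enabled NAME [CGROUPS_FILE]
# Succeeds (exit 0) when controller NAME is listed with enabled = 1 in a
# /proc/cgroups style file; fails when it is disabled or not listed.
# Hypothetical helper for illustration only.
controller_enabled() {
    awk -v name="$1" '
        $1 == name { found = 1; enabled = ($4 == 1) }
        END { exit !(found && enabled) }
    ' "${2:-/proc/cgroups}"
}

# Example:
#   controller_enabled memory || echo "memory cgroup controller is not enabled"
```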

brandond commented 10 months ago

Indeed. Is this perhaps no longer enabled in the default Pi OS kernels? That would be unfortunate, but also out of our control, as Kubernetes requires it.

teq0 commented 10 months ago

Thanks for your help. Closing this as it appears to be a Raspberry Pi OS issue. I'll add a comment later if I find a solution.

teq0 commented 10 months ago

Solved it: the cmdline.txt that is actually used is now under /boot/firmware. Adding the extra settings to that one fixed it.

It turns out that the contents of /boot are copied to /boot/firmware on first boot. My setup scripts mostly run on the Pi after it's booted. I just moved the sed script that checks cmdline.txt so that it runs after the SD card is burned but before it's inserted in the Pi.
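The original sed script is not shown in the thread, so the following is only a sketch of what such a step might look like, using plain shell instead of sed. It assumes cmdline.txt is a single line and appends each argument only if it is not already present; `ensure_cmdline_args` is a hypothetical helper, and the mount point in the example comment is an assumption about where the freshly written card's boot partition appears.

```shell
#!/bin/sh
# ensure_cmdline_args FILE ARG...
# Appends each ARG to the single line in FILE unless it is already present
# as a whole word, then rewrites FILE as one newline-terminated line.
# Hypothetical helper for illustration only; only the first line is kept.
ensure_cmdline_args() {
    file=$1
    shift
    line=$(head -n 1 "$file")
    for arg in "$@"; do
        case " $line " in
            *" $arg "*) ;;                 # already present, leave untouched
            *) line="$line $arg" ;;        # append to the end of the line
        esac
    done
    printf '%s\n' "$line" > "$file"
}

# Example on a workstation right after writing the card (mount point is an
# assumption and will differ between systems):
#   ensure_cmdline_args /media/user/bootfs/cmdline.txt cgroup_memory=1 cgroup_enable=memory
# Example on an already running Pi, followed by a reboot:
#   ensure_cmdline_args /boot/firmware/cmdline.txt cgroup_memory=1 cgroup_enable=memory
```

Running it twice leaves the file unchanged, which makes it safe to keep in a setup script that may be re-run.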

gnoejuan commented 10 months ago

Thank you. Really appreciate the follow up.

tbernacchi commented 2 months ago

Hi @teq0 - Maybe I'm facing the same issue here:

root@raspberrypi4-1:/boot# uname -a
Linux raspberrypi4-1 6.5.0-1013-raspi #16-Ubuntu SMP PREEMPT_DYNAMIC Thu Mar 14 13:46:12 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux
root@raspberrypi4-1:/boot#
root@raspberrypi4-1:/boot# grep cgroup /proc/mounts
cgroup2 /sys/fs/cgroup cgroup2 rw,nosuid,nodev,noexec,relatime 0 0
none /run/calico/cgroup cgroup2 rw,relatime 0 0
none /run/cilium/cgroupv2 cgroup2 rw,relatime 0 0
root@raspberrypi4-1:/boot#
root@raspberrypi4-1:/boot# cat /proc/cgroups
#subsys_name    hierarchy   num_cgroups enabled
cpuset  0   190 1
cpu 0   190 1
cpuacct 0   190 1
blkio   0   190 1
memory  0   190 1
devices 0   190 1
freezer 0   190 1
net_cls 0   190 1
perf_event  0   190 1
net_prio    0   190 1
hugetlb 0   190 1
pids    0   190 1
rdma    0   190 1
misc    0   190 1
root@raspberrypi4-1:/# cat /boot/firmware/cmdline.txt
console=serial0,115200 multipath=off dwc_otg.lpm_enable=0 console=tty1 root=LABEL=writable rootfstype=ext4 rootwait fixrtc cfg80211.ieee80211_regdom=GB

Did you add cgroup_memory=1 cgroup_enable=memory to cmdline.txt and then reboot?

PS: Just for context I ended up here following/researching this https://github.com/cilium/cilium/issues/20735

teq0 commented 2 months ago

See my comment above. It wasn't anything to do with K3s; it was that cmdline.txt is now in /boot/firmware. I was editing the wrong file.
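One way to catch this kind of mistake is to compare the expected arguments against /proc/cmdline, which holds the command line the running kernel was actually booted with. The sketch below prints any expected argument that is absent; `missing_kernel_args` is a hypothetical helper name.

```shell
#!/bin/sh
# missing_kernel_args CMDLINE_FILE ARG...
# Prints, one per line, every ARG that does not appear as a whole word in
# CMDLINE_FILE (normally /proc/cmdline). Prints nothing if all are present.
# Hypothetical helper for illustration only.
missing_kernel_args() {
    running=$1
    shift
    line=$(cat "$running")
    for arg in "$@"; do
        case " $line " in
            *" $arg "*) ;;                 # the kernel received this argument
            *) printf '%s\n' "$arg" ;;     # edited file was not used, or no reboot yet
        esac
    done
}

# Example:
#   missing_kernel_args /proc/cmdline cgroup_memory=1 cgroup_enable=memory
```

If this prints anything after a reboot, the file that was edited is probably not the one the bootloader reads.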

tbernacchi commented 2 months ago

Yes, I saw, but I don't think I fully understand it, especially the part with the SD card.