k3s-io / k3s

Lightweight Kubernetes
https://k3s.io
Apache License 2.0

k3s on OpenWrt can't find CPU cgroup - but it is enabled. #9947

Closed: IngwiePhoenix closed this issue 3 months ago

IngwiePhoenix commented 5 months ago

Environmental Info: K3s Version: k3s/1.29.3+k3s1, go/1.21.0

Node(s) CPU architecture, OS, and Version: aarch64, OpenWrt, 23.05.2

Cluster Configuration: Configuration is below; right now, single-node.

Describe the bug: I was trying to get k3s running on my NanoPi R6s; the setup is described under Steps To Reproduce below.

It is kind of a dupe of #8971 - however, I did verify that the CPU cgroup controller is enabled:

# cat /proc/cgroups
#subsys_name    hierarchy       num_cgroups     enabled
cpuset  1       2       1
cpu     2       2       1
cpuacct 3       2       1
blkio   4       2       1
memory  5       2       1
devices 6       2       1
freezer 7       2       1
net_cls 8       2       1
perf_event      9       2       1
net_prio        10      2       1
hugetlb 11      2       1
pids    12      2       1
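
For anyone who wants to script that check, here is a small Go sketch (purely illustrative, not the code path k3s itself uses) that parses /proc/cgroups and reports whether the cpu, cpuset and memory controllers are marked as enabled:

package main

import (
        "bufio"
        "fmt"
        "os"
        "strings"
)

func main() {
        // The controllers k3s cares about, mapped to "seen as enabled".
        wanted := map[string]bool{"cpu": false, "cpuset": false, "memory": false}

        f, err := os.Open("/proc/cgroups")
        if err != nil {
                fmt.Fprintln(os.Stderr, err)
                os.Exit(1)
        }
        defer f.Close()

        sc := bufio.NewScanner(f)
        for sc.Scan() {
                // Columns: subsys_name, hierarchy, num_cgroups, enabled
                fields := strings.Fields(sc.Text())
                if len(fields) == 4 && fields[3] == "1" {
                        if _, ok := wanted[fields[0]]; ok {
                                wanted[fields[0]] = true
                        }
                }
        }
        for name, enabled := range wanted {
                fmt.Printf("%s enabled: %v\n", name, enabled)
        }
}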

Steps To Reproduce: See above; but also, here are the contents of my config directory:

# find . -type f | while read f; echo "### $f"; cat $f; end
### ./config.yaml.d/node-labels.yaml
node-label:
  - node-location=home
### ./config.yaml.d/storage.yaml
default-local-storage-path: "/srv/k3s"
### ./config.yaml.d/node-mame.yaml
node-name: "routerboi"
### ./config.yaml.d/node-ip.yaml
# node-ip:
node-external-ip: 100.64.0.2
### ./config.yaml.d/data-dir.yaml
data-dir: "/usb/k3s"
### ./config.yaml
log: "/var/log/k3s.log"
token: <snip>
write-kubeconfig-mode: 600
cluster-init: true
cluster-domain: "kube.birb.it"
flannel-external-ip: true
etcd-snapshot-compress: true
secrets-encryption: true
# TODO: egress-selector-mode: cluster?
# TODO: server: <tailscale ip>
# TODO: vpn-auth: "name=tailscale,joinKey=<headscale auth key>,controlServerURL=https://vpnurl"
### ./k3s.yaml
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: <snip>
    server: https://127.0.0.1:6443
  name: default
contexts:
- context:
    cluster: default
    user: default
  name: default
current-context: default
kind: Config
preferences: {}
users:
- name: default
  user:
    client-certificate-data: <snip>
    client-key-data: <snip>

Expected behavior: I expected k3s to stay running and not crash.

Actual behavior:

(...)
time="2024-04-14T11:24:19Z" level=info msg="Running kube-controller-manager --allocate-node-cidrs=true --authentication-kubeconfig=/usb/k3s/server/cred/controller.kubeconfig --authorization-kubeconfig=/usb/k3s/server/cred/controller.kubeconfig --bind-address=127.0.0.1 --cluster-cidr=10.42.0.0/16 --cluster-signing-kube-apiserver-client-cert-file=/usb/k3s/server/tls/client-ca.nochain.crt --cluster-signing-kube-apiserver-client-key-file=/usb/k3s/server/tls/client-ca.key --cluster-signing-kubelet-client-cert-file=/usb/k3s/server/tls/client-ca.nochain.crt --cluster-signing-kubelet-client-key-file=/usb/k3s/server/tls/client-ca.key --cluster-signing-kubelet-serving-cert-file=/usb/k3s/server/tls/server-ca.nochain.crt --cluster-signing-kubelet-serving-key-file=/usb/k3s/server/tls/server-ca.key --cluster-signing-legacy-unknown-cert-file=/usb/k3s/server/tls/server-ca.nochain.crt --cluster-signing-legacy-unknown-key-file=/usb/k3s/server/tls/server-ca.key --configure-cloud-routes=false --controllers=*,tokencleaner,-service,-route,-cloud-node-lifecycle --kubeconfig=/usb/k3s/server/cred/controller.kubeconfig --profiling=false --root-ca-file=/usb/k3s/server/tls/server-ca.crt --secure-port=10257 --service-account-private-key-file=/usb/k3s/server/tls/service.current.key --service-cluster-ip-range=10.43.0.0/16 --use-service-account-credentials=true"
I0414 11:24:19.603254   19366 options.go:222] external host was not specified, using 100.64.0.2
time="2024-04-14T11:24:19Z" level=info msg="Running cloud-controller-manager --allocate-node-cidrs=true --authentication-kubeconfig=/usb/k3s/server/cred/cloud-controller.kubeconfig --authorization-kubeconfig=/usb/k3s/server/cred/cloud-controller.kubeconfig --bind-address=127.0.0.1 --cloud-config=/usb/k3s/server/etc/cloud-config.yaml --cloud-provider=k3s --cluster-cidr=10.42.0.0/16 --configure-cloud-routes=false --controllers=*,-route --feature-gates=CloudDualStackNodeIPs=true --kubeconfig=/usb/k3s/server/cred/cloud-controller.kubeconfig --leader-elect-resource-name=k3s-cloud-controller-manager --node-status-update-frequency=1m0s --profiling=false"
time="2024-04-14T11:24:19Z" level=info msg="Server node token is available at /usb/k3s/server/token"
time="2024-04-14T11:24:19Z" level=info msg="To join server node to cluster: k3s server -s https://...:6443 -t ${SERVER_NODE_TOKEN}"
time="2024-04-14T11:24:19Z" level=info msg="Agent node token is available at /usb/k3s/server/agent-token"
time="2024-04-14T11:24:19Z" level=info msg="To join agent node to cluster: k3s agent -s https://...:6443 -t ${AGENT_NODE_TOKEN}"
I0414 11:24:19.606557   19366 server.go:156] Version: v1.29.3+k3s1
I0414 11:24:19.606662   19366 server.go:158] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
time="2024-04-14T11:24:19Z" level=info msg="Wrote kubeconfig /etc/rancher/k3s/k3s.yaml"
time="2024-04-14T11:24:19Z" level=info msg="Run: k3s kubectl"
time="2024-04-14T11:24:19Z" level=fatal msg="failed to find cpu cgroup (v2)"

Additional context / logs: This OpenWrt node, a FriendlyElec NanoPi R6s, is supposed to be one of two control-plane nodes; the other one is a remote VPS. That is why it has the odd node-location label in the config; I plan to use it for a nodeSelector later. The image is technically vendor-supplied; it's shipped by FriendlyElec. containerd and the like are installed and working just fine, so I am a little surprised that k3s isn't.

Hope you can help me here, thank you!

IngwiePhoenix commented 5 months ago

Might be relevant; here's the kernel config regarding cgroups:

# zcat /proc/config.gz | grep -i cgroup
CONFIG_CGROUPS=y
# CONFIG_CGROUP_FAVOR_DYNMODS is not set
CONFIG_BLK_CGROUP=y
CONFIG_CGROUP_WRITEBACK=y
CONFIG_CGROUP_SCHED=y
CONFIG_CGROUP_PIDS=y
# CONFIG_CGROUP_RDMA is not set
CONFIG_CGROUP_FREEZER=y
CONFIG_CGROUP_HUGETLB=y
CONFIG_CGROUP_DEVICE=y
CONFIG_CGROUP_CPUACCT=y
CONFIG_CGROUP_PERF=y
CONFIG_CGROUP_BPF=y
# CONFIG_CGROUP_MISC is not set
# CONFIG_CGROUP_DEBUG is not set
CONFIG_SOCK_CGROUP_DATA=y
CONFIG_BLK_CGROUP_RWSTAT=y
# CONFIG_BLK_CGROUP_IOLATENCY is not set
# CONFIG_BLK_CGROUP_IOCOST is not set
# CONFIG_BLK_CGROUP_IOPRIO is not set
# CONFIG_BFQ_CGROUP_DEBUG is not set
CONFIG_NETFILTER_XT_MATCH_CGROUP=m
CONFIG_NET_CLS_CGROUP=m
CONFIG_CGROUP_NET_PRIO=y
CONFIG_CGROUP_NET_CLASSID=y
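
One caveat with that grep: -i cgroup does not match CONFIG_CPUSETS or CONFIG_MEMCG, the options behind the cpuset and memory controllers (the cpu controller itself comes from CONFIG_CGROUP_SCHED, which is shown above). A rough Go sketch that pulls those three out of /proc/config.gz explicitly, assuming the file is present as it is here:

package main

import (
        "bufio"
        "compress/gzip"
        "fmt"
        "os"
        "strings"
)

func main() {
        // Kernel options behind the three controllers k3s validates:
        // cpu -> CONFIG_CGROUP_SCHED, cpuset -> CONFIG_CPUSETS, memory -> CONFIG_MEMCG.
        wanted := []string{"CONFIG_CGROUP_SCHED", "CONFIG_CPUSETS", "CONFIG_MEMCG"}

        f, err := os.Open("/proc/config.gz")
        if err != nil {
                fmt.Fprintln(os.Stderr, err)
                os.Exit(1)
        }
        defer f.Close()

        gz, err := gzip.NewReader(f)
        if err != nil {
                fmt.Fprintln(os.Stderr, err)
                os.Exit(1)
        }
        defer gz.Close()

        sc := bufio.NewScanner(gz)
        for sc.Scan() {
                line := sc.Text()
                for _, opt := range wanted {
                        if strings.HasPrefix(line, opt+"=") || strings.Contains(line, opt+" is not set") {
                                fmt.Println(line)
                        }
                }
        }
}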

IngwiePhoenix commented 5 months ago

Launching with --debug did not reveal anything, so I went and did the classic grep -r to debug this.

So, I came across this: https://github.com/k3s-io/k3s/blob/81cd630f87ba3c0c720862af4cd02850303083a5/pkg/cgroups/cgroups_linux.go#L129-L137

I verified from a Fish shell, as root, that the conditions this snippet checks are satisfied:

root@FriendlyWrt /s/f/c/cpu# cat /proc/self/cgroup
13:name=systemd:/
12:pids:/
11:hugetlb:/
10:net_prio:/
9:perf_event:/
8:net_cls:/
7:freezer:/
6:devices:/
5:memory:/
4:blkio:/
3:cpuacct:/
2:cpu:/
1:cpuset:/
0::/services/dropbear/instance1
root@FriendlyWrt /s/f/c/cpu# cat cpu.cfs_period_us
100000
root@FriendlyWrt /s/f/c/cpu# realpath cpu.cfs_period_us
/sys/fs/cgroup/cpu/cpu.cfs_period_us

Thus, this should be passing. And, just to make sure, I verified what /proc/1/cgroup returns:

# cat /proc/1/cgroup
13:name=systemd:/
12:pids:/
11:hugetlb:/
10:net_prio:/
9:perf_event:/
8:net_cls:/
7:freezer:/
6:devices:/
5:memory:/
4:blkio:/
3:cpuacct:/
2:cpu:/
1:cpuset:/
0::/

Seems to me like this should pass too.
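
As a side note for anyone reading along: each line in those files is hierarchy-ID:controller-list:path, where the numbered lines are v1 hierarchies and the single 0:: line is the v2 (unified) hierarchy. A minimal Go sketch that splits the file the same way (a simplification, not the actual k3s parsing code):

package main

import (
        "bufio"
        "fmt"
        "os"
        "strings"
)

func main() {
        f, err := os.Open("/proc/self/cgroup")
        if err != nil {
                fmt.Fprintln(os.Stderr, err)
                os.Exit(1)
        }
        defer f.Close()

        sc := bufio.NewScanner(f)
        for sc.Scan() {
                // Format: hierarchy-ID:controller-list:cgroup-path
                parts := strings.SplitN(sc.Text(), ":", 3)
                if len(parts) != 3 {
                        continue
                }
                if parts[1] == "" {
                        fmt.Printf("v2 (unified) hierarchy at %s\n", parts[2])
                } else {
                        fmt.Printf("v1 hierarchy %q at %s\n", parts[1], parts[2])
                }
        }
}

The fact that cpu, cpuset and memory all show up on v1 hierarchies here turns out to matter further down.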

Tracking further, right above, I spot this: https://github.com/k3s-io/k3s/blob/81cd630f87ba3c0c720862af4cd02850303083a5/pkg/cgroups/cgroups_linux.go#L69-L73

That error seems wrong. After all, I can absolutely see the required resource, as demonstrated above. I also verified the others:

# find . -type d -name cpu -o -name cpuset -o -name memory | xargs realpath
/sys/fs/cgroup/cpu
/sys/fs/cgroup/cpuset
/sys/fs/cgroup/memory

So, I dug into the dependency used here, cgroupsv2.NewManager(...). Most intriguing is this line: if _, ok := m[controller]; !ok. For the first iteration that evaluates to if _, ok := m["cpu"]; !ok. I am not a Go expert by any means, but as I understand it the second return value (ok) is a boolean that reports whether the key exists in the map, so this is essentially an "x in map" check. The map itself is built a little above from the result of .RootControllers().
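
To convince myself of that, a tiny standalone illustration of the comma-ok lookup (made-up values, nothing k3s-specific):

package main

import "fmt"

func main() {
        // struct{}{} is a zero-byte placeholder; the map is used as a set.
        set := map[string]struct{}{"cpuset": {}, "memory": {}}

        // The second return value is a bool reporting presence, not an error.
        if _, ok := set["cpu"]; !ok {
                fmt.Println("cpu is not in the set")
        }
}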

Well, I ripped the function out, turned it into a tiny standalone example, and found out what's up. Code:

package main

import (
  "fmt"
  cgroupsv2 "github.com/containerd/cgroups/v3/cgroup2"
)

// ripped: https://github.com/k3s-io/k3s/blob/81cd630f87ba3c0c720862af4cd02850303083a5/pkg/cgroups/cgroups_linux.go#L56-L75
func validateCgroupsV2() error {
        manager, err := cgroupsv2.NewManager("/sys/fs/cgroup", "/", &cgroupsv2.Resources{})
        if err != nil {
                fmt.Println("-> cgroupsv2 returned error")
                fmt.Println(err)
                return err
        }
        controllers, err := manager.RootControllers()
        if err != nil {
                fmt.Println("-> .RootControllers() returned error")
                fmt.Println(err)
                return err
        }
        m := make(map[string]struct{})
        for _, controller := range controllers {
                fmt.Printf("-> l1: Adding %s", controller)
                m[controller] = struct{}{}
        }
        for _, controller := range []string{"cpu", "cpuset", "memory"} {
                fmt.Printf("-> l2: Step %s", controller)
                if _, ok := m[controller]; !ok {
                        return fmt.Errorf("failed to find %s cgroup (v2)", controller)
                }
        }
        return nil
}

func main() {
        var err = validateCgroupsV2()
        if err != nil {
                // Validation failed: a required controller is missing from the v2 root.
                fmt.Println("-> main: Not passing!")
        } else {
                fmt.Println("-> main: Passing")
        }
}

Result:

# /usb/cgtest/cgtest
-> l2: Step cpu
-> main: Not passing!

So, the first loop never runs, and the second only steps through CPU, and errors out. o.o huh?! I went to dig deeper into the cgroups in my OpenWrt installation to see what's up:

# find . -maxdepth 1 -type f | while read f; echo "### $f"; cat $f; end
### ./cgroup.procs
1
<... long list of numbers ...>
28217
### ./cgroup.max.descendants
max
### ./cpu.stat
usage_usec 477235660000
user_usec 296061090000
system_usec 181174570000
### ./cgroup.stat
nr_descendants 78
nr_dying_descendants 0
### ./cgroup.threads
1
<... long list of numbers ...>
28217
### ./cgroup.controllers
### ./cgroup.subtree_control
### ./cgroup.max.depth
max

The three empty ones at the bottom caught my attention, so I checked on other machines:

# Remote VPS, Ubuntu 20.04 LTS
# cat /sys/fs/cgroup/cgroup.controllers
cpuset cpu io memory hugetlb pids rdma misc

# Local, RockPro64, Debian 12
# cat /sys/fs/cgroup/cgroup.controllers
cpuset cpu io memory hugetlb pids rdma

# Local, VisionFive2, RISC-V, Debian Rolling ("sid/trixie")
# cat /sys/fs/cgroup/cgroup.controllers
cpuset cpu io memory hugetlb pids rdma misc
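
Which would explain everything if RootControllers() ultimately reflects what the root cgroup.controllers file advertises (an assumption on my part; I have not traced the library). A sketch of the same check done by reading the file directly, bypassing the library:

package main

import (
        "fmt"
        "os"
        "strings"
)

func main() {
        // Read the v2 root's advertised controllers directly instead of going
        // through cgroupsv2.NewManager(...).RootControllers().
        data, err := os.ReadFile("/sys/fs/cgroup/cgroup.controllers")
        if err != nil {
                fmt.Fprintln(os.Stderr, err)
                os.Exit(1)
        }

        m := make(map[string]struct{})
        for _, c := range strings.Fields(string(data)) { // empty file -> no entries
                m[c] = struct{}{}
        }

        for _, controller := range []string{"cpu", "cpuset", "memory"} {
                if _, ok := m[controller]; !ok {
                        fmt.Printf("%s is missing from the v2 root\n", controller)
                        return
                }
        }
        fmt.Println("all required v2 controllers are present")
}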

So, apparently this could be OpenWrt's fault. I am not exactly a cgroups buff, but this is such an edge case of a weird issue.

Is there anything I can do here...? Because at this point, I am just massively confused. o.o

brandond commented 5 months ago

Don't use hybrid cgroups, where some controllers are owned by v1 hierarchies and others by v2. I don't know how to configure this on OpenWrt.
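
In other words: a controller can only be attached to one hierarchy at a time, so anything claimed by a mounted v1 hierarchy will never appear in the v2 root's cgroup.controllers. A rough way to see the split on a node (a generic sketch assuming the usual /sys/fs/cgroup mount point, nothing OpenWrt-specific):

package main

import (
        "bufio"
        "fmt"
        "os"
        "strings"
)

func main() {
        // Controllers bound to v1: every line in /proc/self/cgroup with a
        // non-empty controller list (named hierarchies such as "name=systemd"
        // are skipped, as they carry no controllers).
        var v1 []string
        if f, err := os.Open("/proc/self/cgroup"); err == nil {
                sc := bufio.NewScanner(f)
                for sc.Scan() {
                        parts := strings.SplitN(sc.Text(), ":", 3)
                        if len(parts) == 3 && parts[1] != "" && !strings.HasPrefix(parts[1], "name=") {
                                v1 = append(v1, parts[1])
                        }
                }
                f.Close()
        }

        // Controllers available on the v2 root.
        v2, _ := os.ReadFile("/sys/fs/cgroup/cgroup.controllers")

        fmt.Println("controllers on v1 hierarchies:", strings.Join(v1, " "))
        fmt.Println("controllers on the v2 root:", strings.TrimSpace(string(v2)))
}

On a pure unified (v2-only) setup the first line comes out empty and the second lists everything; generically, that is what booting with cgroup_no_v1=all, or keeping init from mounting the v1 hierarchies, gets you (untested on OpenWrt).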

See the discussion at https://github.com/k3s-io/k3s/issues/8890#issuecomment-1815634890

github-actions[bot] commented 4 months ago

This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 45 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.