Check out the related discussion at https://github.com/rancher/rke2/discussions/2710
Moving the discussion from https://github.com/rancher/rke2/discussions/2710 back here.
Here is how to reproduce the issue:
On a clean Debian 11 system, install k3s with the systemd cgroup driver using the command below:
curl -sfL https://get.k3s.io | sh -s - --write-kubeconfig-mode 644 --disable traefik --kubelet-arg cgroup-driver=systemd
The cluster comes up fine, but the default pods in the kube-system namespace keep restarting and end up in the CrashLoopBackOff state within a few minutes:
$ kubectl get pod -A
NAMESPACE     NAME                                      READY   STATUS             RESTARTS      AGE
kube-system   coredns-96cc4f57d-rgk92                   1/1     Running            0             2m53s
kube-system   local-path-provisioner-84bb864455-jql9b   1/1     Running            2 (28s ago)   2m53s
kube-system   metrics-server-ff9dbcb6c-zfhq9            1/1     Running            3 (23s ago)   2m53s
$ kubectl get pod -A
NAMESPACE     NAME                                      READY   STATUS             RESTARTS      AGE
kube-system   coredns-96cc4f57d-rgk92                   1/1     Running            0             7m5s
kube-system   local-path-provisioner-84bb864455-jql9b   0/1     CrashLoopBackOff   3 (38s ago)   7m5s
kube-system   metrics-server-ff9dbcb6c-zfhq9            0/1     CrashLoopBackOff   5 (35s ago)   7m5s
As I mentioned in the other discussion, any logs or describe output showing why these pods are crashing would be helpful. Just showing that they are crashing isn't really enough to work with.
Sorry, I thought the reproduction steps were enough to go on, but here are the logs: k3s.txt
I think I found the issue. I need to create a config.toml.tmpl and add:
[plugins.cri.containerd.runtimes.runc.options]
SystemdCgroup = true
Back to the original feature request: it would be great if the k3s installer had a flag for enabling the systemd cgroup driver that took care of these manual changes to config.toml and the kubelet args.
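For anyone else hitting this, a minimal sketch of the workaround, assuming the default data dir. It complements the --kubelet-arg cgroup-driver=systemd from the install command above; note the copied file becomes a frozen snapshot rather than a real template (see the registries.yaml caveat below):
# Start from the rendered config and flip the runc cgroup driver.
# If the rendered config has no runc options table yet, append one instead:
#   [plugins.cri.containerd.runtimes.runc.options]
#     SystemdCgroup = true
sudo cp /var/lib/rancher/k3s/agent/etc/containerd/config.toml \
        /var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' \
        /var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl
sudo systemctl restart k3s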
I did run into an issue when using config.toml.tmpl, though. In order to keep /etc/rancher/k3s/registries.yaml working, I added the following to the config.toml.tmpl:
{{ if .PrivateRegistryConfig }}
{{ if .PrivateRegistryConfig.Mirrors }}
[plugins.cri.registry.mirrors]{{end}}
{{range $k, $v := .PrivateRegistryConfig.Mirrors }}
[plugins.cri.registry.mirrors."{{$k}}"]
endpoint = [{{range $i, $j := $v.Endpoints}}{{if $i}}, {{end}}{{printf "%q" .}}{{end}}]
{{end}}
{{range $k, $v := .PrivateRegistryConfig.Configs }}
{{ if $v.Auth }}
[plugins.cri.registry.configs."{{$k}}".auth]
{{ if $v.Auth.Username }}username = "{{ $v.Auth.Username }}"{{end}}
{{ if $v.Auth.Password }}password = "{{ $v.Auth.Password }}"{{end}}
{{ if $v.Auth.Auth }}auth = "{{ $v.Auth.Auth }}"{{end}}
{{ if $v.Auth.IdentityToken }}identitytoken = "{{ $v.Auth.IdentityToken }}"{{end}}
{{end}}
{{ if $v.TLS }}
[plugins.cri.registry.configs."{{$k}}".tls]
{{ if $v.TLS.CAFile }}ca_file = "{{ $v.TLS.CAFile }}"{{end}}
{{ if $v.TLS.CertFile }}cert_file = "{{ $v.TLS.CertFile }}"{{end}}
{{ if $v.TLS.KeyFile }}key_file = "{{ $v.TLS.KeyFile }}"{{end}}
{{end}}
{{end}}
{{end}}
The quotes and backslashes in the Password from registries.yaml aren't being escaped. Without my own tmpl, k3s properly escapes the passwords from registries.yaml when creating config.toml. How do I fix the tmpl so that the Password is escaped correctly?
Are you using the template from the source code as a starting point?
That worked, thanks!
The link from https://rancher.com/docs/k3s/latest/en/advanced/ goes to a different template.
yeah, the template got split out into platform-specific files a while back; the docs just haven't been updated.
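For reference, the upstream template escapes these values with Go's %q verb (the same idiom the endpoint line above already uses), which quotes embedded backslashes and double quotes; the auth fields end up looking something like:
{{ if $v.Auth.Username }}username = {{ printf "%q" $v.Auth.Username }}{{end}}
{{ if $v.Auth.Password }}password = {{ printf "%q" $v.Auth.Password }}{{end}}
{{ if $v.Auth.Auth }}auth = {{ printf "%q" $v.Auth.Auth }}{{end}}
{{ if $v.Auth.IdentityToken }}identitytoken = {{ printf "%q" $v.Auth.IdentityToken }}{{end}}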
This turned out to be pretty easy to handle; with any luck the May releases will use the systemd cgroup driver automatically when possible.
Updating the docs https://github.com/rancher/docs/pull/4042
Assigning myself to work on the backporting PRs
I was able to reproduce the issue with k3s v1.23.6+k3s1 on Debian 11 on Linode using the steps above:
# kubectl get pods -A -w
NAMESPACE     NAME                                      READY   STATUS             RESTARTS       AGE
kube-system   metrics-server-7cd5fcb6b7-9452v           1/1     Running            0              2m14s
kube-system   local-path-provisioner-6c79684f77-hfqnj   1/1     Running            1 (87s ago)    2m14s
kube-system   coredns-d76bd69b-f76mc                    1/1     Running            1 (26s ago)    2m14s
kube-system   local-path-provisioner-6c79684f77-hfqnj   0/1     Error              1 (101s ago)   2m28s
kube-system   local-path-provisioner-6c79684f77-hfqnj   0/1     CrashLoopBackOff   1 (2s ago)     2m29s
kube-system   local-path-provisioner-6c79684f77-hfqnj   1/1     Running            2 (19s ago)    2m46s
kube-system   coredns-d76bd69b-f76mc                    0/1     Completed          1              2m55s
kube-system   coredns-d76bd69b-f76mc                    0/1     CrashLoopBackOff   1 (2s ago)     2m56s
Validated fix on k3s v1.23.7-rc1+k3s1
# sudo cat /var/lib/rancher/k3s/agent/etc/containerd/config.toml | tail -10
SystemdCgroup = true
# kubectl get pods -A
NAMESPACE     NAME                                      READY   STATUS    RESTARTS   AGE
kube-system   local-path-provisioner-6c79684f77-pvcgn   1/1     Running   0          7m44s
kube-system   coredns-d76bd69b-5z8np                    1/1     Running   0          7m44s
kube-system   metrics-server-7cd5fcb6b7-lxfbx           1/1     Running   0          7m44s
I finally got a chance to do a fresh install of the latest stable release to confirm the fix on Debian 11, which ships systemd 247:
$ curl -sfL https://get.k3s.io | sh -s - --write-kubeconfig-mode 644
[INFO] Finding release for channel stable
[INFO] Using v1.23.8+k3s1 as release
/var/lib/rancher/k3s/agent/etc/containerd/config.toml still shows:
[plugins.cri.containerd.runtimes.runc.options]
SystemdCgroup = false
I thought the installer would auto-detect this and set SystemdCgroup to true?
It will if systemd is compatible. The check is gated on k3s running under systemd, and on the cpuset cgroup controller being available. Are both of these true?
This is on a clean installation of Debian 11
koi@test-debian11:~$ apt list --installed | grep systemd
WARNING: apt does not have a stable CLI interface. Use with caution in scripts.
libpam-systemd/stable,now 247.3-7 amd64 [installed,automatic]
libsystemd0/stable,now 247.3-7 amd64 [installed]
systemd-sysv/stable,now 247.3-7 amd64 [installed]
systemd/stable,now 247.3-7 amd64 [installed]
koi@test-debian11:~$ cat /proc/$$/cpuset
/user.slice
That doesn't tell me whether or not the cpuset cgroup controller is delegated to the k3s service. When k3s is running, what's the output of cat /sys/fs/cgroup/system.slice/k3s.service/cgroup.controllers?
$ cat /sys/fs/cgroup/system.slice/k3s.service/cgroup.controllers
cpuset cpu io memory hugetlb pids rdma
Another observation on Debian 11: there is no /sys/fs/cgroup/cpuset directory, so I can't see any files in it, but if I manually create that directory, all the files are immediately auto-populated:
koi@test-debian11:~$ ls -l /sys/fs/cgroup/cpuset
ls: cannot access '/sys/fs/cgroup/cpuset': No such file or directory
koi@test-debian11:~$
koi@test-debian11:~$ sudo mkdir /sys/fs/cgroup/cpuset
koi@test-debian11:~$ ls -l /sys/fs/cgroup/cpuset
total 0
-r--r--r-- 1 root root 0 Jul 12 01:11 cgroup.controllers
-r--r--r-- 1 root root 0 Jul 12 01:11 cgroup.events
-rw-r--r-- 1 root root 0 Jul 12 01:11 cgroup.freeze
-rw-r--r-- 1 root root 0 Jul 12 01:11 cgroup.max.depth
-rw-r--r-- 1 root root 0 Jul 12 01:11 cgroup.max.descendants
-rw-r--r-- 1 root root 0 Jul 12 01:11 cgroup.procs
-r--r--r-- 1 root root 0 Jul 12 01:11 cgroup.stat
-rw-r--r-- 1 root root 0 Jul 12 01:11 cgroup.subtree_control
-rw-r--r-- 1 root root 0 Jul 12 01:11 cgroup.threads
-rw-r--r-- 1 root root 0 Jul 12 01:11 cgroup.type
-rw-r--r-- 1 root root 0 Jul 12 01:11 cpu.max
-rw-r--r-- 1 root root 0 Jul 12 01:11 cpu.pressure
-rw-r--r-- 1 root root 0 Jul 12 01:11 cpuset.cpus
-r--r--r-- 1 root root 0 Jul 12 01:11 cpuset.cpus.effective
-rw-r--r-- 1 root root 0 Jul 12 01:11 cpuset.cpus.partition
-rw-r--r-- 1 root root 0 Jul 12 01:11 cpuset.mems
-r--r--r-- 1 root root 0 Jul 12 01:11 cpuset.mems.effective
-r--r--r-- 1 root root 0 Jul 12 01:11 cpu.stat
-rw-r--r-- 1 root root 0 Jul 12 01:11 cpu.weight
-rw-r--r-- 1 root root 0 Jul 12 01:11 cpu.weight.nice
-r--r--r-- 1 root root 0 Jul 12 01:11 hugetlb.1GB.current
-r--r--r-- 1 root root 0 Jul 12 01:11 hugetlb.1GB.events
-r--r--r-- 1 root root 0 Jul 12 01:11 hugetlb.1GB.events.local
-rw-r--r-- 1 root root 0 Jul 12 01:11 hugetlb.1GB.max
-r--r--r-- 1 root root 0 Jul 12 01:11 hugetlb.1GB.rsvd.current
-rw-r--r-- 1 root root 0 Jul 12 01:11 hugetlb.1GB.rsvd.max
-r--r--r-- 1 root root 0 Jul 12 01:11 hugetlb.2MB.current
-r--r--r-- 1 root root 0 Jul 12 01:11 hugetlb.2MB.events
-r--r--r-- 1 root root 0 Jul 12 01:11 hugetlb.2MB.events.local
-rw-r--r-- 1 root root 0 Jul 12 01:11 hugetlb.2MB.max
-r--r--r-- 1 root root 0 Jul 12 01:11 hugetlb.2MB.rsvd.current
-rw-r--r-- 1 root root 0 Jul 12 01:11 hugetlb.2MB.rsvd.max
-rw-r--r-- 1 root root 0 Jul 12 01:11 io.max
-rw-r--r-- 1 root root 0 Jul 12 01:11 io.pressure
-r--r--r-- 1 root root 0 Jul 12 01:11 io.stat
-rw-r--r-- 1 root root 0 Jul 12 01:11 io.weight
-r--r--r-- 1 root root 0 Jul 12 01:11 memory.current
-r--r--r-- 1 root root 0 Jul 12 01:11 memory.events
-r--r--r-- 1 root root 0 Jul 12 01:11 memory.events.local
-rw-r--r-- 1 root root 0 Jul 12 01:11 memory.high
-rw-r--r-- 1 root root 0 Jul 12 01:11 memory.low
-rw-r--r-- 1 root root 0 Jul 12 01:11 memory.max
-rw-r--r-- 1 root root 0 Jul 12 01:11 memory.min
-r--r--r-- 1 root root 0 Jul 12 01:11 memory.numa_stat
-rw-r--r-- 1 root root 0 Jul 12 01:11 memory.oom.group
-rw-r--r-- 1 root root 0 Jul 12 01:11 memory.pressure
-r--r--r-- 1 root root 0 Jul 12 01:11 memory.stat
-r--r--r-- 1 root root 0 Jul 12 01:11 memory.swap.current
-r--r--r-- 1 root root 0 Jul 12 01:11 memory.swap.events
-rw-r--r-- 1 root root 0 Jul 12 01:11 memory.swap.high
-rw-r--r-- 1 root root 0 Jul 12 01:11 memory.swap.max
-r--r--r-- 1 root root 0 Jul 12 01:11 pids.current
-r--r--r-- 1 root root 0 Jul 12 01:11 pids.events
-rw-r--r-- 1 root root 0 Jul 12 01:11 pids.max
-r--r--r-- 1 root root 0 Jul 12 01:11 rdma.current
-rw-r--r-- 1 root root 0 Jul 12 01:11 rdma.max
koi@test-debian11:~$
Hmm, that's odd. Something doesn't sound quite right. Do you have cgroup v2 enabled, or is this v1 or hybrid? grep cgroup /proc/mounts should indicate.
I have a fresh installation of Debian 11, which defaults to cgroup v2 according to the release notes. This can be disabled on the kernel command line, but I didn't do that.
koi@test-debian11:~$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-5.10.0-15-amd64 root=UUID=c30de53c-9fe1-4889-928a-48db7891cac4 ro quiet
koi@test-debian11:~$ grep cgroup /proc/mounts
cgroup2 /sys/fs/cgroup cgroup2 rw,nosuid,nodev,noexec,relatime,nsdelegate,memory_recursiveprot 0 0
koi@test-debian11:~$
Maybe the k3s check code doesn't work on Debian 11? Here is how I did the k3s installation. Do I need to pass any special options to help the installer use cgroup v2?
$ curl -sfL https://get.k3s.io | sh -s - --write-kubeconfig-mode 644
The only requirements for autodetecting this should be that the cpuset cgroup controller is available, and that Type in the systemd unit is set to notify (which should be the default); that enables the systemd notification socket, which is how we know we're running under systemd. See if this makes any difference:
curl -sfL https://get.k3s.io | INSTALL_K3S_TYPE=notify sh -s - --write-kubeconfig-mode 644
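For a quick sanity check of the unit type on an existing install (service path as reported by the installer):
grep '^Type=' /etc/systemd/system/k3s.service
# expected: Type=notify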
Tried curl -sfL https://get.k3s.io | INSTALL_K3S_TYPE=notify sh -s - --write-kubeconfig-mode 644 and it doesn't make a difference; /var/lib/rancher/k3s/agent/etc/containerd/config.toml still shows SystemdCgroup = false. Can you try it on a fresh Debian 11 and see if you can reproduce?
I do see the cpuset cgroup controller:
koi@test-debian11:~$ cat /sys/fs/cgroup/cgroup.controllers
cpuset cpu io memory hugetlb pids rdma
BTW, I see the message below in syslog; is that normal?
Jul 12 22:20:03 test-debian11 k3s[15318]: W0712 22:20:03.658470 15318 manager.go:159] Cannot detect current cgroup on cgroup v2
Hmm. Apologies, it looks like this was regressed by https://github.com/k3s-io/k3s/commit/a9b5a1933fb. On servers, the NOTIFY_SOCKET environment variable gets unset, which prevents the cgroup detection code from detecting that it is running under systemd.
You can test the fix on your node with: curl -sfL https://get.k3s.io | INSTALL_K3S_TYPE=notify INSTALL_K3S_COMMIT=a2a5e79335c4a8c4d3f0038818ac0ef8b8403464 sh -s - --write-kubeconfig-mode 644
Tried a fresh installation using curl -sfL https://get.k3s.io | INSTALL_K3S_TYPE=notify INSTALL_K3S_COMMIT=a2a5e79335c4a8c4d3f0038818ac0ef8b8403464 sh -s - --write-kubeconfig-mode 644, but /var/lib/rancher/k3s/agent/etc/containerd/config.toml still shows SystemdCgroup = false.
Did the install actually work? I don't believe CI is done yet for you to be able to install that commit. Try again shortly, and restart k3s after the install is successful.
I tested it from a local build and it does work.
Installation did actually work:
$ curl -sfL https://get.k3s.io | INSTALL_K3S_TYPE=notify INSTALL_K3S_COMMIT=a2a5e79335c4a8c4d3f0038818ac0ef8b8403464 sh -s - --write-kubeconfig-mode 644
[INFO] Using commit a2a5e79335c4a8c4d3f0038818ac0ef8b8403464 as release
[INFO] Downloading hash https://storage.googleapis.com/k3s-ci-builds/k3s-a2a5e79335c4a8c4d3f0038818ac0ef8b8403464.sha256sum
[INFO] Downloading binary https://storage.googleapis.com/k3s-ci-builds/k3s-a2a5e79335c4a8c4d3f0038818ac0ef8b8403464
[INFO] Verifying binary download
[INFO] Installing k3s to /usr/local/bin/k3s
[INFO] Skipping installation of SELinux RPM
[INFO] Creating /usr/local/bin/kubectl symlink to k3s
[INFO] Creating /usr/local/bin/crictl symlink to k3s
[INFO] Creating /usr/local/bin/ctr symlink to k3s
[INFO] Creating killall script /usr/local/bin/k3s-killall.sh
[INFO] Creating uninstall script /usr/local/bin/k3s-uninstall.sh
[INFO] env: Creating environment file /etc/systemd/system/k3s.service.env
[INFO] systemd: Creating service file /etc/systemd/system/k3s.service
[INFO] systemd: Enabling k3s unit
Created symlink /etc/systemd/system/multi-user.target.wants/k3s.service -> /etc/systemd/system/k3s.service.
[INFO] systemd: Starting k3s
Restarting didn't help; it still shows SystemdCgroup = false. Just tried a fresh install again, no difference.
What do you get from:
cat /sys/fs/cgroup/system.slice/k3s.service/cgroup.controllers
cat /proc/$(pgrep k3s)/environ | tr \\0 \\n | grep .
koi@test-debian11:~$ cat /sys/fs/cgroup/system.slice/k3s.service/cgroup.controllers
cpuset cpu io memory pids
koi@test-debian11:~$ sudo cat /proc/$(pgrep k3s)/environ | tr \\0 \\n | grep .
PATH=/var/lib/rancher/k3s/data/b1e4965bdf8b3b6087405f65958941d8de1e3cf92d70313b9f44b0dbe07c3001/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/var/lib/rancher/k3s/data/b1e4965bdf8b3b6087405f65958941d8de1e3cf92d70313b9f44b0dbe07c3001/bin/aux
NOTIFY_SOCKET=/run/systemd/notify
INVOCATION_ID=9b421358b85246919cd915f7e2dcc3b7
JOURNAL_STREAM=8:1294321
RES_OPTIONS=
K3S_DATA_DIR=/var/lib/rancher/k3s/data/b1e4965bdf8b3b6087405f65958941d8de1e3cf92d70313b9f44b0dbe07c3001
koi@test-debian11:~$ ls -l /run/systemd/notify
srwxrwxrwx 1 root root 0 Jul 12 04:57 /run/systemd/notify
koi@test-debian11:~$
Hmm, so on your node systemd is not setting SYSTEMD_EXEC_PID... I guess that wasn't added until v248-2 in March of 2021, but you're still on 247. Maybe INVOCATION_ID is more reliable; that's been around since 232. It'd be nice if the docs said when those vars were added.
Try with c40c3620b77fd65aceea5188b547c987a5f7840f
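A quick way to see which of these variables systemd actually hands to the k3s process on a given node (same environ trick as above):
sudo cat /proc/$(pgrep -o k3s)/environ | tr '\0' '\n' | grep -E '^(NOTIFY_SOCKET|INVOCATION_ID|SYSTEMD_EXEC_PID)='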
v248 is very new. Debian 11 uses 247, Ubuntu 20.04 uses 245 and Redhat 8 uses 239.
c40c3620b77fd65aceea5188b547c987a5f7840f works. Thanks for taking the time to fix this!
Yeah, I'm on Ubuntu 22.04 which has 249. Glad that commit works for you! Today is upstream release day, so that commit won't make it into K3s until next month's releases.
Quote from the containerd cgroup driver doc:
On Debian 11, does the k3s installation default to the systemd cgroup driver for containerd? If not, how do I configure that?
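For what it's worth, per the fixes discussed above, recent k3s releases should set SystemdCgroup = true automatically when k3s runs under systemd and the cpuset controller is delegated to its unit. One way to check what a given install ended up with (same path as earlier in the thread):
grep SystemdCgroup /var/lib/rancher/k3s/agent/etc/containerd/config.toml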