Closed yaroslavkasatikov closed 1 year ago
https://dropmefiles.com/6nn0s must-gather
Also noticed this on kubelet 1.24 and the situation gets worse the more pods are running.
It's not OKD-specific; seems to be a dupe of https://issues.redhat.com/browse/OCPBUGSM-39381
This is most likely a runc regression. Could you check if switching to crun helps (make sure you apply it to all MachineConfigPools)? Example MachineConfig:
Seems we'd need runc 1.1.3, which has two systemd/cgroups fixes
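The MachineConfig example linked above isn't reproduced in this thread; a minimal sketch of how such a switch is typically done (dropping a CRI-O drop-in config via Ignition) might look like the following. The file path, object name, and decoded contents here are assumptions, so treat the linked example as authoritative:

```yaml
# Hypothetical sketch: make crun the default CRI-O runtime on the worker
# pool by writing a crio.conf.d drop-in. Repeat with role: master (and a
# matching name) for the control-plane pool.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-worker-crun
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
        - path: /etc/crio/crio.conf.d/99-crun.conf
          mode: 0644
          overwrite: true
          contents:
            # URL-encoded data, decodes to:
            #   [crio.runtime]
            #   default_runtime = "crun"
            source: data:,%5Bcrio.runtime%5D%0Adefault_runtime%20%3D%20%22crun%22%0A
```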
@vrutkovs
Hi, Vadim! I have applied your MachineConfig and recreated the cron nodes.
Will report about results
@vrutkovs Hi Vadim,
Seems it hasn't helped. Worked fine for 3h, but now one node returned to failed state:
```
Type     Reason                    Age    From               Message
----     ------                    ----   ----               -------
Normal   Scheduled                 4m55s  default-scheduler  Successfully assigned bank-prod/bank-cronjob-27674577-7wjld to ip-10-0-217-73.eu-central-1.compute.internal by ip-10-0-133-210
Warning  FailedCreatePodContainer  3m16s  kubelet            unable to ensure pod container exists: failed to create container for [kubepods burstable pod103107db-62e6-4f67-b2fd-50a066f4af36] : Timeout waiting for systemd to create kubepods-burstable-pod103107db_62e6_4f67_b2fd_50a066f4af36.slice
```
It seems kube-rbac-proxy began to eat memory before the failure occurred.
Okay, it looks like kubelet is leaking memory, not runc.
The same issue here. Process: root 2009 1 73 Aug12 ? 3-00:33:07 kubelet --config=/etc/kubernetes/kubelet.conf --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig --kubeconfig=/var/lib/kubelet/kubeconfig --container-runtime=remote --container-runtime-endpoint=/var/run/crio/crio.sock --runtime-cgroups=/system.slice/crio.service --node-labels=node-role.kubernetes.io/worker,node.openshift.io/os_id=fedora --node-ip=192.168.110.111 --minimum-container-ttl-duration=6m0s --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec --cloud-provider= --hostname-override= --provider-id= --pod-infra-container-image=quay.io/openshift/okd-content@sha256:c4e32171a302b1a0d21f936b795b9505b992404b6335bb7e63d3b1bddc0b91ab --system-reserved=cpu=500m,memory=1Gi --v=2
Hi, also experiencing the same issue in multiple 4.11 test clusters; we have ~30 production 4.10 clusters and have not seen this issue there at all.
root 1458 74.9 72.8 22674508 11924036 ? Ssl Aug02 14736:44 kubelet --config=/etc/kubernetes/kubelet.conf --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig --kubeconfig=/var/lib/kubelet/kubeconfig --container-runtime=remote --container-runtime-endpoint=/var/run/crio/crio.sock --runtime-cgroups=/system.slice/crio.service --node-labels=node-role.kubernetes.io/worker,node.openshift.io/os_id=fedora --node-ip=10.68.0.64 --minimum-container-ttl-duration=6m0s --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec --cloud-provider= --hostname-override= --provider-id= --pod-infra-container-image=quay.io/openshift/okd-content@sha256:c4e32171a302b1a0d21f936b795b9505b992404b6335bb7e63d3b1bddc0b91ab --system-reserved=cpu=500m,memory=1Gi --v=2
The node with the memory increase starting ~2022-08-17 07:30 has a cronjob pinned to it, created at 2022-08-16 07:26, which runs every minute to echo "hello openshift" with busybox, to test if this was related to frequently running cronjobs.
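For reference, the reproducer described above can be sketched as a CronJob manifest like this (the name, the node-pinning label value, and restart policy are made up for illustration):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: hello-openshift
spec:
  schedule: "* * * * *"      # run every minute
  jobTemplate:
    spec:
      template:
        spec:
          nodeSelector:
            kubernetes.io/hostname: worker-1   # pin to the node under test
          containers:
            - name: hello
              image: busybox
              command: ["sh", "-c", "echo hello openshift"]
          restartPolicy: Never
```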
The same errors are seen as in the original post while kubelet memory is increasing, e.g.
pod_container_manager_linux.go:192] "Failed to delete cgroup paths" cgroupName=[kubepods besteffort podacd321bc-4db6-468e-87d8-1e1e16c85a9a] err="unable to destroy cgroup paths for cgroup [kubepods besteffort podacd321bc-4db6-468e-87d8-1e1e16c85a9a] : Timed out while waiting for systemd to remove kubepods-besteffort-podacd321bc_4db6_468e_87d8_1e1e16c85a9a.slice"
Switched to crun and not seeing the staggering memory growth here. It's probably only visible on nodes with many containers running?
As for me, I have switched to crun when you wrote and haven't rolled it back
I rewrote the application to remove the k8s cronjobs (packed them into a container with crontab) and the cluster became stable. So it seems the cause is not the number of running pods, but pods being started.
Also I noticed that when I had a lot of cronjobs, the node degradation went this way (from the k8s side):
1. Normal state: a container is scheduled to the node and starts in 1-3s.
2. Degradation starts: a container is scheduled to the node and starts in 20-60s.
3. Degraded state: a container is scheduled to the node and gets stuck.
4. After some time the affected node changes status to NotReady. Pod eviction starts and all pods change state to Terminating. This can only be fixed by hard-rebooting the node or removing the machine.
On steps 1, 2 and 3 all running containers on the node keep working fine.
It seems there are fixes related to this problem in the Kubernetes v1.24.4 release notes. Specifically:
- Fix JobTrackingWithFinalizers when a pod succeeds after the job is considered failed, which led to API conflicts that blocked finishing the job. (https://github.com/kubernetes/kubernetes/pull/111664, @alculquicondor) [SIG Apps and Testing]
- Fix memory leak in the job controller related to JobTrackingWithFinalizers (https://github.com/kubernetes/kubernetes/pull/111722, @alculquicondor) [SIG Apps]
- Fixed potential scheduler crash when scheduling with unsatisfied nodes in PodTopologySpread. (https://github.com/kubernetes/kubernetes/pull/111511, @kerthcet) [SIG Scheduling]
Currently OKD v4.11.0-0.okd-2022-07-29-154152 uses kubernetes v1.24.0+9546431
Excellent, thanks. There's a PR open for 1.24.3 - https://github.com/openshift/kubernetes/pull/1326 - hopefully it will soon be superseded by .4 and merged. Once that happens we'll pick it up in an okd-machine-os build and release a new stable.
https://amd64.origin.releases.ci.openshift.org/releasestream/4-stable/release/4.11.0-0.okd-2022-08-20-022919 includes runc 1.1.3, which should have some kubelet-related fixes.
cronjob pinned to one node to scale an alpine deployment between 50 and 0 replicas and reverse, every minute, this test shows the problem within a few hours on 4.11 clusters. no problem found on 4.10 clusters over multiple days.
Screenshot above is from a 4.11 cluster. The crun machineconfig was added to workers 2022-08-21 (confirmed with crio-status config). The cluster was updated to 4.11.0-0.okd-2022-08-20-022919 at ~2022-08-21 12:00 and the crun machineconfig was removed from workers (the rendered config shows creation 21 Aug 2022, 20:43). Neither crun nor runc 1.1.3 seems to have had an effect on the kubelet problem.
The kubelet version in 4.11.0-0.okd-2022-08-20-022919 is now v1.24.0+4f0dd4d.
Below is a screenshot from a 4.10 cluster running the same cronjob since 2022-08-19. TL;DR: the new release doesn't fix the issue.
It also seems to happen with AWX 21.x starting Ansible playbooks as pods. I see this every night as a lot of jobs start after midnight. A node reboot fixed the memory usage. Also, no problems running AWX on OKD 4.10 clusters.
1.24.4 bump - https://github.com/openshift/kubernetes/pull/1352
By the way, kubelet also consumes more and more CPU in the long run on OKD 4.10.0-0.okd-2022-07-09-073606 (vSphere IPI). You can easily reproduce this by using, for example, the "Red Hat OpenShift Logging" operator, because it continuously executes cron tasks. Of course, kubelet's CPU load is nowhere near as severe as on OKD 4.11.0-0.okd-2022-08-20-022919. I think kubelet's CPU usage also depends heavily on how many pods are deployed on the cluster/node. (Maybe the overhead of vSphere IPI plays a part in this as well, compared to a bare-metal installation.) After restarting kubelet on a node, the overall node CPU usage drops from about 3 cores to 1. Besides a node restart, this is at least a workaround for this unwanted behavior for us.
Known issue: kubelet with CRI-O not cleaning up cgroups etc when pod gets deleted.
https://github.com/kubernetes/kubernetes/issues/106957 Fix: https://gist.github.com/aneagoe/6e18aaff48333ec059d0c1283b06813f
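The gist itself is the verified version; as a rough illustration of what it does, a single cleanup pass boils down to something like the sketch below. The use of `crictl ps -q` for the skip check and the exact paths are assumptions - the real script is more careful (it also cross-checks pods known to the control plane, loops with a 600-second sleep, and ships as a privileged DaemonSet):

```shell
#!/usr/bin/env bash
# Hypothetical single-pass sketch of the linked cgroup garbage collector.
clean_orphaned_scopes() {
  root="${1:-/sys/fs/cgroup}"
  # Container IDs CRI-O still tracks; their scopes must be left alone.
  # (Assumption: the real gist uses a more thorough liveness check.)
  running_ids="$(crictl ps -q 2>/dev/null || true)"
  # Collect candidates first so removals don't race with the traversal
  # (cgroup paths contain no whitespace, so word splitting is safe here).
  scopes="$(find "$root" -type d -name 'crio-*.scope' 2>/dev/null)"
  for scope in $scopes; do
    id="$(basename "$scope" .scope)"
    id="${id#crio-}"
    if [ -n "$running_ids" ] && printf '%s\n' "$running_ids" | grep -q "$id"; then
      echo "Scope crio-$id.scope found under running pod, skipping..."
    else
      echo "Removing CGROUP $scope and its parent..."
      # cgroupfs directories are removed with rmdir, even though they
      # contain kernel-generated interface files.
      rmdir "$scope" 2>/dev/null || true
      rmdir "$(dirname "$scope")" 2>/dev/null || true
    fi
  done
}
```

Pointing the function at a throwaway directory tree shows the removal behavior without touching /sys/fs/cgroup.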
@vrutkovs is it possible to include this fix?
It is rather a workaround, so probably not fit to include in a release. My bad, wrong wording: it should also be solved in the next release, I hope. That is what the '1.24.4 bump' remark is about, I guess. The issue is still open.
Just wanted to point out that this bug is around for many months. Not sure it has always been the same bug but it can be worked around with that nice work from Andrei.
We won't be including a workaround officially - it's a kubelet problem and we're waiting for fixes to land; no need to digress from upstream too far.
I don't mind including this workaround as an "official" mitigation recipe, if it has been verified to work.
I can't say for sure it still works in 4.11, but it stabilizes our 'cronjob-hungry' 4.9 and 4.10 clusters. I will test it in 4.11 and report back here.
It most definitely still cleans up a lot of leftover cgroups. Tail of the daemon log on a 4.11 node that runs 10 short-lived pods every minute:
2022-09-29T07:29:19+00:00 Removing CGROUP crio-f330b2f8f55951bb5f5445bcd6fa811bdca11447cdfd91b044647e3b846ba4ff.scope and its parent...
2022-09-29T07:29:19+00:00 Scope crio-f6d78fac4951a3296c7e6b118f9e67a0dcb56766005b5bef6776f4ea48652173.scope found under running pod, skipping...
2022-09-29T07:29:19+00:00 Removing CGROUP crio-f86a48b76ea5a734428e11aa18e3a4f3fc9dbfc3894008dcde1d539ac5b50438.scope and its parent...
2022-09-29T07:29:19+00:00 Scope crio-f8f49053255d505d7d6bd2f018dc1985ebf3c34aee673abdd640404db16ef070.scope found under running pod, skipping...
2022-09-29T07:29:19+00:00 Removing CGROUP crio-f8feb93e6aad0dc6b0d7c3dca881a3c5bdd601eff3a60a24a54e8c622abbfd57.scope and its parent...
2022-09-29T07:29:19+00:00 Removing CGROUP crio-fab2dc0351b831ca6b1395e2721b60d49e10fe043c2d75cbbaa26c9c70ee2ef1.scope and its parent...
2022-09-29T07:29:19+00:00 Removing CGROUP crio-fbc1d4ddc0707b6e8e06d4f87dbb6e8adf7c922e07275ad61d5934396a9d41fb.scope and its parent...
2022-09-29T07:29:19+00:00 Scope crio-fbcc50052c55de371d2927d4deb8f5bba8a7b9129e69e220cbadba4e33a93c10.scope found under running pod, skipping...
2022-09-29T07:29:19+00:00 Removing CGROUP crio-fde14b0c95048cdd7c7fa91b9cef3bd3fe8e8120f2c7ae83c330a6a423a2971f.scope and its parent...
2022-09-29T07:29:19+00:00 Removing CGROUP crio-ff49445e58458cb366245cbf1b85c72ba7e8a0868561505eb3355b7ce91f9154.scope and its parent...
2022-09-29T07:29:19+00:00 Removing CGROUP crio-ff6a76f198151084777c816edaea1ad6f2ec1d0acd10ca7e04fb451d89ec841c.scope and its parent...
2022-09-29T07:29:19+00:00 Sleeping for 600 seconds...
System memory spikes and keeps increasing after disabling the workaround, while running the test:
When re-enabling the workaround I see this pattern:
The workaround sleeps 10 minutes between runs, which explains the sawtooth pattern I guess.
Credits to @aneagoe
I put the workaround on a test cluster on Tuesday, and the issue has not gone away there; it actually seems to increase the log spam. I did not use the DaemonSet; instead I tossed it into a script and am running it on the nodes directly at the moment.
It is going through and appears to be doing what it's supposed to, for example:
Sep 29 07:13:36 MYSERVER.com check.sh[1119]: 2022-09-29T12:13:36+00:00 Starting k8s garbage collector run...
Sep 29 07:13:37 MYSERVER.com check.sh[1119]: 2022-09-29T12:13:37+00:00 Found POD MYSERVERcom-debug unknown to k8s control plane and without any PIDs, will delete it...
Sep 29 07:13:37 MYSERVER.com check.sh[3239730]: Stopped sandbox e62be3eacc7e2286012ff67f2aa969a6131fb4b87ac9d24f7eb98ae08cd346b3
Sep 29 07:13:37 MYSERVER.com check.sh[3239737]: Removed sandbox e62be3eacc7e2286012ff67f2aa969a6131fb4b87ac9d24f7eb98ae08cd346b3
But the memory leaks are still there - I lost two nodes (one control plane, one worker) last night to low memory on this cluster. The CPU usage is still high (3-4 cores normally on these nodes, which are all 4-core) on nodes that are building up.
And while I still get the
pod_container_manager_linux.go:192] "Failed to delete cgroup paths" cgroupName=[kubepods besteffort p ........
logs, as well as the
kubelet_getters.go:300] "Path does not exist" path="/var/lib/kubelet/pods/e9da40e0-99ac-40d6-802a- .........
logs, I now also get new errors from systemd about the cgroups that the workaround removed:
systemd[1]: kubepods-burstable-pod4f247343_7b97_45fe_899b_9b023d5316cf.slice: Failed to open /run/systemd/transient/kubepods-burstable-pod4f247343_7b97_45fe_899b_9b023d5316cf.slice: No such file or directory
This specific cluster I am testing the workaround on has barely any workloads left on it and no jobs (besides the couple of built-in ones), but I have some automation that launches numerous debug pods with regularity, which seems to cause the same symptoms as the cronjobs do (numerous temporary containers).
For me, restarting kubelet helps with the memory right away, but doesn't help the CPU, and gives unpredictable results (unable to launch pods, cannot get logs, can't connect to the terminal of pods, etc.). So I have just been nicely draining and rebooting nodes automatically when the RAM usage gets too high, but before the OOM reaper kicks in, as it won't touch kubelet and that just makes the situation worse because the cluster doesn't realize apps have been killed.
Edit: forgot to mention I'm on 4.11.0-0.okd-2022-08-20-022919.
Exact same version here in my test cluster. To me this different behaviour does not make sense. I monitored the workaround for a long time some months ago to make sure it worked and didn't cause problems and it seemed to do so...
I will continue my test for longer time and watch for what you are describing and the cluster stability.
I'm dropping it on a second cluster right now to see if I get different results there. One of those nodes is actually in its slow downward spiral as I type this (it fired the SystemMemoryExceedsReservation alert 30 minutes ago, which is a good harbinger of the memory leak taking over), but it has a lot of RAM to burn through yet, so it will take a while.
Indeed, I was also seeing that for months before I found some help (the workaround) in the openshift-users slack channel: https://kubernetes.slack.com/archives/C6AD6JM17/p1653030671315219
You can clearly see the problem, and the garbage collection every 10 minutes, using a simple find:
# pwd
/sys/fs/cgroup
# find . -type d | wc -l
463
The cgroups (and other things) don't get cleaned up by kubelet, so the folder count keeps increasing until the garbage collector drops by... The workaround does clean that up, at least in my 4.11 test cluster.
I will let my test run for a few days, because it seems like I have a little bit of memory leaking going on in 4.11 even with the workaround in place, but I'm not sure yet.
I'm still watching it on this second cluster I added it to as well. I had some results about 10 or 15 minutes after starting the garbage collector, but the SystemMemoryExceedsReservation alert never went away, and now, two hours later, the memory usage is starting to climb back up again. The GC is still running and still cleaning things up once in a while.
Edit: here it is 24 hours later. It seems the garbage collector must help, but memory still spikes for a while before dropping suddenly. So while this is indeed prolonging the uptime, it doesn't seem to fix it for good, and after another spike or two it will still OOM, I suspect. It's on its third upward climb as we speak. The SystemMemoryExceedsReservation alert has never gone away since it started climbing 24 hours ago either.
Looks like memory is filling up, even when running the workaround :-(
Definitely a showstopper for us to move to 4.11 if the node eventually crashes, which it probably will. I will try to find out what is staying behind this time.
This CRI-O memory leaking has been going on since we started using OKD 4. Maybe OKD should swap CRI-O for containerd, which upstream clearly supports better...
On bigger clusters it is not abnormal for the default 'SystemMemory' limit to be too low. There is a RH issue explaining how to adjust the limit to your needs. We also have one production host that is always in 'SysMem' alert, but that is not causing problems. If your temporary spikes are too high, you could change the garbage collector sleep time, if you haven't already.
I am still not sure about my test cluster. The RSS of the system slice keeps going up very slowly, but the total memory usage is not... I am still hoping it is memory that eventually gets flushed by CRI-O or kubelet. Not sure where the RSS is building up yet.
Time will tell. I'd really like to move to 4.11. Big fan of the dark mode in the console :)
My test kept running over the weekend and I think we can conclude the memory usage stabilises, and thus the workaround is still doing its thing for us. So I believe these are the same bugs that have been around for many versions.
For us, no problem to move to 4.11.
The graph might be misleading: I also added workload in our testing cluster, so I guess some of the increase in 'system' slice memory can be attributed to that. Most important is that it seems to flatline at some point.
Testing the patch on OKD 4.11.0-0.okd-2022-08-20-022919: nodes crash due to memory exhaustion after three days. Hope k8s 1.24.6 is coming soon? Thank you very much for the work.
Hope k8s 1.24.6 is coming soon?
It does not look like it's getting backported https://github.com/kubernetes/kubernetes/commits/v1.24.7-rc.0
It looks like k8s 1.24.6 has been merged into openshift/kubernetes:release-4.11: https://github.com/openshift/kubernetes/pull/1381 When can we expect a new release of OKD?
We're waiting for it to be promoted in OCP nightlies and bumped in machine-os-content. Hopefully that'll be this weekend or the following one.
It doesn't look like the rebase itself is fixing the leak. I've used a reproducer in https://github.com/okd-project/okd/issues/1310#issuecomment-1221965431 and I still see used memory climbing.
My prom-foo is not good enough to see if it climbs more slowly, though, but from the looks of it https://github.com/kubernetes/kubernetes/issues/106957#issuecomment-1147441007 mitigates the RAM consumption quite well.
Also I upgraded to 4.12 (w/ workaround) and applied https://github.com/kubernetes/kubernetes/pull/108855 (using https://github.com/vrutkovs/custom-okd-os/pkgs/container/custom-okd-os/45544090?tag=custom-kubelet) - that seems to have stopped the memory growth and the log spam
@vrutkovs @tyronewilsonfh can you please share the reproducer used from https://github.com/okd-project/okd/issues/1310#issuecomment-1221965431?
I've been trying to repro this as part of https://github.com/kubernetes/kubernetes/issues/112151 but haven't had too much luck yet. Are all these issues on CRI-O with crun or has it also been seen with runc/containerd?
I used https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/#creating-a-cron-job to trigger memory leak
Will this help to create a 4.11 without this bug (backports of the fixed components)? Or what is the way forward?
+1 for this. It would be great to understand the correct procedure to get our stable 4.10 cluster upgraded safely to 4.11.
We're going to need to backport https://github.com/kubernetes/kubernetes/pull/108855 all the way to 1.24 to have it fixed.
Meanwhile it seems the workaround is to switch back to cgroups v1 (add systemd.unified_cgroup_hierarchy=0 to the kernel params) - could anyone confirm that?
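A sketch of how that kernel argument could be applied through the Machine Config Operator; the object name is illustrative, and a second copy with role: master would be needed for control-plane nodes. This is an untested sketch of the suggestion above, not a verified fix:

```yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-worker-cgroups-v1
spec:
  kernelArguments:
    # Boot nodes with the legacy (v1) cgroup hierarchy
    - systemd.unified_cgroup_hierarchy=0
```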
I have what seems to be this issue (in the form of slowly increasing CPU usage requiring rebooting nodes) with kubelet and often also OLM. As the cluster dates from cgroups-v1 days and has been upgraded, I am running cgroups v1 (mount | grep group confirms; cgroups v2 is present but not being used).
I am running cgroups v1 (mount | grep group confirms; cgroups v2 is present but not being used).
same here - and same problem
Sorry if the question is irrelevant, but does the latest stable 4.11.0-0.okd-2022-10-15 release fix the issue in any way, @vrutkovs? Over the weekend 2 of the 3 masters in my 4.11.0-0.okd-2022-08-20-022919 production cluster crashed due to memory leakage. Luckily I checked the alerts on Saturday between the machine crashes :).
I also checked that, without any modifications, the system seems to use both cgroups v1 and v2. Should I switch to cgroups v1 as you proposed? For reasons I would rather not mention, I can't use my test cluster for that, so I would have to make the changes directly in production, which I would prefer not to do without knowing that it would help...
does the latest stable 4.11.0-0.okd-2022-10-15 release fix the issue in any way
Most likely not. It includes a fix for the memory leak in the job controller, but it seems more leaks are present:
/var/run
issue - https://github.com/kubernetes/kubernetes/issues/106957#issuecomment-1147441007
It does not seem to fix it. I upgraded last night and awoke to 2 worker nodes unresponsive.
@vrutkovs care to share your upgrade directions for OKD 4.12?
care to share your upgrade directions for OKD 4.12?
oc adm upgrade --force --allow-explicit-upgrade --allow-upgrade-with-warnings --to-image=<pick nightly pullspec from https://amd64.origin.releases.ci.openshift.org/#4.12.0-0.okd>
Warning: it's still being developed and nightlies are removed after 72 hrs. Even with 4.12 you still need an OS with a custom patched kubelet.
[vrutkovs] See below for thread summary
Hi team, after upgrading to 4.11 we are facing a new issue:
We use cronjobs with ' *'. After upgrading, some pods from the cronjobs get stuck in "ContainerCreating" or "Init 0/1" status. In the pod describe we can see:
```
Events:
  Type     Reason                    Age  From               Message
  Normal   Scheduled                 51s  default-scheduler  Successfully assigned 0xbet-prod/podname7673773-q6bq5 to ip-10-0-216-195.eu-central-1.compute.internal by ip-10-0-201-118
  Warning  FailedCreatePodContainer  9s   kubelet            unable to ensure pod container exists: failed to create container for [kubepods burstable pod3847723b-b7c8-4adc-a9d7-f3cdb83ae03f] : Timeout waiting for systemd to create kubepods-burstable-pod3847723b_b7c8_4adc_a9d7_f3cdb83ae03f.slice
  Normal   AddedInterface                 multus             Add eth0 [10.133.10.218/23] from ovn-kubernetes
  Normal   Pulled                         kubelet            Container image "ghcr.io/banzaicloud/vault-env:1.13.0" already present on machine
  Normal   Created                        kubelet            Created container copy-vault-env
  Normal   Started                        kubelet            Started container copy-vault-env
  ....
```
The symptom is pods scheduling slower and slower, and after some time they get stuck on:
Normal Scheduled 51s default-scheduler Successfully assigned 0xbet-prod/podname7673773-q6bq5 to ip-10-0-216-195.eu-central-1.compute.internal by ip-10-0-201-118
While logged in to the node, I can see this in journalctl:
Aug 13 21:38:49 ip-10-0-216-195 hyperkube[1546]: I0813 21:38:49.823654 1546 pod_container_manager_linux.go:192] "Failed to delete cgroup paths" cgroupName=[kubepods burstable podbcc994a0-6720-48fc-889b
A reboot helps, but not for long. As a result pods can't be scheduled and get stuck across the whole cluster.
Short thread summary.
What we know so far:
Probable cause:
Workaround: https://github.com/okd-project/okd/issues/1310#issuecomment-1312848841 - thanks to @framelnl there's a DaemonSet which can clean up the extra cgroups
Upstream issue refs: