matthewparkinsondes opened this issue 2 years ago
I have the same/similar issue with Rancher 2.6.6, K8S v1.21.12, Ubuntu 20.04 (latest).
Same issue exists with Rancher 2.6.6, K8S v1.22.10, Ubuntu 20.04 (latest).
Reproduction steps:
kubectl apply -f https://raw.githubusercontent.com/nestybox/sysbox/master/sysbox-k8s-manifests/sysbox-install.yaml
@dgadelha, thanks for the repro steps. Will take a look at this one within the next few days.
@rodnymolina same issue on: RKE2 v1.23.10+rke2r1, sysbox-ce v0.5.2, Ubuntu 22.04, kernel 5.15.0-46-generic
When the RKE2 worker node is initially bootstrapped (with containerd), kubelet on the node is running with the following arg:
--cgroup-driver=systemd
After a successful sysbox installation, kubelet is restarted and runs with --cgroup-driver=cgroupfs.
However, the sysbox-generated CRI-O configuration is missing the cgroup_manager setting, which defaults to systemd (https://github.com/nestybox/cri-o/blob/main/docs/crio.conf.5.md).
Adding the following to /etc/crio/crio.conf:
[crio]
...
[crio.runtime]
conmon_cgroup = "pod"
cgroup_manager = "cgroupfs"
...
and restarting crio with systemctl restart crio.service resolves the problem.
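For reference, a quick way to confirm the two drivers now agree (assuming the kubelet was started with an explicit --cgroup-driver flag, as shown above):
# The kubelet flag and the CRI-O setting below should report the same driver.
ps -eo args | grep -v grep | grep kubelet | grep -oE 'cgroup-driver=[^ ]+'
grep cgroup_manager /etc/crio/crio.conf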
@gdraganic, I think that's pretty much the issue here...
The sysbox daemon-set should not have switched RKE2's kubelet to cgroupfs if RKE2 originally picked systemd as its cgroup driver. The goal here is to keep disruptions to a minimum, so we will need to check why this isn't happening.
Consequently, the current crio.conf file is showing the proper configuration; the problem is with the kubelet one. We will look into this.
Btw, yes, you can also work around the issue by doing what you suggested (switching crio to cgroupfs), but I think a better solution would be to do what I described above.
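Not sysbox's actual implementation, just a rough shell sketch of the behavior described above: read the cgroup driver the running kubelet was started with and make CRI-O's cgroup_manager match it. It assumes the kubelet carries an explicit --cgroup-driver flag and that crio.conf already has a cgroup_manager line.
# Rough sketch only (not sysbox's code): keep CRI-O's cgroup manager in sync
# with whatever driver the running kubelet already uses.
driver=$(ps -eo args | grep -v grep | grep kubelet | grep -oE 'cgroup-driver=[^ ]+' | cut -d= -f2)
if [ -n "$driver" ]; then
  # Assumes /etc/crio/crio.conf already contains a cgroup_manager line to rewrite.
  sed -i "s/^cgroup_manager = .*/cgroup_manager = \"$driver\"/" /etc/crio/crio.conf
  systemctl restart crio
fi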
Thanks for your detailed analysis.
What exactly is sysbox doing here? Is it installing cri-o and then reconfiguring rke2 to use that as the --container-runtime-endpoint instead of the packaged containerd? RKE2 manages the cgroup driver config for both the kubelet and containerd if you use the packaged containerd, but if you bring your own container runtime, you're responsible for making sure that the kubelet's cgroup driver is the same as the container runtime. Sounds like sysbox is not doing that properly.
but if you bring your own container runtime, you're responsible for making sure that the kubelet's cgroup driver is the same as the container runtime.
That's correct @brandond; it should be doing that with Kubelet and CRI-O, but I'll let @rodnymolina double check it.
@rodnymolina any update? Same issue on: Rancher 2.6.8, v1.23.8-rancher1, sysbox-ce v0.5.2, Ubuntu 20.04, kernel 5.4.0-1055-kvm
@brandond, @TH3VenGeuR, sorry for the delay. We have a fix for this one already. I'll provide a test image to verify things are working as expected. Will ping you when done (a few days).
RKE2 manages the cgroup driver config for both the kubelet and containerd if you use the packaged containerd, but if you bring your own container runtime, you're responsible for making sure that the kubelet's cgroup driver is the same as the container runtime. Sounds like sysbox is not doing that properly.
@brandond, that was exactly the issue: sysbox was not properly setting the cgroup-driver based on kubelet's preconfigured value, which explains why cri-o and kubelet are displaying conflicting settings.
@rodnymolina Any update on the fix? I also have an RKE2 cluster where this issue is reproducible.
@matthewparkinsondes, @brandond, @MWY3510, @TH3VenGeuR, could you please give this image a try? This one is expected to fix this issue and a few others we've been working on.
To point to it, you will need to update our install/uninstall manifests to use ghcr.io/nestybox/sysbox-deploy-k8s:rodny-dev.
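For anyone applying the manifest directly, one way to swap the image in flight (the exact image reference in the upstream manifest may differ, so the sed pattern below is kept generic):
# Download the install manifest, point the sysbox-deploy-k8s image at the test tag, and apply it.
curl -sL https://raw.githubusercontent.com/nestybox/sysbox/master/sysbox-k8s-manifests/sysbox-install.yaml \
  | sed -E 's|(image:[[:space:]]*).*sysbox-deploy-k8s.*|\1ghcr.io/nestybox/sysbox-deploy-k8s:rodny-dev|' \
  | kubectl apply -f -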
@rodnymolina I tested your image against a fresh RKE2 cluster running 1.23. Looks like it is working. I will test against my other two clusters and double check. Should have those results later today. Thanks.
I am currently using the same workaround above and I am able to run sysbox on an Ubuntu 20.04 Rancher RKE1 K8S 1.23.14 cluster. However, whenever a worker node is rebooted, kubelet and kube-proxy, restarted by dockerd, try to bind the same ports 10250 and 10248 on the restarted node. Of course, one of them fails and keeps restarting forever.
I found the following workaround:
docker stop <kube-proxy-id>
docker rm <kube-proxy-id>
Although I have no clue what is happening there, I hope others might benefit from this workaround. The bad news is that although this workaround helped on two nodes, the third node I rebooted is still stuck with a kube-proxy that cannot bind 10248/10250, though these ports are bound by kubelet, of course.
UPDATE: A docker system prune directly after docker rm for the kube-proxy container seems to help. Now it even works again for the 3rd node.
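Putting the two updates together, a sketch of the full recovery sequence (the name filter assumes RKE's usual kube-proxy container name):
# Find the stale kube-proxy container, remove it, then prune leftover state.
cid=$(docker ps -a --filter name=kube-proxy --format '{{.ID}}')
docker stop "$cid"
docker rm "$cid"
docker system prune -f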
UPDATE2: Meanwhile I have installed a couple more nodes, and when the above workaround in crio.conf is applied before the deployment of sysbox times out, everything is stable and even survives a reboot of the server node without problems.
Hi @rodnymolina, the image is not working in my environment. It ends with this error: Failed to create pod sandbox: rpc error: code = Unknown desc = cri-o configured with systemd cgroup manager, but did not receive slice as parent
@FFock, sorry for the delay in getting back to you. I will try to restart a k3s node to reproduce the behavior you described.
@TH3VenGeuR, could you please provide more details about your setup so that I can attempt to reproduce it? Thanks.
@TH3VenGeuR, also, which sysbox-deploy image did you try in this last attempt?
Hello, I tried the ghcr.io/nestybox/sysbox-deploy-k8s:rodny-dev image, which you said should fix the issue one month ago. My setup is still the same: Rancher 2.6.8, v1.23.8-rancher1, sysbox-ce v0.5.2, Ubuntu 20.04, kernel 5.4.0-1055-kvm
Hi @rodnymolina,
I've now tested the new sysbox-deploy image against the version combinations detailed below.
Each test succeeded, in that all pods came online of their own accord. A series of errors is seen in the logs reproduced below.
Here is the deploy image used.
ghcr.io/nestybox/sysbox-deploy-k8s:rodny-dev
Here are the version combinations.
Here is the manifest file used during installation.
https://github.com/nestybox/sysbox/raw/master/sysbox-k8s-manifests/sysbox-ee-install.yaml
Here are the logs from the sysbox-ee-deploy-k8s container for each combination.
Combination 1
Fri, Mar 3 2023 2:25:12 pm | E0303 04:25:12.703565 27 memcache.go:238] couldn't get current server API group list: Get "https://10.43.0.1:443/api?timeout=32s": dial tcp 10.43.0.1:443: i/o timeout
Fri, Mar 3 2023 2:25:12 pm | Detected Kubernetes version v1.23
Fri, Mar 3 2023 2:25:13 pm | Stopping the Kubelet config agent on the host ...
Fri, Mar 3 2023 2:25:13 pm | Removing Kubelet config agent from the host ...
Fri, Mar 3 2023 2:25:13 pm | Kubelet reconfig completed.
Fri, Mar 3 2023 2:25:13 pm | Adding K8s label "crio-runtime=running" to node ...
Fri, Mar 3 2023 2:25:14 pm | node/athena207-cluster1-w1 labeled
Fri, Mar 3 2023 2:25:14 pm | Adding K8s label "sysbox-runtime=running" to node ...
Fri, Mar 3 2023 2:25:14 pm | node/athena207-cluster1-w1 labeled
Fri, Mar 3 2023 2:25:14 pm | The k8s runtime on this node is now CRI-O.
Fri, Mar 3 2023 2:25:14 pm | Sysbox-EE installation completed.
Fri, Mar 3 2023 2:25:14 pm | Done.
Combination 2
Fri, Mar 3 2023 9:16:46 am | E0302 23:16:46.318312 27 memcache.go:255] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Fri, Mar 3 2023 9:16:46 am | E0302 23:16:46.326628 27 memcache.go:106] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Fri, Mar 3 2023 9:16:46 am | E0302 23:16:46.336036 27 memcache.go:106] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Fri, Mar 3 2023 9:16:46 am | E0302 23:16:46.337765 27 memcache.go:106] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Fri, Mar 3 2023 9:16:46 am | E0302 23:16:46.427676 34 memcache.go:255] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Fri, Mar 3 2023 9:16:46 am | E0302 23:16:46.430724 34 memcache.go:106] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Fri, Mar 3 2023 9:16:46 am | E0302 23:16:46.434975 34 memcache.go:106] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Fri, Mar 3 2023 9:16:46 am | E0302 23:16:46.437726 34 memcache.go:106] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Fri, Mar 3 2023 9:16:46 am | Detected Kubernetes version v1.24
Fri, Mar 3 2023 9:16:46 am | Stopping the Kubelet config agent on the host ...
Fri, Mar 3 2023 9:16:46 am | Removing Kubelet config agent from the host ...
Fri, Mar 3 2023 9:16:47 am | Kubelet reconfig completed.
Fri, Mar 3 2023 9:16:47 am | Adding K8s label "crio-runtime=running" to node ...
Fri, Mar 3 2023 9:16:47 am | E0302 23:16:47.342302 62 memcache.go:255] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Fri, Mar 3 2023 9:16:47 am | E0302 23:16:47.347992 62 memcache.go:106] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Fri, Mar 3 2023 9:16:47 am | E0302 23:16:47.352203 62 memcache.go:106] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Fri, Mar 3 2023 9:16:47 am | E0302 23:16:47.354770 62 memcache.go:106] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Fri, Mar 3 2023 9:16:47 am | node/athena207-cluster1-w1 labeled
Fri, Mar 3 2023 9:16:47 am | Adding K8s label "sysbox-runtime=running" to node ...
Fri, Mar 3 2023 9:16:47 am | E0302 23:16:47.624243 66 memcache.go:255] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Fri, Mar 3 2023 9:16:47 am | E0302 23:16:47.642437 66 memcache.go:106] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Fri, Mar 3 2023 9:16:47 am | E0302 23:16:47.645826 66 memcache.go:106] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Fri, Mar 3 2023 9:16:47 am | E0302 23:16:47.647929 66 memcache.go:106] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Fri, Mar 3 2023 9:16:47 am | node/athena207-cluster1-w1 labeled
Fri, Mar 3 2023 9:16:47 am | The k8s runtime on this node is now CRI-O.
Fri, Mar 3 2023 9:16:47 am | Sysbox-EE installation completed.
Fri, Mar 3 2023 9:16:47 am | Done.
Combination 3
Fri, Mar 3 2023 10:41:03 am | Detected Kubernetes version v1.24
Fri, Mar 3 2023 10:41:03 am | Stopping the Kubelet config agent on the host ...
Fri, Mar 3 2023 10:41:04 am | Removing Kubelet config agent from the host ...
Fri, Mar 3 2023 10:41:04 am | Kubelet reconfig completed.
Fri, Mar 3 2023 10:41:04 am | Adding K8s label "crio-runtime=running" to node ...
Fri, Mar 3 2023 10:41:05 am | node/athena207-cluster2-w1 labeled
Fri, Mar 3 2023 10:41:05 am | Adding K8s label "sysbox-runtime=running" to node ...
Fri, Mar 3 2023 10:41:05 am | node/athena207-cluster2-w1 labeled
Fri, Mar 3 2023 10:41:05 am | The k8s runtime on this node is now CRI-O.
Fri, Mar 3 2023 10:41:05 am | Sysbox-EE installation completed.
Fri, Mar 3 2023 10:41:05 am | Done.
Ran into this issue on AKS, on a node that uses ARM, where the cgroup_manager should be "systemd" but was "cgroupfs". I was able to fix it by patching crio.conf and then restarting the kubelet and crio.
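A sketch of that fix, assuming the node runs crio and kubelet as the usual systemd units and that crio.conf already has a cgroup_manager line:
# Force CRI-O back to the systemd cgroup manager and restart both services.
sudo sed -i 's/^cgroup_manager = .*/cgroup_manager = "systemd"/' /etc/crio/crio.conf
sudo systemctl restart crio
sudo systemctl restart kubelet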
Is that fixed in 0.6.1? I don't see it mentioned in the changelog for 0.6.1.
I have tried 0.6.1 with Rancher RKE1 on Rancher 2.7.3 with Ubuntu 22.04 (and 20.04) and K8S 1.24.13 as well as 1.25.9, but I always get the same error when starting my pod requiring sysbox:
Failed to create pod sandbox: rpc error: code = Unknown desc = RuntimeHandler "sysbox-runc" not supported
I have not found any error in the crio installation log, kubelet logs, etc.
What is new (compared to Sysbox 0.5.2 with K8S 1.23.x on Ubuntu 20.04) is the fact that now all K8S pods are still running on docker and not on crio after the sysbox installation. That is something I did not expect. With 0.5.2, only kubelet, kube-proxy, and nginx-proxy were running on docker; everything else was running on crio. Is this the source of the problem, or is it expected?
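One way to check which runtime the node and its pods are actually wired to (plain kubectl/crictl/docker commands, nothing sysbox-specific):
kubectl get nodes -o wide                           # CONTAINER-RUNTIME column per node
sudo crictl -r unix:///var/run/crio/crio.sock ps    # containers CRI-O is actually running
docker ps --format '{{.Names}}'                     # containers still managed by dockerd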
@rodnymolina, I think I found the root cause of this problem. It is the kubelet-config-helper.service which is not accepting the crio runtime that sysbox-deploy installed:
+ crio_conf_file=/etc/crio/crio.conf
+ crio_socket=/var/run/crio/crio.sock
+ crio_runtime=unix:///var/run/crio/crio.sock
+ kubelet_bin=
+ runtime=
+ kubelet_ctr_restart_mode=no
+ execstart_line_global=
+ main
++ id -u
+ euid=0
+ [[ 0 -ne 0 ]]
+ mount
+ grep -q '/sys .*ro,'
+ kubelet_snap_deployment
+ snap list
+ grep -q kubelet
+ kubelet_rke_deployment
+ command -v docker
+ docker inspect '--format={{.Config.Labels}}' kubelet
+ egrep -q rke.container.name:kubelet
+ do_config_kubelet_rke
+ echo 'Detected RKE'\''s docker-based kubelet deployment on host.'
Detected RKE's docker-based kubelet deployment on host.
+ get_runtime_kubelet_docker
+ set +e
++ docker exec kubelet bash -c 'ps -e -o command | egrep \^kubelet | egrep -o "container-runtime-endpoint=\S*" | cut -d '\''='\'' -f2'
+ runtime=unix:///var/run/cri-dockerd.sock
+ set -e
+ [[ unix:///var/run/cri-dockerd.sock == '' ]]
+ [[ unix:///var/run/cri-dockerd.sock =~ crio ]]
+ [[ ! unix:///var/run/cri-dockerd.sock =~ dockershim ]]
+ echo 'Unsupported runtime for RKE scenario: unix:///var/run/cri-dockerd.sock'
Unsupported runtime for RKE scenario: unix:///var/run/cri-dockerd.sock
This seems to be fixed in 0.6.1. On AKS using sysbox 0.6.1, I get a valid crio.conf file with cgroup_manager = "systemd" (as it should be).
Thanks for verifying this @the-gigi.
There are a couple of different issues being reported here: 1) the systemd/cgroupfs issue that affects mostly rke2 setups and which has already been fixed in v0.6.1, and 2) the one reported by @FFock that impacts traditional rke deployments. The second one hasn't been fixed yet, but I'm planning to do so shortly.
@rodnymolina, I am seeing issue 1 continue to show up periodically on rke2 nodes with Sysbox 0.6.3.
I've included observations below of an issue 1 failure with Sysbox 0.6.3 on rke2 v1.27, along with details of a successful execution for comparison.
Reviewing the sysbox-deploy-k8s logs shows that Sysbox was added to crio config at 12:54:47.
logs from sysbox-deploy-k8s
Sysbox comes online at 12:54:47, which is just prior to the failure and is also the same time as crio config is last modified.
systemctl status sysbox
Crio config file last modification time is 12:54:47.
ls -l --time-style full-iso /etc/crio/crio.conf
A reference to cgroupfs is missing from crio config.
/etc/crio/crio.conf
Crio logs fail to show crio being stopped and restarted. These are the last lines in the crio logs.
journalctl -xefu crio
A search for the kubelet process comes up empty.
ps -eo pid,lstart,cmd | grep kubelet
Reviewing the kubelet logs shows the sysbox-deploy-k8s pod. The last kubelet logs are at 12:54:49. This corresponds with the kubelet being restarted around this time to pick up the new crio config.
/var/lib/rancher/rke2/agent/logs/kubelet.log
rke2-agent shows a failure at 12:54:50.
systemctl status rke2-agent
Further system logs show the kubelet-config-helper.service process is killed at 12:54:51.
/var/log/syslog
This is confirmed in the logs for kubelet-config-helper.service.
journalctl -xefu kubelet-config-helper.service
Further system logs show the kubelet has been identified as systemd managed, as opposed to cgroupfs.
/var/log/syslog
And finally, the kubelet binary fails to be identified.
/var/log/syslog
Details of a successful launch are shown below.
Reviewing the sysbox-deploy-k8s logs shows that Sysbox was added to crio config at 12:56:02.
logs from sysbox-deploy-k8s
Sysbox is active and running at 12:56:02.
systemctl status sysbox
Reviewing the kubelet logs shows the sysbox-deploy-k8s pod. The last kubelet logs for the original execution are at 12:56:04. This corresponds with the kubelet being restarted around this time to pick up the new crio config.
/var/lib/rancher/rke2/agent/logs/kubelet.log
System logs show the cgroup_driver being set to cgroupfs at 12:56:06 and crio being restarted.
/var/log/syslog
Crio config file last modification time is 12:56:06 and the file includes a reference to cgroupfs.
ls -l --time-style full-iso /etc/crio/crio.conf
/etc/crio/crio.conf
Crio logs show crio is stopped and restarted at 12:56:06.
journalctl -xefu crio
System logs show crio starting successfully at 12:56:06.
/var/log/syslog
The rke2-agent service is started at 12:56:06.
/var/log/syslog
Kubelet starts at 12:56:07, and its parameters include cgroup-driver being set to cgroupfs.
ps -eo pid,lstart,cmd | grep kubelet
The kubelet logs also confirm that it successfully restarts at 12:56:07 with the CgroupDriver set to cgroupfs.
/var/lib/rancher/rke2/agent/logs/kubelet.log
rke2-agent achieves active status at 12:56:07 and remains running.
systemctl status rke2-agent
And a second sysbox-deploy-k8s log file shows that Sysbox completes its configuration at 12:56:40.
second sysbox-deploy-k8s log file
The following pieces look to be critical.
journalctl -xefu rke2-agent.service
journalctl -xefu kubelet-config-helper.service
Both of these are systemd related services, and run within a new VM created via automation.
I am assuming the issue with kubelet-config-helper.service relates to lingering of systemd user instances within the VM.
I have therefore applied the following fix and will monitor to see if the issue recurs.
loginctl enable-linger root
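For reference, lingering for root can be confirmed with:
loginctl show-user root --property=Linger   # expect "Linger=yes" once enabled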
The lingering fix is in place, and the issue has recurred.
We noticed the logs show systemd performing a reload at the exact point the kubelet-config-helper.service is killed.
journalctl -xe
This takes place while the kubelet-config-helper.service is setting up the crio config file.
This in turn causes the observed failure of the Sysbox installer to set up a cgroup driver of cgroupfs.
@rodnymolina, any thoughts on the above?
We started looking at the issue from the perspective that a service is being reloaded at the failure point.
We therefore removed the needrestart package ... and also noticed that snapd continued to reload lxd at the failure point.
We are now testing the removal of both the needrestart and snapd packages just prior to creating the RKE2 node ... and reinstalling these two packages once the RKE2 node is successfully started with Sysbox installed.
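A sketch of that sequence, using the standard Ubuntu package names mentioned above:
# Before creating the RKE2 node:
sudo apt-get remove -y needrestart snapd
# ... bring up the RKE2 node and let the sysbox daemonset finish ...
# Afterwards:
sudo apt-get install -y needrestart snapd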
Rancher 2.6.5 - RKE2 v1.23.6+rke2r1 - Sysbox-CE 0.5.2 - Ubuntu 20.04 - Kernel 5.13.0-39-generic - x86_64
An issue was observed when attempting to install Sysbox on an RKE2 Kubernetes cluster. After the install, the pods on each worker node fail to come back online and are stuck in the ContainerCreating or PodInitializing state.
The pods show the following error.
Below are logs from the sysbox-deploy-k8s-xxxxx pod.