Closed pwurbs closed 3 years ago
Thanks for filing this issue @pwurbs!
The Sysbox-pods feature has not been validated/tested on Rancher yet. Will take a look at this one tomorrow.
I was able to reproduce the issue by deploying a cluster directly through rke -- I had too many issues trying to import pre-existing nodes into Rancher. Even though the setup may not be exactly the same as the one originally described, there shouldn't be any relevant differences for our purposes, as Rancher internally relies on rke too.
There are various issues at play here:
1. The Sysbox-pods installer assumes that kubelet is deployed as a systemd service, which is not the case in rke setups: there, kubelet executes within a privileged container (sharing the pid namespace with the host and bind-mounting a bunch of host resources). We could expand our installer to cover this deployment pattern, but then we would need to deal with the second issue below.
2. rke relies on docker to build (and monitor) all the components of the k8s control-plane, so even if we find a way to install `cri-o`, the rke monitoring routines would fail to detect them. The same applies to any other high-level container runtime -- see here for similar concerns with `containerd`'s runtime.
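The first of these issues can be checked quickly on a node. A minimal sketch (my own heuristic, not part of the installer) to tell the two kubelet deployment patterns apart:

```shell
#!/bin/sh
# Heuristic (illustrative): how is kubelet deployed on this node?
# Classic kubeadm-style nodes run it as a systemd service; rke nodes run it
# as a privileged docker container named "kubelet" instead.
kubelet_mode() {
  if command -v systemctl >/dev/null 2>&1 \
     && systemctl is-active kubelet.service >/dev/null 2>&1; then
    echo "systemd service"
  elif command -v docker >/dev/null 2>&1 \
       && docker ps --format '{{.Names}}' 2>/dev/null | grep -qx kubelet; then
    echo "docker container (rke-style)"
  else
    echo "unknown (neither an active kubelet.service nor a kubelet container)"
  fi
}
kubelet_mode
```

On an rke node this should report the container case, which is exactly the pattern our installer did not anticipate.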
Now, rke seems to be on its way out, and AFAIK, Rancher is about to replace it with rke2 as its k8s engine. The good news here is that rke2 supports containerd out of the box, and it should be able to talk to any CRI-compliant runtime (see here), so it looks like this is the winning horse we should focus on.
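For reference, hooking a CRI-compliant runtime into containerd is just a config entry. A sketch of the relevant `config.toml` fragment (section names follow containerd's CRI plugin convention; the runtime name and binary path are assumptions for illustration):

```toml
# /etc/containerd/config.toml (fragment, illustrative)
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.sysbox-runc]
  runtime_type = "io.containerd.runc.v2"
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.sysbox-runc.options]
    BinaryName = "/usr/bin/sysbox-runc"
```

With an entry like this in place, a k8s RuntimeClass can select the runtime per pod.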
@pwurbs, how does rke2 sound to you? Is an rke-to-rke2 migration already part of your roadmap?
@rodnymolina Thx for the analysis. So I understand that Sysbox currently can't be deployed on a Rancher-managed K8s cluster (RKE-based), right? Unfortunately we currently don't intend to move to RKE2. Would it be a workaround to install Sysbox using the host installation procedure instead of deploying it using the K8s manifests?
> Would it be a workaround to install Sysbox using the host installation procedure instead of deploying it using the K8S manifests?
Installing Sysbox through the traditional package won't help here, as Rancher (and its provisioning tools: rke, rke2, k3s) won't be aware of its existence on the remote hosts. Enabling that integration is precisely why we have the sysbox-deploy-k8s daemonset.
Having said that, there may be an alternative approach that we are currently investigating to make this all work. Please stay tuned.
In the end we were able to make it work (see details below): RKE can now deploy sysbox-powered pods in a cluster. The changes have been pushed to the latest sysbox-deploy-k8s installer, which deploys both CRI-O and Sysbox on the desired k8s nodes.
In terms of implementation, we went for the following approach:
As mentioned above, RKE heavily relies on docker to create both the k8s control-plane and its data-plane. The former components are spawned as docker containers (i.e. `kubelet`, `kube-proxy` and `nginx-proxy`), whereas the latter ones (e.g. cni pods and all user workloads) are created as pods through the `docker-shim` interface.
As we don't want to (and can't) change RKE, we still rely on docker to create the basic control-plane components. However, we have switched all the data-plane components from `docker-shim` to CRI-O.
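In concrete terms, switching the data-plane means pointing kubelet at CRI-O's CRI socket instead of the built-in dockershim. A fragment sketching the kubelet arguments involved (flag names are the pre-1.24 kubelet ones; the socket path is CRI-O's default; the exact mechanism rke uses to pass them is not shown):

```shell
# Fragment (illustrative): the kubelet flags that move it off dockershim.
kubelet \
  --container-runtime=remote \
  --container-runtime-endpoint=unix:///var/run/crio/crio.sock
```

This is, in effect, what the daemonset's node re-configuration changes.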
As is usually the case, we have incorporated all the required configuration steps into the sysbox-deploy-k8s daemonset. All that is required is to execute the following steps; the re-configuration of each k8s node shouldn't take more than a minute:
```shell
kubectl label nodes <node-name> sysbox-install=yes
kubectl apply -f https://raw.githubusercontent.com/nestybox/sysbox/master/sysbox-k8s-manifests/rbac/sysbox-deploy-rbac.yaml
kubectl apply -f https://raw.githubusercontent.com/nestybox/sysbox/master/sysbox-k8s-manifests/daemonset/sysbox-deploy-k8s.yaml
kubectl apply -f https://raw.githubusercontent.com/nestybox/sysbox/master/sysbox-k8s-manifests/runtime-class/sysbox-runtimeclass.yaml
```
Refer to our k8s installation guide for more details.
I could now successfully deploy Sysbox on a Rancher-managed (RKE) cluster node using the K8s manifest files. I used Ubuntu 20.04 (latest), Docker 20.x and Kubernetes v1.20.10. The testing pod according to https://github.com/nestybox/sysbox/blob/master/docs/user-guide/install-k8s.md#pod-deployment could be successfully deployed (without any privileged mode). Within that container I could successfully pull and start an nginx container. So far everything is fine, thank you.
Then I successfully started a pod with the docker:dind image (docker:19.03.15-dind-alpine3.13).
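For anyone following along, a pod spec along these lines should reproduce this setup (a sketch: the pod and container names are mine, and `sysbox-runc` is assumed to be the runtime class created by the manifests above):

```yaml
# dind-test.yaml (illustrative)
apiVersion: v1
kind: Pod
metadata:
  name: dind-test
spec:
  runtimeClassName: sysbox-runc
  containers:
  - name: dind
    image: docker:19.03.15-dind-alpine3.13
```

Note that no `privileged: true` is needed; the sysbox runtime class provides the isolation the inner dockerd requires.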
Trying `docker pull nginx` in this container results in this error:

```
failed to register layer: Error processing tar file(exit status 1): replaceDirWithOverlayOpaque("/docker-entrypoint.d") failed: createDirWithOverlayOpaque("/rdwoo655593762") failed: failed to rmdir /rdwoo655593762/m/d: remove /rdwoo655593762/m/d: operation not permitted
```
This is the Docker version info from within the container:

```
Server: Docker Engine - Community
 Engine:
  Version:      19.03.15
  API version:  1.40 (minimum version 1.12)
  Go version:   go1.13.15
  Git commit:   99e3ed8
  Built:        Sat Jan 30 03:18:13 2021
  OS/Arch:      linux/amd64
  Experimental: false
 containerd:
  Version:      v1.3.9
  GitCommit:    ea765aba0d05254012b0b9e595e995c09186427f
 runc:
  Version:      1.0.0-rc10
  GitCommit:    dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version:      0.18.0
  GitCommit:    fec3683
```
These versions are a bit different from your `ubuntu-bionic-systemd-docker` image. I am not sure if this issue is K8s/RKE related. I only wanted to let you know...
Hi @pwurbs,
Glad you were able to install Sysbox on your RKE nodes (great work by @rodnymolina to enable this).
Regarding the latest problem you reported:
```
failed to register layer: Error processing tar file(exit status 1): replaceDirWithOverlayOpaque("/docker-entrypoint.d") failed: createDirWithOverlayOpaque("/rdwoo655593762") failed: failed to rmdir /rdwoo655593762/m/d: remove /rdwoo655593762/m/d: operation not permitted
```
This looks very similar to issue #254, where the problem showed up when the inner Docker uses slightly older versions.
However, in that issue we reported that the problem occurs when the inner Docker has version < 19.03, but in your case the inner Docker has version 19.03.
Could you retry with a docker dind image using Docker 20+ please?
> I am not sure, if this issue is K8S / RKE related. I only wanted to let you know...
I don't believe so. Thus, it makes sense for us to move this discussion to issue #254. I'll copy your prior comment and this current one to that issue, so we can continue the discussion there. I'll close this one.
I tried to install Sysbox in a K8s cluster following the user guide.
So Sysbox requirements should be fulfilled.
RBAC and RuntimeClass have been successfully deployed, but there are issues with the daemonset sysbox-deploy-k8s: the pod is continuously crashing. This is the log line before the crash:

```
Job for kubelet-config-helper.service failed because the control process exited with error code. See "systemctl status kubelet-config-helper.service" and "journalctl -xe" for details.
```
This is the result of `systemctl status kubelet-config-helper.service`:
The cluster has been created in Rancher using the option "Create a new Kubernetes cluster", based on existing nodes. So the single node has been prepared and imported to create the new (downstream) cluster. Attached is the cluster config exported from Rancher: cluster-config.txt