nestybox / sysbox

An open-source, next-generation "runc" that empowers rootless containers to run workloads such as Systemd, Docker, Kubernetes, just like VMs.
Apache License 2.0

Bug:sysbox-deploy-k8s:v0.6.1: Getting errors after kernel upgrade. #720

Open groundsada opened 1 year ago

groundsada commented 1 year ago

My worker node deploys correctly on Ubuntu 20.04 with a 5.4.0 kernel. My setup requires a kernel with the acs_override patch. Whenever I use any kernel other than the original mainline 5.4.0, I get a BackOff container restarting error. This is the log from sysbox-deploy-k8s:

E0728 21:43:13.399000 26 memcache.go:255] couldn't get resource list for custom.metrics.k8s.io/v1beta1: Got empty response for: custom.metrics.k8s.io/v1beta1
E0728 21:43:13.621834 88 memcache.go:255] couldn't get resource list for custom.metrics.k8s.io/v1beta1: Got empty response for: custom.metrics.k8s.io/v1beta1
Detected Kubernetes version v1.23
Adding K8s taint "sysbox-runtime=not-running:NoSchedule" to node ...
E0728 21:43:13.785878 131 memcache.go:255] couldn't get resource list for custom.metrics.k8s.io/v1beta1: Got empty response for: custom.metrics.k8s.io/v1beta1
node/node-X-X-XX-XX-XX.example.com modified
Adding K8s label "sysbox-runtime=installing" to node ...
E0728 21:43:14.016878 159 memcache.go:255] couldn't get resource list for custom.metrics.k8s.io/v1beta1: Got empty response for: custom.metrics.k8s.io/v1beta1
node/node-X-X-XX-XX-XX.example.com not labeled
Installing Sysbox dependencies on host ...
Copying shiftfs sources to host ...
Kernel version 5.4 is >= 5.4 and < 5.8
Deploying Sysbox installer helper on the host ...
Running Sysbox installer helper on the host (may take several seconds) ...
Stopping the Sysbox installer helper on the host ...
Removing Sysbox installer helper from the host ...
Installing Sysbox on host ...
Configuring host sysctls ...
sysctl: cannot stat /proc/sys/kernel/unprivileged_userns_clone: No such file or directory
fs.inotify.max_queued_events = 1048576
fs.inotify.max_user_watches = 1048576
fs.inotify.max_user_instances = 1048576
kernel.keys.maxkeys = 20000
kernel.keys.maxbytes = 1400000
kernel.pid_max = 4194304

groundsada commented 1 year ago

For reference, my kernel is: 5.4.0-xanmod0, but the issue is consistent across any kernel version

ctalledo commented 1 year ago

Hi @groundsada, thanks for giving Sysbox a shot.

I am not sure where the memcache errors are coming from, but I think the main error is the last one in the log:

sysctl: cannot stat /proc/sys/kernel/unprivileged_userns_clone: No such file or directory

I believe that error is coming from this sysbox-deploy-k8s sysctl config file, and it's clear that your K8s node host does not have that sysctl. But that's expected because that file is not present in Ubuntu hosts (it's instead replaced by this other sysctl: /proc/sys/user/max_user_namespaces).

What I don't fully understand is why the script does not adjust for this (e.g., if Ubuntu, look for max_user_namespaces, else look for unprivileged_userns_clone).
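For illustration, the kind of adjustment suggested above could look roughly like the sketch below. It is not the sysbox-deploy-k8s installer's logic: the function name `userns_status` and the `proc_sys` parameter are hypothetical, and the sketch keys off which procfs node is present rather than off the distro name. It only assumes the two paths named in this thread, and that a value of 0 in either node means unprivileged user namespaces are unavailable.

```python
from pathlib import Path

# procfs nodes mentioned in this thread, relative to /proc/sys
USERNS_CLONE = "kernel/unprivileged_userns_clone"
MAX_USER_NS = "user/max_user_namespaces"


def userns_status(proc_sys: Path = Path("/proc/sys")) -> str:
    """Return 'enabled', 'disabled', or 'unknown' (hypothetical helper).

    'unknown' means neither node exists, so nothing can be concluded.
    """
    clone = proc_sys / USERNS_CLONE
    max_ns = proc_sys / MAX_USER_NS

    if not clone.is_file() and not max_ns.is_file():
        return "unknown"
    # When a node is present, a restrictive value in it wins.
    if clone.is_file() and clone.read_text().strip() != "1":
        return "disabled"
    if max_ns.is_file() and int(max_ns.read_text().strip()) == 0:
        return "disabled"
    return "enabled"


if __name__ == "__main__":
    print(userns_status())
```

Taking the procfs root as a parameter keeps the check testable against a scratch directory; on a real host the default `/proc/sys` would be used.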

Also: why are you on sysbox-deploy-k8s v0.6.1 and not v0.6.2?

rodnymolina commented 1 year ago

@groundsada, as indicated by @ctalledo above, the sysbox-k8s installer has some requirements that must be met with regard to the Linux distros (and associated kernels) being supported. My understanding is that the xanmod kernel is unrelated to the Ubuntu one, and as such, it may expose a different sysctl interface (notice that the sysbox-k8s installer expects the presence of the /proc/sys/kernel/unprivileged_userns_clone node).

groundsada commented 1 year ago

@ctalledo @rodnymolina Thank you both for replying. Even in v0.6.2 and even after applying the acs override patch on the Ubuntu mainline 5.4 kernel, I get the same error of sysctl: cannot stat /proc/sys/kernel/unprivileged_userns_clone: No such file or directory.

groundsada commented 1 year ago

@rodnymolina does the sysbox-k8s installer expect the presence of /proc/sys/kernel/unprivileged_userns_clone in Ubuntu? I don't think it exists in Ubuntu (non-Debian) kernels. That's the problem I am facing. The installer is looking for /proc/sys/kernel/unprivileged_userns_clone instead of max_user_namespaces. Is that fixable on my end?

rodnymolina commented 1 year ago

@groundsada, since Ubuntu and Debian are part of the same distro family, it's natural to expect them to have many similarities, including kernel-related ones. For example, both distros expose the /proc/sys/kernel/unprivileged_userns_clone node, which Sysbox uses to detect the presence/activation of unprivileged user namespaces in the running kernel of these two distros.

Now, as you can see here, Sysbox is supported beyond Debian/Ubuntu, so the presence of this unprivileged_userns_clone node isn't a must-have requirement for Sysbox to properly operate.

However, none of our installers (neither the traditional nor the k8s-specific one) support Sysbox deployments beyond a relatively-small distro set (see here), so if you want to make use of this custom kernel you would need to adjust our sysbox-k8s installer to bypass this procfs requirement, and hope that other Sysbox kernel requirements (i.e., id-mapped mounts) are properly satisfied by your kernel (you would need 5.19+).
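As a rough aid for the kernel requirement mentioned above (5.19+ for id-mapped mounts), a minimal sketch of a version check follows. The helper name `kernel_at_least` is hypothetical and not part of Sysbox; it only compares the leading major.minor of a release string such as the 5.4.0-xanmod0 reported earlier in this thread, and it says nothing about whether a custom kernel actually has the feature compiled in.

```python
import platform
import re


def kernel_at_least(release: str, major: int, minor: int) -> bool:
    """Compare the leading 'major.minor' of a kernel release string (hypothetical helper)."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return (int(m.group(1)), int(m.group(2))) >= (major, minor)


if __name__ == "__main__":
    release = platform.release()  # e.g. the output of `uname -r` on Linux
    try:
        print(release, kernel_at_least(release, 5, 19))
    except ValueError as err:
        print(err)
```

With the release string from this thread, `kernel_at_least("5.4.0-xanmod0", 5, 19)` evaluates to `False`, which is consistent with the need for a newer kernel if the installer's procfs check were bypassed.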

Finally, you would need to build Sysbox from sources and manually add its binaries to your own sysbox-k8s-deploy image, which would be the one you use to install Sysbox in your clusters. This is all fairly doable, but it would require you to familiarize yourself with the Sysbox installation process and related components.

groundsada commented 1 year ago

@rodnymolina @ctalledo I apologize for my confusion. It turns out that the issue was that I patched a mainline kernel on an Ubuntu system and sysbox wasn't able to find /proc/sys/kernel/unprivileged_userns_clone. I fixed it by patching an Ubuntu kernel instead.

ctalledo commented 1 year ago

Ah ... that makes sense, thanks for letting us know.

Closing this issue then!