Open ashraffouda opened 6 months ago
If the k8s flist uses the obsolete raw image, the first "disk" attached to the VM MUST be a zmount, not a volume. Extra "volumes" can be added to the VM, but you can only mount them if the k8s image has the virtiofs module.
The right way to do this now is actually to modify the k8s image to use the preferred flist style with individual files.
The k3s image is not a VM, as it doesn't have a kernel with it. However, it turned out that overlayfs has some issues with virtiofs as the upper layer, and since container runtimes usually use overlayfs, basically all of them won't work with the new Volumes.
There are kernel patches for running virtiofs with overlayfs, but I believe requiring these patches will make it harder for users to create custom images. So, we might need to revise the way Volumes work.
So the incompatibility between virtiofs and overlayfs has been understood for a while in the context of running Docker inside micro VMs and trying to use the virtiofs based rootfs for Docker's data dir. Docker tends to automatically fall back to the vfs driver and continue operating, but performance is very bad. Placing Docker's data dir on a disk (raw image type) fixes this (and conforms to the intended design of storing user data on a disk/volume). If we intend to deprecate that form in favor of the new virtiofs based volume, then we won't have this workaround.
As suggested in the error message in the original post, using the fuse-overlayfs driver can be another alternative. That probably has better performance than vfs but is still going to be a performance hit over using a non-FUSE driver. Maybe this could be acceptable for many use cases where performance sensitive data can be stored in a volume attached to the container (since container volumes don't use the same storage driver as used for the container rootfs).
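The driver choice discussed above can be sketched as a small check on what filesystem backs Docker's data dir. This is only an illustration: the helper names `fstype_for_path` and `pick_storage_driver` and the sample mount table are made up for this example, and the mapping (virtiofs gets `fuse-overlayfs`, anything else gets `overlay2`) is an assumption drawn from this thread rather than how Docker itself selects a driver.

```python
from typing import Optional

# Made-up mount table in /proc/mounts format, for illustration only:
# a virtiofs rootfs plus a disk mounted at Docker's data dir.
SAMPLE_MOUNTS = """\
myfs / virtiofs rw,relatime 0 0
/dev/vda /var/lib/docker ext4 rw,relatime 0 0
"""


def fstype_for_path(path: str, mounts_text: str) -> Optional[str]:
    """Return the filesystem type of the mount that contains `path`.

    Picks the longest matching mount point; a later entry for the same
    mount point wins, mirroring how stacked mounts shadow earlier ones.
    """
    target = path.rstrip("/") + "/"
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if target.startswith(prefix) and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


def pick_storage_driver(fs_type: Optional[str]) -> str:
    """Hypothetical policy: avoid kernel overlayfs on top of virtiofs."""
    if fs_type == "virtiofs":
        return "fuse-overlayfs"
    return "overlay2"


if __name__ == "__main__":
    for data_dir in ("/var/lib/docker", "/srv/docker"):
        fs_type = fstype_for_path(data_dir, SAMPLE_MOUNTS)
        print(data_dir, fs_type, pick_storage_driver(fs_type))
```

Inside a real VM the mount table could be read from `/proc/mounts` instead of the sample string, and the chosen driver set via the `storage-driver` key in Docker's `daemon.json`.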
I reviewed the discussions around improving compatibility for virtiofs and overlayfs. For reference, this issue contains the best overview of the situation.
There are kernel patches for running virtiofs with overlayfs
It seems that these patches were merged into the mainline kernel as of 5.7. What we seem to be missing are the other pieces of the puzzle mentioned in this comment on the issue linked above:
# we absolutely need xattr and sys_admin cap
# allow_direct_io just seems sensible but is not required
# we had been using -o writeback which improved performance however users were reporting problems so removed it
virtio_fs_extra_args = ["-o", "xattr", "-o", "modcaps=+sys_admin", "-o", "allow_direct_io"]
Also:
One thing we've figured out (again with help from RHers above) is that to create an overlayfs in virtiofs your bottom layer must not also be overlay -- (e.g. it needs to be ext4, xfs, etc).
Based on my read of https://github.com/threefoldtech/zos/issues/1564, that suggests that our current implementation of rootfs is ruled out, since it's virtiofs backed by overlayfs, but we should be able to get this working for volumes, assuming they are just btrfs underneath.
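As a rough sketch, two of the conditions collected above (a guest kernel of at least 5.7, and a host-side backing filesystem that is not itself overlay) can be written as a quick feasibility check. The function names are hypothetical, and the check deliberately says nothing about the virtiofsd options (`xattr`, `modcaps=+sys_admin`), which would have to be verified separately.

```python
import re


def kernel_at_least(release: str, minimum=(5, 7)) -> bool:
    """Compare the major.minor part of a kernel release string to `minimum`."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return (int(match.group(1)), int(match.group(2))) >= minimum


def overlay_on_virtiofs_plausible(kernel_release: str, backing_fs_type: str) -> bool:
    """Hypothetical check based only on the two conditions from this thread.

    kernel_release  -- guest kernel release, e.g. "5.10.0-8-amd64"
    backing_fs_type -- filesystem behind the virtiofs share on the host
    """
    if not kernel_at_least(kernel_release):
        return False  # virtiofs/overlayfs patches assumed missing before 5.7
    # The bottom layer must not also be overlay (ext4, xfs, btrfs are fine).
    return backing_fs_type != "overlay"


if __name__ == "__main__":
    # Volume backed by btrfs vs. rootfs backed by overlayfs.
    print(overlay_on_virtiofs_plausible("5.10.0-8-amd64", "btrfs"))
    print(overlay_on_virtiofs_plausible("5.10.0-8-amd64", "overlay"))
```

In a running guest, `platform.release()` from the standard library would supply the kernel release string.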
One question then is whether it's acceptable to give CAP_SYS_ADMIN to virtiofsd.
Describe the bug
Deployment of a k8s cluster is broken when zvolumes are used, while it works properly when zmounts are used. It gives this error:
To Reproduce
Deploy a k8s cluster with zvolumes