nestybox / sysbox

An open-source, next-generation "runc" that empowers rootless containers to run workloads such as Systemd, Docker, Kubernetes, just like VMs.
Apache License 2.0

NFS ID Mapping in Kubernetes #831

Open johnsmyth opened 2 months ago

johnsmyth commented 2 months ago

I am mounting an NFS volume in my Kubernetes container. With the default runtimeClass it works as expected: file ownership is mapped to the users in the container, i.e.:

drwxr-xr-x 2 admin admin 4096 Aug 28 12:50 ./
drwxr-xr-x 3 root  root  4096 Aug 28 12:49 ../
-rw-r--r-- 1 admin admin    0 Aug 27 16:55 test2
-rw-r--r-- 1 admin admin   15 Aug 27 16:57 test3
-rw-r--r-- 1 admin admin    0 Aug 27 16:20 testfile

If I change only the runtimeClass to sysbox-runc, the file ownership is not mapped and everything is owned by nobody:nogroup:

-rw-r--r-- 1 nobody nogroup    0 Aug 27 16:20 testfile
-rw-r--r-- 1 nobody nogroup   15 Aug 27 16:57 test3
-rw-r--r-- 1 nobody nogroup    0 Aug 27 16:55 test2
drwxr-xr-x 3 root   root    4096 Aug 27 16:59 ..
drwxr-xr-x 2 nobody nogroup 4096 Aug 27 16:57 .

I'm running in GKE, using the Ubuntu with containerd (ubuntu_containerd) node type as suggested in the docs. The kernel version is 5.15.0-1061-gke and shiftfs appears to be installed. The documentation suggests that with this kernel version and shiftfs, the ID mapping should work. Any ideas?

net00-1 commented 2 months ago

I also encountered this problem. I am using the same setup in GKE, and the log shows this when the Sysbox pod starts up:

setting up ID-mapped mount on path /var/lib/containers/storage/overlay/my/mount/path failed with Failed to set mount attr: invalid argument (likely means idmapped mounts are not supported on the filesystem at this path (nfs))

These are NFSv3 NetApp volumes that I am connecting to the pod through a PVC. They work with the regular container setup. I found that only some types of volumes have been confirmed to work. Is this a limitation of the GKE node Linux kernel, the volume type, shiftfs, or something else?
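To confirm which filesystem actually backs a mount path such as the one in the log above, one option is to look it up in the node's mount table. The following is a minimal Python sketch, assuming the standard Linux /proc/self/mountinfo line format; the sample table, the server address 10.0.0.2:/export, and the helper name fstype_for_path are made up for illustration.

```python
import os

def fstype_for_path(path, mountinfo_text):
    """Return (mount_point, fstype) for the deepest mount containing path, or None."""
    path = os.path.normpath(path)
    best = None
    for line in mountinfo_text.splitlines():
        if " - " not in line:
            continue  # not a well-formed mountinfo line
        left, right = line.split(" - ", 1)
        mount_point = left.split()[4]   # 5th field: mount point
        fstype = right.split()[0]       # first field after the separator
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # ">=" lets a later mount stacked on the same point win
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, fstype)
    return best

# Hypothetical mount table: a local root filesystem plus one NFS share.
SAMPLE = """\
22 1 8:1 / / rw,relatime - ext4 /dev/sda1 rw
85 22 0:52 / /mnt/nfs_share rw,relatime - nfs 10.0.0.2:/export rw,vers=3
"""

print(fstype_for_path("/mnt/nfs_share/docker", SAMPLE))
```

On a real node the second argument would be the contents of /proc/self/mountinfo. A result of nfs or nfs4 for the volume path is consistent with the "(nfs)" hint in the error message.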

EddieX64 commented 1 month ago

Hello, I'm having the same issue using GKE with a Filestore NFSv3 instance as a PVC for our pods. I'm seeing the same error mentioned by @net00-1:

setting up ID-mapped mount on path /var/lib/containers/storage/overlay/my/mount/path failed with Failed to set mount attr: invalid argument (likely means idmapped mounts are not supported on the filesystem at this path (nfs))

Because of this we are unable to use shared NFS volumes for our CI/CD pipelines. This limitation is unfortunately causing a slowdown in the overall execution of the pipelines.

ctalledo commented 1 month ago

Hi folks, thanks for giving Sysbox a try, and apologies for the belated response.

The problem is that neither shiftfs nor idmapped mounts, the two mechanisms used by Sysbox to map filesystem user-IDs and group-IDs inside the rootless container, work with NFS, unfortunately. This is why the NFS share files show up as nobody:nogroup inside the Sysbox container.
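To illustrate why unmapped files appear as nobody:nogroup, here is a minimal Python sketch of how a user-namespace ID map translates host IDs. It assumes the standard /proc/<pid>/uid_map line format (inside ID, outside ID, length) and the kernel's default overflow ID of 65534; the specific range 165536 is a hypothetical example, not a value taken from this thread.

```python
OVERFLOW_ID = 65534  # default kernel overflow ID, usually named nobody / nogroup

def parse_id_map(text):
    """Parse uid_map / gid_map text into (inside, outside, length) tuples."""
    entries = []
    for line in text.splitlines():
        if line.strip():
            inside, outside, length = (int(field) for field in line.split())
            entries.append((inside, outside, length))
    return entries

def host_to_container_id(host_id, id_map):
    """Translate a host ID to what a process inside the namespace would see."""
    for inside, outside, length in id_map:
        if outside <= host_id < outside + length:
            return inside + (host_id - outside)
    return OVERFLOW_ID  # no mapping: shown as the overflow ID

# Hypothetical map: container IDs 0..65535 map to host IDs 165536..231071.
SAMPLE_UID_MAP = "0 165536 65536\n"
id_map = parse_id_map(SAMPLE_UID_MAP)

print(host_to_container_id(165536, id_map))  # 0 (container root)
print(host_to_container_id(1000, id_map))    # 65534 (unmapped, seen as nobody)
```

A file on the NFS share owned by a host ID outside the container's range falls through to the overflow ID. Shiftfs and idmapped mounts exist to shift those on-disk IDs into the range, which is the step that is unavailable on NFS.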

The easiest solution would be for NFS to support idmapped mounts, but I am not aware of any plans at the moment to do this.

Alternatively, we could add some trickery in Sysbox to create dedicated NFS client mounts for each container and use NFS ID-mapping options to ensure the files show up with proper user and group-IDs inside the container, but this requires more investigation and work, and we currently don't have sufficient cycles for it.

I don't know of any other solution unfortunately. Maybe using bindfs could work (?), though not sure how it may affect performance.

bushev commented 1 month ago

Hey guys!

I attempted to use bindfs to resolve the issue:

sudo bindfs -o force-user=root,force-group=root,perms=0755 /mnt/nfs_share/docker /mnt/bindfs/docker

However, I encountered the following error when running Docker:

docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: error in the container spec: invalid mount config: failed to request mount source preps from sysbox-mgr: failed to invoke PrepMounts via grpc: rpc error: code = Unknown desc = failed to shift uids via chown for mount source at /mnt/bindfs/docker: failed to shift ACL for /mnt/bindfs/docker: failed to get ACL for /mnt/bindfs/docker: operation not supported: unknown.

It appears that sysbox tries to shift UIDs via chown on the bindfs mount point, but this operation isn't supported, likely due to the underlying filesystem limitations. This results in the container failing to start.

Unfortunately, there's a lack of information on using bindfs with Sysbox, especially concerning NFS shares. It seems that bindfs might not be compatible with Sysbox's UID/GID shifting mechanisms out of the box.

I'm exploring alternative solutions or workarounds for my problem. If anyone has experience or insights on integrating bindfs with Sysbox to address UID/GID mapping issues in rootless containers using NFS, your advice would be greatly appreciated.

bushev commented 1 month ago

BTW, is there a way to share a directory between hosts so that it can be mapped to a Sysbox container? I've tried methods like SSHFS and others, but none seem to work. Syncing files with rsync is an option, but it's quite inconvenient. Are there any solutions or workarounds to this problem?

ctalledo commented 1 month ago

docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: error in the container spec: invalid mount config: failed to request mount source preps from sysbox-mgr: failed to invoke PrepMounts via grpc: rpc error: code = Unknown desc = failed to shift uids via chown for mount source at /mnt/bindfs/docker: failed to shift ACL for /mnt/bindfs/docker: failed to get ACL for /mnt/bindfs/docker: operation not supported: unknown.

That sounds like a small bug in Sysbox: it expects the underlying filesystem to support ACLs, but apparently not all filesystems do. Ideally Sysbox should first check if it does, and if not, simply skip checking the ACLs and continue.

As a quick and dirty work-around, you could try commenting out the if statement here and then rebuilding sysbox with make sysbox && sudo make install.

It's not a proper fix, but at least will get you past that issue and see if the bindfs / NFS mount works.
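The "check first, skip if unsupported" behavior described above could look roughly like the following Python sketch. This is not Sysbox code (Sysbox is written in Go). It assumes Linux, where POSIX access ACLs are stored in the system.posix_acl_access extended attribute; the function name read_acl_if_supported and the injectable getxattr parameter are hypothetical and exist only to make the sketch testable.

```python
import errno
import os

ACL_XATTR = "system.posix_acl_access"  # Linux xattr holding the POSIX access ACL

def read_acl_if_supported(path, getxattr=None):
    """Return the raw ACL bytes, or None when there is nothing to shift.

    None covers two cases: the filesystem does not support ACLs (ENOTSUP)
    or the file simply has no ACL set (ENODATA). Other errors propagate.
    """
    if getxattr is None:
        getxattr = os.getxattr  # Linux-only; resolved lazily
    try:
        return getxattr(path, ACL_XATTR)
    except OSError as err:
        if err.errno in (errno.ENOTSUP, errno.ENODATA):
            return None  # skip ACL handling and continue
        raise
```

A caller would treat None as "no ACL work needed" and carry on with the chown-based shifting, instead of aborting the container start as in the error above.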

jonathanbeber commented 17 hours ago

@bushev did you test it? I wonder if a fix in sysbox to check whether ACLs are available or not could work.

jonathanbeber commented 17 hours ago

I tried using bindfs from inside the container, but that would require the /dev/fuse device, which I believe is blocked by #850.

bushev commented 14 hours ago

Hey! No, unfortunately, I couldn’t get it to work with BindFS. I ran into issues with symbolic links and ACL. I haven’t revisited the problem since then.