vpsfreecz / vpsadminos

Host for Linux system containers based on NixOS, ZFS and LXC
https://vpsadminos.org
MIT License

File capabilities lost after ct chown #61

Open aither64 opened 1 year ago

aither64 commented 1 year ago

File capabilities set from within a user namespace apparently include a user ID (that of the namespace's root user) and are then valid only if that user ID is mapped in the current user namespace; see https://elixir.bootlin.com/linux/v6.1.42/source/security/commoncap.c#L455.
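To make the difference concrete, here is a minimal sketch of how the raw `security.capability` xattr value could be decoded. It assumes the layout from the kernel's `linux/capability.h` (revision 2 is 20 bytes, revision 3 appends a 4-byte root ID); the function name `parse_file_caps` and the example root ID 100000 are hypothetical and only serve as illustration.

```python
import struct

# Constants as defined in include/uapi/linux/capability.h
VFS_CAP_REVISION_MASK = 0xFF000000
VFS_CAP_REVISION_2 = 0x02000000
VFS_CAP_REVISION_3 = 0x03000000
VFS_CAP_FLAGS_EFFECTIVE = 0x000001


def parse_file_caps(raw: bytes) -> dict:
    """Decode a raw security.capability value (revision 2 or 3 only)."""
    (magic_etc,) = struct.unpack_from("<I", raw, 0)
    revision = magic_etc & VFS_CAP_REVISION_MASK

    if revision == VFS_CAP_REVISION_2 and len(raw) == 20:
        # Unnamespaced capabilities: no root ID is stored.
        p0, i0, p1, i1 = struct.unpack_from("<4I", raw, 4)
        rootid = None
    elif revision == VFS_CAP_REVISION_3 and len(raw) == 24:
        # Namespaced capabilities: the last field is the root ID.
        p0, i0, p1, i1, rootid = struct.unpack_from("<5I", raw, 4)
    else:
        raise ValueError("unsupported or truncated security.capability value")

    return {
        "revision": revision >> 24,
        "effective": bool(magic_etc & VFS_CAP_FLAGS_EFFECTIVE),
        "permitted": p0 | (p1 << 32),
        "inheritable": i0 | (i1 << 32),
        "rootid": rootid,
    }


# Hypothetical example: cap_setuid (bit 7) with the effective flag,
# stored as a namespaced (revision 3) capability with root ID 100000.
example = struct.pack("<6I", VFS_CAP_REVISION_3 | 1, 1 << 7, 0, 0, 0, 100000)
print(parse_file_caps(example))
```

A value with `rootid` set to `None` corresponds to capabilities that are not tied to any particular user namespace.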

This is not an issue for file capabilities set from inside a container, but it is a problem for capabilities stored in container images. Container images are built in a different user namespace than the containers created from them, which makes the file capabilities invalid. Unfortunately, file capabilities are sometimes used in place of the suid bit, for example on Fedora/CentOS:

[nix-shell:~]# getcap -r /dozer/ct/instance-077d2aad/private/
/dozer/ct/instance-077d2aad/private/usr/bin/newuidmap cap_setuid=ep
/dozer/ct/instance-077d2aad/private/usr/bin/clockdiff cap_net_raw=p
/dozer/ct/instance-077d2aad/private/usr/bin/newgidmap cap_setgid=ep
/dozer/ct/instance-077d2aad/private/usr/bin/arping cap_net_raw=p

Reading those capabilities from the container fails:

[CT instance-077d2aad] root@instance-077d2aad:/# strace getcap /usr/bin/newuidmap
[...]
getxattr("/usr/bin/newuidmap", "security.capability", NULL, 0) = -1 EOVERFLOW (Value too large for defined data type)
[...]

The same issue will arise when a container is chowned into a different user namespace. All existing file capabilities will no longer be valid.

It's unclear how we could solve this. We could create a list of files with capabilities when images are built and then restore those capabilities when containers are created from the images. ct chown would still break the capabilities though. Walking through all files on existing containers to find all capabilities and preserve them is highly impractical as there can be millions of files.
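The list-and-restore idea could be sketched roughly as follows. This is not existing vpsAdminOS code: `collect_file_caps` and `restore_file_caps` are hypothetical helper names, the xattr accessors are injectable only so the sketch can be exercised without privileges, and the raw values are stored verbatim. Whether they would need rewriting before restore is left open here.

```python
import os

CAP_XATTR = "security.capability"


def collect_file_caps(root, getxattr=None):
    """Return {relative path: raw xattr bytes} for files under root
    that carry file capabilities."""
    getxattr = getxattr or os.getxattr
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                raw = getxattr(path, CAP_XATTR, follow_symlinks=False)
            except OSError:
                # No capabilities on this file, or they cannot be read
                # from the current namespace; skip it in this sketch.
                continue
            manifest[os.path.relpath(path, root)] = raw
    return manifest


def restore_file_caps(root, manifest, setxattr=None):
    """Write the recorded xattr values back onto files under root."""
    setxattr = setxattr or os.setxattr
    for relpath, raw in manifest.items():
        setxattr(os.path.join(root, relpath), CAP_XATTR, raw,
                 follow_symlinks=False)
```

The collect step would run once when an image is built, so its cost is paid on a small, known tree. It does nothing for ct chown on existing containers, where the full walk over potentially millions of files is the impractical part.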

aither64 commented 1 year ago

It seems that when file capabilities are set from init_user_ns, there's no associated uid with them and they work from within a user namespace. Resetting all file capabilities from init_user_ns after images are built could be a way.
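As an illustration of what "resetting" could mean at the byte level, the sketch below turns a namespaced (revision 3) `security.capability` value into an unnamespaced (revision 2) one by dropping the trailing root ID. The helper name `drop_rootid` is hypothetical, the layout is assumed from the kernel's `linux/capability.h`, and this is not necessarily how the images end up being fixed.

```python
import struct

# Constants as defined in include/uapi/linux/capability.h
VFS_CAP_REVISION_MASK = 0xFF000000
VFS_CAP_REVISION_2 = 0x02000000
VFS_CAP_REVISION_3 = 0x03000000


def drop_rootid(raw: bytes) -> bytes:
    """Convert a revision 3 value to revision 2; return other values as-is."""
    (magic_etc,) = struct.unpack_from("<I", raw, 0)
    if (magic_etc & VFS_CAP_REVISION_MASK) != VFS_CAP_REVISION_3 or len(raw) != 24:
        return raw
    # Keep the flag bits (e.g. the effective flag), swap the revision.
    flags = magic_etc & ~VFS_CAP_REVISION_MASK
    # Bytes 4..20 hold the permitted/inheritable sets; bytes 20..24 are the root ID.
    return struct.pack("<I", VFS_CAP_REVISION_2 | flags) + raw[4:20]
```

Run as root in init_user_ns, the result could then be written back with `os.setxattr(path, "security.capability", drop_rootid(os.getxattr(path, "security.capability")))`, assuming the original value is readable from there.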

aither64 commented 1 year ago

Container images are fixed by 4ef8d4d3a2f34411c296291758a931ddad631da5: images will now contain unnamespaced file capabilities. File capabilities will still be lost on ct chown. There is some integration with ID-mapped mounts, so this can be investigated further in the future if we switch away from our ZFS uidmaps/gidmaps.