nestybox / sysbox

An open-source, next-generation "runc" that empowers rootless containers to run workloads such as Systemd, Docker, Kubernetes, just like VMs.
Apache License 2.0

How to use kubernetes mountPropagation Bidirectional feature in sysbox-runc #661

Closed jiyisky closed 1 year ago

jiyisky commented 1 year ago

I have a workload that requires Kubernetes' mountPropagation: Bidirectional feature, which Kubernetes only allows in privileged containers, but the Sysbox container runtime does not support privileged: true. Do you have any suggestions for how to configure Bidirectional mount propagation with Sysbox?

ctalledo commented 1 year ago

Hi @jiyisky, thanks for taking a look at Sysbox.

Yes, unfortunately mountPropagation: Bidirectional is not supported for Sysbox containers, since it would allow the (unprivileged) Sysbox container to affect host mounts, thereby breaking isolation (i.e., a mount made inside the Sysbox container on a shared volume would also become visible to the host and to other containers sharing that volume).

The limitation actually comes from the Linux kernel, as described in mount_namespaces(7):

       [1] Each mount namespace has an owner user namespace.  As
           explained above, when a new mount namespace is created, its
           mount list is initialized as a copy of the mount list of
           another mount namespace.  If the new namespace and the
           namespace from which the mount list was copied are owned by
           different user namespaces, then the new mount namespace is
           considered less privileged.

       [2] When creating a less privileged mount namespace, shared
           mounts are reduced to slave mounts.  This ensures that
           mappings performed in less privileged mount namespaces will
           not propagate to more privileged mount namespaces.

Sysbox containers always use a user namespace, so the mount namespace associated with a Sysbox container is less privileged (per [1] above). That in turn means shared mounts (the kernel mechanism behind Bidirectional) are reduced to slave mounts per [2], so mounts made inside the container cannot propagate back to the host.
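
One way to observe this from inside a container is to look at the optional fields in /proc/self/mountinfo, where a `shared:N` tag marks a shared mount and a `master:N` tag marks a slave mount. The Python sketch below is only an illustration and is not part of Sysbox; the function name `propagation_of` and the default mount point `/mnt/shared` are hypothetical.

```python
import sys


def propagation_of(mountinfo_line: str) -> str:
    """Classify one /proc/<pid>/mountinfo line by its propagation type."""
    fields = mountinfo_line.split()
    # Fields 0-5 are fixed (IDs, device, root, mount point, options).
    # Optional fields follow and are terminated by a lone "-".
    sep = fields.index("-", 6)
    tags = {f.split(":")[0] for f in fields[6:sep]}
    if "shared" in tags and "master" in tags:
        return "shared+slave"
    if "shared" in tags:
        return "shared"
    if "master" in tags:
        return "slave"
    if "unbindable" in tags:
        return "unbindable"
    return "private"


def main(mount_point: str) -> None:
    # Print the propagation type of every mount at the given mount point.
    # Note: spaces in paths appear as "\040" in mountinfo.
    with open("/proc/self/mountinfo") as f:
        for line in f:
            if line.split()[4] == mount_point:
                print(mount_point, propagation_of(line))


if __name__ == "__main__":
    # "/mnt/shared" is a placeholder; pass the real volume path as argv[1].
    main(sys.argv[1] if len(sys.argv) > 1 else "/mnt/shared")
```

Given [2], a volume that reports `shared` on the host would be expected to report `slave` when the same check is run inside a container that has its own user namespace.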

Hope that explains it.