opencontainers / runc

CLI tool for spawning and running containers according to the OCI specification
https://www.opencontainers.org/
Apache License 2.0

Some namespace path joining errors together with user ns #4138

Open lifubang opened 11 months ago

lifubang commented 11 months ago

For example, suppose we have started a container named test whose init process has PID 14821.

  1. [ipc] Joining container test's ipc namespace while also creating a new user ns:
                "uidMappings": [{
                        "containerID": 0,
                        "hostID": 1000,
                        "size": 2000
                }],
                "gidMappings": [{
                        "containerID": 0,
                        "hostID": 1000,
                        "size": 2000
                }],
                "namespaces": [
                        {
                                "type": "ipc",
                                "path": "/proc/14821/ns/ipc"
                        },
                        {
                                "type": "user"
                        }
                ],

    The result will be:

    @lifubang ➜ ~/nstest $ sudo /workspaces/runc/runc run -d ipc
    ERRO[0000] runc run failed: unable to start container process: error during container init: error mounting "mqueue" to rootfs at "/dev/mqueue": mount src=mqueue, dst=/dev/mqueue, dstFD=/proc/self/fd/8, flags=0xe: operation not permitted
  2. I think joining an existing pid, net, or uts namespace path will have the same issue as ipc.

Should we support this feature or not?

cyphar commented 11 months ago

That particular example is a configuration error. If you join an existing namespace (such as IPC) and then create a new user namespace, you won't have any capabilities over the IPC namespace, so operations like that mount will fail. The fix is to change the configuration.
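
The pattern described above can be spotted in a config before starting the container. The following is a minimal sketch, not something runc itself does: the function name `namespaces_joined_under_new_userns` is hypothetical, and it assumes it is handed the `linux` section of the config as a dict. It reports every non-user namespace that is joined by path while the user namespace entry has no path (that is, a new user namespace is created).

```python
def namespaces_joined_under_new_userns(linux):
    """Return the types of namespaces joined by path while a new user ns is created.

    Hypothetical helper: `linux` is assumed to be the "linux" section of an
    OCI config.json, already parsed into a dict.
    """
    namespaces = linux.get("namespaces", [])
    # A "user" entry without a "path" asks the runtime to create a new user ns.
    creates_new_userns = any(
        ns.get("type") == "user" and not ns.get("path") for ns in namespaces
    )
    if not creates_new_userns:
        return []
    # Any other namespace joined by path already existed, so it is owned by
    # some other user namespace, not by the newly created one.
    return [
        ns.get("type")
        for ns in namespaces
        if ns.get("type") != "user" and ns.get("path")
    ]


# The configuration from the report: join ipc by path, create a new user ns.
failing = {
    "namespaces": [
        {"type": "ipc", "path": "/proc/14821/ns/ipc"},
        {"type": "user"},
    ]
}
# Variant that also joins the existing user namespace by path.
working = {
    "namespaces": [
        {"type": "ipc", "path": "/proc/14821/ns/ipc"},
        {"type": "user", "path": "/proc/14821/ns/user"},
    ]
}

print(namespaces_joined_under_new_userns(failing))  # ['ipc']
print(namespaces_joined_under_new_userns(working))  # []
```

A non-empty result does not mean the container is guaranteed to fail, only that operations needing capabilities over the listed namespaces (such as the mqueue mount) are likely to be refused.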

lifubang commented 11 months ago

But if I join container test's user ns path instead of creating a new user ns, it works.

                "namespaces": [
                        {
                                "type": "ipc",
                                "path": "/proc/14821/ns/ipc"
                        },
                        {
                                "type": "user",
                                "path": "/proc/14821/ns/user"
                        }
                ],

Container test's user mappings have the same values as these:

                "uidMappings": [{
                        "containerID": 0,
                        "hostID": 1000,
                        "size": 2000
                }],
                "gidMappings": [{
                        "containerID": 0,
                        "hostID": 1000,
                        "size": 2000
                }],

cyphar commented 11 months ago

@lifubang That's just how user namespaces work. Every other namespace instance is owned by a user namespace, and capability permissions are based on the owning user namespace (not on the kuid; some checks do involve the kuid, but basic capability checks do not). You can re-create this behaviour using unshare and nsenter:

% unshare -Uri sleep infinity # leave running; $pid below is the PID of this process
% nsenter -t $pid -i -- unshare -Urm -- mount -t mqueue mqueue /tmp # fails
% unshare -Urm -- nsenter -t $pid -i # fails due to permission issues
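
To check which user namespace a process is in, one option is to compare the `/proc/<pid>/ns/user` symlinks. The sketch below assumes a Linux `/proc` layout; the function names and the `proc_root` parameter are hypothetical and exist only for illustration. Note that this shows the user namespace a process belongs to, not the user namespace that owns its IPC namespace; finding the owner requires the `NS_GET_USERNS` ioctl on the namespace file descriptor, which is not shown here.

```python
import os


def user_namespace_id(pid, proc_root="/proc"):
    """Return the target of <proc_root>/<pid>/ns/user, e.g. 'user:[4026531837]'."""
    return os.readlink(os.path.join(proc_root, str(pid), "ns", "user"))


def same_user_namespace(pid_a, pid_b, proc_root="/proc"):
    """True if both processes report the same user namespace link target."""
    return user_namespace_id(pid_a, proc_root) == user_namespace_id(pid_b, proc_root)
```

For instance, `same_user_namespace(14821, os.getpid())` would tell you whether the calling process shares a user namespace with the container's init process from the report. Reading another process's namespace links may raise `PermissionError` if the caller lacks access to that process.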