google / gvisor

Application Kernel for Containers
https://gvisor.dev
Apache License 2.0

gVisor unprivileged user can't access file in rootless mode #9918

Open terenceli opened 10 months ago

terenceli commented 10 months ago

Description

In rootless mode, the container UID is the same as the UID that runs runsc, yet inside the container we can't access files that belong to that host UID.

I want to know how to achieve this, so that in rootless mode the unprivileged user can access host files owned by the same UID.

Steps to reproduce

Using the following OCI spec:

```
test@test-VirtualBox:~/test$ id
uid=1000(test) gid=1000(test) groups=1000(test),4(adm),24(cdrom),27(sudo),30(dip),46(plugdev),122(lpadmin),135(lxd),136(sambashare)
test@test-VirtualBox:~/test$ ./runsc -rootless -root /home/test -ignore-cgroups --network host run abc
touch /home/test/bb
touch: cannot touch '/home/test/bb': Permission denied
```

```json
{
    "ociVersion": "1.0.0",
    "process": {
        "user": {
            "uid": 1000,
            "gid": 1000
        },
        "args": [
            "sh"
        ],
        "env": [
            "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
            "TERM=xterm"
        ],
        "cwd": "/",
        "capabilities": {
            "bounding": [
                "CAP_AUDIT_WRITE",
                "CAP_KILL",
                "CAP_NET_BIND_SERVICE"
            ],
            "effective": [
                "CAP_AUDIT_WRITE",
                "CAP_KILL",
                "CAP_NET_BIND_SERVICE"
            ],
            "inheritable": [
                "CAP_AUDIT_WRITE",
                "CAP_KILL",
                "CAP_NET_BIND_SERVICE"
            ],
            "permitted": [
                "CAP_AUDIT_WRITE",
                "CAP_KILL",
                "CAP_NET_BIND_SERVICE"
            ]
        },
        "rlimits": [
            {
                "type": "RLIMIT_NOFILE",
                "hard": 1024,
                "soft": 1024
            }
        ]
    },
    "root": {
        "path": "/",
        "readonly": true
    },
    "hostname": "runsc",
    "mounts": [
        {
            "destination": "/proc",
            "type": "proc",
            "source": "proc"
        },
        {
            "destination": "/dev",
            "type": "tmpfs",
            "source": "tmpfs"
        },
        {
            "destination": "/sys",
            "type": "sysfs",
            "source": "sysfs",
            "options": [
                "nosuid",
                "noexec",
                "nodev",
                "ro"
            ]
        },
        {
            "destination": "/home/test",
            "type": "bind",
            "source": "/home/test",
            "options": [
                "rbind",
                "rw"
            ]
        }
    ],
    "linux": {
        "namespaces": [
            {
                "type": "pid"
            },
            {
                "type": "network"
            },
            {
                "type": "ipc"
            },
            {
                "type": "uts"
            },
            {
                "type": "mount"
            }
        ]
    }
}
```

runsc version

No response

docker version (if using docker)

No response

uname

No response

kubectl (if using Kubernetes)

No response

repo state (if built from source)

No response

runsc debug logs (if available)

No response

avagin commented 10 months ago

./runsc --rootless maps the current user to the root user inside the sandbox.

ayushr2 commented 10 months ago

Adding to what @avagin said: so if you remove this part from your config.json:

    "user": {
        "uid": 1000,
        "gid": 1000
    },

OR set the user to 0, and then it should work fine.

In your use case, the sandbox is running in a user namespace with UID/GID mappings like 1000:0:1 (host uid/gid 1000 is mapped to the container root user). So all files owned by 1000 on the host will appear to be owned by 0 in the sandbox. Similarly, if you create a file from inside the sandbox, it will appear to be owned by 1000 on the host, but inside the container the owner will be 0.
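For a concrete way to see this, here is a small Go sketch (my illustration, not from the thread) that prints the numeric owner of the bind-mounted /home/test; run on the host it should report uid=1000, and run inside this rootless sandbox it should report uid=0:

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// ownerOf returns the numeric UID/GID owning path, as seen by whichever
// side (host or sandbox) runs this program.
func ownerOf(path string) (uid, gid uint32, err error) {
	fi, err := os.Stat(path)
	if err != nil {
		return 0, 0, err
	}
	st := fi.Sys().(*syscall.Stat_t)
	return st.Uid, st.Gid, nil
}

func main() {
	// With the 1000:0:1 mapping described above, the same bind-mounted
	// directory is owned by 1000 on the host and by 0 in the sandbox.
	uid, gid, err := ownerOf("/home/test")
	if err != nil {
		panic(err)
	}
	fmt.Printf("owner of /home/test: uid=%d gid=%d\n", uid, gid)
}
```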

terenceli commented 10 months ago

@avagin @ayushr2 I see that podman has a --userns=keep-id option. I think we may need a similar option.

We want the application in the Sentry to have the same view as the host, from a filesystem perspective.

ayushr2 commented 10 months ago

runsc and runc are low-level container runtimes; podman is a tool built on top of these runtimes. So if you want podman's --userns=keep-id behavior, you can use podman with runsc directly to run rootless containers and enable that option. For an example of how to configure podman with runsc, see our Podman tests: test/podman/run.sh.

From the podman docs, --userns=keep-id means:

keep-id: creates a user namespace where the current rootless user’s UID:GID are mapped to the same values in the container. This option is not allowed for containers created by the root user.

You cannot use the --rootless flag if you want --userns=keep-id behavior, because when --rootless is used, runsc re-executes itself as root with the caller UID mapped to 0. So the caller becomes root, and we cannot map the caller UID to itself again.

If you do not want to use podman, you have the following options:

terenceli commented 10 months ago

@ayushr2 Thanks for your explanation. I tried the first method, with the following steps:

Install newuidmap and newgidmap. runsc OCI spec:

{
    "ociVersion": "1.0.0",
    "process": {
        "user": {
            "uid": 1000,
            "gid": 1000
        },
        "args": [
            "sh"
        ],
        "env": [
            "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
            "TERM=xterm"
        ],
        "cwd": "/",
        "capabilities": {
            "bounding": [
                "CAP_AUDIT_WRITE",
                "CAP_KILL",
                "CAP_NET_BIND_SERVICE"
            ],
            "effective": [
                "CAP_AUDIT_WRITE",
                "CAP_KILL",
                "CAP_NET_BIND_SERVICE"
            ],
            "inheritable": [
                "CAP_AUDIT_WRITE",
                "CAP_KILL",
                "CAP_NET_BIND_SERVICE"
            ],
            "permitted": [
                "CAP_AUDIT_WRITE",
                "CAP_KILL",
                "CAP_NET_BIND_SERVICE"
            ]
        },
        "rlimits": [
            {
                "type": "RLIMIT_NOFILE",
                "hard": 1024,
                "soft": 1024
            }
        ]
    },
    "root": {
        "path": "/",
        "readonly": true
    },
    "hostname": "runsc",
    "mounts": [
        {
            "destination": "/proc",
            "type": "proc",
            "source": "proc"
        },
        {
            "destination": "/dev",
            "type": "tmpfs",
            "source": "tmpfs"
        },
        {
            "destination": "/sys",
            "type": "sysfs",
            "source": "sysfs",
            "options": [
                "nosuid",
                "noexec",
                "nodev",
                "ro"
            ]
        }
    ],
    "linux": {
        "uidMappings": [
            {
                "containerID":1000,
                "hostID": 1000,
                "size": 100
            }
        ],
        "gidMappings": [
            {
                "containerID": 1000,
                "hostID": 1000,
                "size": 100
            }
        ],
        "namespaces": [
        {
        "type": "user"
        },
            {
                "type": "pid"
            },
            {
                "type": "network"
            },
            {
                "type": "ipc"
            },
            {
                "type": "uts"
            },
            {
                "type": "mount"
            }
        ]
    }
}

I get an error at this point: https://github.com/google/gvisor/blob/master/runsc/cmd/gofer.go#L775

The gofer tries to call setuid(0, 0, 0), and it fails.

Any ideas?

terenceli commented 9 months ago

@ayushr2 I just did some tests.

Just as I said previously, the gofer process calls

`unix.RawSyscall(unix.SYS_SETUID, 0, 0, 0)`

But since we set up a 1000:1000 userns mapping, this causes an invalid argument error. That is reasonable: we are UID 1000 in the current user namespace, so we can't setuid to 0.
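For illustration only (this is not runsc code), a minimal sketch of how to observe this from inside such a namespace: /proc/self/uid_map contains only the 1000-to-1000 entry, so UID 0 is not a valid ID there and the setuid call fails with EINVAL:

```go
package main

import (
	"fmt"
	"os"

	"golang.org/x/sys/unix"
)

func main() {
	// With a 1000:1000:100 mapping this prints a single line like
	// "      1000       1000        100"; UID 0 does not appear anywhere.
	if data, err := os.ReadFile("/proc/self/uid_map"); err == nil {
		fmt.Print(string(data))
	}

	// setuid to an ID that is not mapped in the current user namespace
	// fails with EINVAL, which is the "invalid argument" the gofer hits.
	if _, _, errno := unix.RawSyscall(unix.SYS_SETUID, 0, 0, 0); errno != 0 {
		fmt.Println("setuid(0):", errno)
	}
}
```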

I then found a commit

https://github.com/google/gvisor/commit/595d424651b4baf70aeaf1fe2565b6c68969ae24

I realized this is what I want: 'runsc as a non-root user works'. But I can't make it work.

ayushr2 commented 9 months ago

But since we set up a 1000:1000 userns mapping, this causes an invalid argument error. That is reasonable: we are UID 1000 in the current user namespace, so we can't setuid to 0.

Yes, to set up the sandbox, the root user needs to be mapped inside the new user namespace.

In https://github.com/google/gvisor/commit/595d424651b4baf70aeaf1fe2565b6c68969ae24, I had tested it with:

In both cases, the container's root user is mapped, so it works.

If you want --userns=keep-id behavior, then maybe you need to use at least 2 host users: map host 1000 to container 1000, and map the other user to root. The gofer needs to be root to set up the filesystem and so on (which requires capabilities).
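To make that suggestion concrete, here is a hedged sketch of what a two-entry mapping could look like using the runtime-spec Go types; the second host ID (100000, e.g. taken from /etc/subuid) is an assumption for illustration, not something from this thread:

```go
package main

import (
	"fmt"

	specs "github.com/opencontainers/runtime-spec/specs-go"
)

func main() {
	// Two-entry mapping: the caller's host UID keeps its identity inside
	// the container, while a separate (sub)UID is mapped to container root
	// so the gofer can still act as root inside the user namespace.
	uidMappings := []specs.LinuxIDMapping{
		{ContainerID: 1000, HostID: 1000, Size: 1}, // keep-id for the caller
		{ContainerID: 0, HostID: 100000, Size: 1},  // assumed subuid mapped to container root
	}
	for _, m := range uidMappings {
		fmt.Printf("container %d <- host %d (size %d)\n", m.ContainerID, m.HostID, m.Size)
	}
}
```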

@avagin Am I missing something?

ayushr2 commented 9 months ago

Let me look into this. It may be an issue with the function you pointed out (syncUsernsForRootless()). The gofer and sandbox processes are started in a new userns with a complete set of capabilities. So to support --userns=keep-id, we may need to setuid/setgid to the user that has been mapped.

It should not be necessary to map 2 host users for this, because Podman etc. support this use case.

ayushr2 commented 9 months ago

Yeah, this seems like an issue in runsc. For rootless containers, we currently only support running the non-root user as root inside the container's user namespace. This covers the common use case, but as a result, Podman features like --userns=keep-id are not supported yet.

I tried looking into this a bit. I don't have the bandwidth to pursue this right now. But in case someone else is looking at this, here are some pointers:

terenceli commented 9 months ago

@ayushr2 Your second pointer reminded me to check whether a non-root process's child in a new userns has the full capability set. After reading the docs and doing some experiments, I found the answer is 'yes'.
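As a quick way to check this, here is a small sketch (mine, not from the thread) that prints the capability sets of the current process; run as the first process in a freshly created user namespace, CapEff/CapPrm show the full mask while CapInh/CapAmb stay empty unless explicitly set:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func main() {
	// /proc/self/status reports the capability sets as hex bitmasks
	// (CapInh, CapPrm, CapEff, CapBnd, CapAmb).
	f, err := os.Open("/proc/self/status")
	if err != nil {
		panic(err)
	}
	defer f.Close()
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		if line := scanner.Text(); strings.HasPrefix(line, "Cap") {
			fmt.Println(line)
		}
	}
}
```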

After testing, we also need the 'Inheritable' caps. I tried the following patch:


diff --git a/runsc/cmd/gofer.go b/runsc/cmd/gofer.go
index 3228587..624e971 100644
--- a/runsc/cmd/gofer.go
+++ b/runsc/cmd/gofer.go
@@ -56,6 +56,8 @@ var goferCaps = &specs.LinuxCapabilities{
        Bounding:  caps,
        Effective: caps,
        Permitted: caps,
+       Inheritable: caps,
+       Ambient: caps,
 }

 // goferSyncFDs contains file descriptors that are used for synchronization
@@ -782,10 +784,11 @@ func syncUsernsForRootless(fd int) {
        // SETUID changes UID on the current system thread, so we have
        // to re-execute current binary.
        runtime.LockOSThread()
-       if _, _, errno := unix.RawSyscall(unix.SYS_SETUID, 0, 0, 0); errno != 0 {
+       var uid = os.Getuid()
+       if _, _, errno := unix.RawSyscall(unix.SYS_SETUID, uintptr(uid), 0, 0); errno != 0 {
                util.Fatalf("failed to set UID: %v", errno)
        }
-       if _, _, errno := unix.RawSyscall(unix.SYS_SETGID, 0, 0, 0); errno != 0 {
+       if _, _, errno := unix.RawSyscall(unix.SYS_SETGID, uintptr(uid), 0, 0); errno != 0 {
                util.Fatalf("failed to set GID: %v", errno)
        }
 }

And it just works!


test@test-VirtualBox:~/runsc$ ./runsc --ignore-cgroups --network none  run abc
id
uid=1000(test) gid=1000(test) groups=1000(test)
touch /tmp/abc
exit
test@test-VirtualBox:~/runsc$ ls -lh /tmp/abc
-rw-r--r-- 1 test test 0 2月  23 16:45 /tmp/abc

So could you review this and make a more formal patch (with tests)? Thanks!

ayushr2 commented 9 months ago

Awesome! I think I had missed the Inheritable: caps part. I unfortunately don't have cycles right now, but I have put this on my TODO list. If someone else wants to pick this up, feel free!

I think the missing piece is: we need to communicate to syncUsernsForRootless() the host UID of the runsc caller, and then look up that UID in the UID mappings from the container spec. If found, we want to setuid to the mapped container UID.
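A rough sketch of that lookup, under the assumption that the caller's host UID and the spec's uidMappings are already available to the gofer; the containerUIDFor helper is hypothetical, not part of runsc:

```go
package main

import (
	"fmt"
	"os"

	specs "github.com/opencontainers/runtime-spec/specs-go"
	"golang.org/x/sys/unix"
)

// containerUIDFor is a hypothetical helper: it looks up the caller's host UID
// in the spec's uidMappings and returns the container-side UID, which is what
// syncUsernsForRootless() could setuid to instead of hard-coding 0.
func containerUIDFor(hostUID uint32, mappings []specs.LinuxIDMapping) (uint32, bool) {
	for _, m := range mappings {
		if hostUID >= m.HostID && hostUID < m.HostID+m.Size {
			return m.ContainerID + (hostUID - m.HostID), true
		}
	}
	return 0, false
}

func main() {
	// Keep-id style mapping as in the spec above (containerID 1000, hostID 1000, size 100).
	mappings := []specs.LinuxIDMapping{{ContainerID: 1000, HostID: 1000, Size: 100}}
	uid, ok := containerUIDFor(uint32(os.Getuid()), mappings)
	if !ok {
		fmt.Println("caller UID is not mapped in the container spec")
		return
	}
	// Same syscall as in gofer.go, but with the looked-up UID instead of 0.
	if _, _, errno := unix.RawSyscall(unix.SYS_SETUID, uintptr(uid), 0, 0); errno != 0 {
		fmt.Println("setuid failed:", errno)
	}
}
```

The same lookup would apply to the GID mappings for the corresponding setgid call.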

github-actions[bot] commented 5 months ago

A friendly reminder that this issue had no activity for 120 days.

github-actions[bot] commented 2 months ago

This issue has been closed due to lack of activity.

rsyring commented 1 month ago

Re-open? Seems like this would be good to have fixed at some point. Thanks.