Open terenceli opened 10 months ago
./runsc --rootless maps the current user to the root user inside the sandbox.
Adding to what @avagin said: so if you remove this part from your config.json:
"user": {
"uid": 1000,
"gid": 1000
},
OR set the user to 0
then it should work fine.
In your usecase, the sandbox is running in a user namespace with UID/GID mappings like 1000:0:1
(host uid/gid 1000 is mapped to container root user). So all files owned by 1000 on host will appear to be owned by 0
in the sandbox. Similarly if you create a file from inside the sandbox, it will appear to be owned by 1000
on the host, but inside the container owner will be 0
.
@avagin @ayushr2 So I see podman has a config --userns=keep-id. We may also need this config I think.
We want the application in Sentry has the same view in host from filesystem perspective.
runsc
and runc
are low level container runtimes. podman
is a tool on top of these runtimes. So if you want podman's --userns=keep-id
behavior, you can directly use podman with runsc to run rootless containers and enable that configuration. To see an example of how to configure podman with runsc, you can look at our Podman tests: test/podman/run.sh.
From podman docs, --userns=keep-id
means:
keep-id: creates a user namespace where the current rootless user’s UID:GID are mapped to the same values in the container. This option is not allowed for containers created by the root user.
You can not use --rootless
flag if you want --userns=keep-id
behavior. Because, when --rootless
is used, runsc re-executes itself as root with the caller UID mapped to 0
. So the caller becomes root. We can not map the caller UID to itself again.
If you do not want to use podman, you have the following optoins:
newuidmap
, newgidmap
tools and run runsc without the --rootless
flag and specify your desired user namespace and mappings in oci runtime config. Internally, we use the newuidmap, newgidmap binaries to configure the user namespace in this case. These binaries have the setuid bit set, which allows them to set UID/GID mappings beyond just the current user.unshare -U --uid-map --gid-map runsc ...
. So when runsc is called, all the necessary users are mapped. IIUC, this is what podman does.@ayushr2 Thanks for your explanation. I try to use the first method, as step as follows:
install newuidmap, newgidmap runsc OCI spec:
{
"ociVersion": "1.0.0",
"process": {
"user": {
"uid": 1000,
"gid": 1000
},
"args": [
"sh"
],
"env": [
"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
"TERM=xterm"
],
"cwd": "/",
"capabilities": {
"bounding": [
"CAP_AUDIT_WRITE",
"CAP_KILL",
"CAP_NET_BIND_SERVICE"
],
"effective": [
"CAP_AUDIT_WRITE",
"CAP_KILL",
"CAP_NET_BIND_SERVICE"
],
"inheritable": [
"CAP_AUDIT_WRITE",
"CAP_KILL",
"CAP_NET_BIND_SERVICE"
],
"permitted": [
"CAP_AUDIT_WRITE",
"CAP_KILL",
"CAP_NET_BIND_SERVICE"
]
},
"rlimits": [
{
"type": "RLIMIT_NOFILE",
"hard": 1024,
"soft": 1024
}
]
},
"root": {
"path": "/",
"readonly": true
},
"hostname": "runsc",
"mounts": [
{
"destination": "/proc",
"type": "proc",
"source": "proc"
},
{
"destination": "/dev",
"type": "tmpfs",
"source": "tmpfs"
},
{
"destination": "/sys",
"type": "sysfs",
"source": "sysfs",
"options": [
"nosuid",
"noexec",
"nodev",
"ro"
]
}
],
"linux": {
"uidMappings": [
{
"containerID":1000,
"hostID": 1000,
"size": 100
}
],
"gidMappings": [
{
"containerID": 1000,
"hostID": 1000,
"size": 100
}
],
"namespaces": [
{
"type": "user"
},
{
"type": "pid"
},
{
"type": "network"
},
{
"type": "ipc"
},
{
"type": "uts"
},
{
"type": "mount"
}
]
}
}
I get error at this point: https://github.com/google/gvisor/blob/master/runsc/cmd/gofer.go#L775
The gofer is try to call setuid(0,0,0) it will failed.
Any ideas?
@ayushr2 I just do some tests.
Just as I said in the previous, the gofer process calls
'unix.RawSyscall(unix.SYS_SETUID, 0, 0, 0)'
But as we setup a 1000:1000 userns map, this will cause a invalid argument error. This is reasonable as we are 1000 user in the current user ns so we can't call setuid to 0.
I then found a commit
https://github.com/google/gvisor/commit/595d424651b4baf70aeaf1fe2565b6c68969ae24
I realize this is what I want, 'runsc as a non-root user works'. But I can't make it work.
But as we setup a 1000:1000 userns map, this will cause a invalid argument error. This is reasonable as we are 1000 user in the current user ns so we can't call setuid to 0.
Yes, to set up the sandbox, the root user needs to be mapped inside the new user namespace.
In https://github.com/google/gvisor/commit/595d424651b4baf70aeaf1fe2565b6c68969ae24, I had tested it with:
--rootless
flag: it would remap my non-root user to root inside the sandbox.In both cases, the container's root user is mapped, so it works.
If you want --userns=keep-id
behavior, then maybe you need to use at least 2 host users. Map host 1000 to container 1000 and map the other user to root. The gofer needs to be root to setup the filesystem and all that (which require capabilities).
@avagin Am I missing something?
Let me look into this. It may be an issue with the function you pointed out (syncUsernsForRootless()
). The gofer and sandbox processes are started in a new userns with complete set of capabilities. So to support --userns=keep-id
, we may need to setuid/setgid to the user that has been mapped.
It should not be necessary to map 2 host users for this. Because Podman etc support this use case.
Yeah this seems like an issue in runsc. For rootless containers, we currently only support running the non-root user as root inside the container's user namespace. This covers the common usecase. But as a result, Podman features like --userns=keep-id
are not supported yet.
I tried looking into this a bit. I don't have the bandwidth to pursue this right now. But in case someone else is looking at this, here are some pointers:
CAP_SYS_CHROOT
to do the final chroot after re-execing.@ayushr2 your second pointer just remind of that whether the non-root process's child which in a new userns will have all full capability. After reading the doc and do some experiment I found the answer is 'yes'.
After test, we also need 'Inheritable' caps. I just try following patch:
diff --git a/runsc/cmd/gofer.go b/runsc/cmd/gofer.go
index 3228587..624e971 100644
--- a/runsc/cmd/gofer.go
+++ b/runsc/cmd/gofer.go
@@ -56,6 +56,8 @@ var goferCaps = &specs.LinuxCapabilities{
Bounding: caps,
Effective: caps,
Permitted: caps,
+ Inheritable: caps,
+ Ambient: caps,
}
// goferSyncFDs contains file descriptors that are used for synchronization
@@ -782,10 +784,11 @@ func syncUsernsForRootless(fd int) {
// SETUID changes UID on the current system thread, so we have
// to re-execute current binary.
runtime.LockOSThread()
- if _, _, errno := unix.RawSyscall(unix.SYS_SETUID, 0, 0, 0); errno != 0 {
+ var uid = os.Getuid()
+ if _, _, errno := unix.RawSyscall(unix.SYS_SETUID, uintptr(uid), 0, 0); errno != 0 {
util.Fatalf("failed to set UID: %v", errno)
}
- if _, _, errno := unix.RawSyscall(unix.SYS_SETGID, 0, 0, 0); errno != 0 {
+ if _, _, errno := unix.RawSyscall(unix.SYS_SETGID, uintptr(uid), 0, 0); errno != 0 {
util.Fatalf("failed to set GID: %v", errno)
}
}
And it just works!
test@test-VirtualBox:~/runsc$ ./runsc --ignore-cgroups --network none run abc
id
uid=1000(test) gid=1000(test) groups=1000(test)
touch /tmp/abc
exit
test@test-VirtualBox:~/runsc$ ls -lh /tmp/abc
-rw-r--r-- 1 test test 0 2月 23 16:45 /tmp/abc
So could you review this and make a more formal(such as tests) patch. Thanks!
Awesome! I think I had missed the Inheritable: caps
part. Uh I unfortunately don't have cycles right now, but have put this on my TODO list. If someone else wants to pick this, feel free to!
I think the missing piece is, we need to communicate with syncUsernsForRootless()
the host UID of the runsc caller and then lookup that UID in the UID mappings from container spec. If found, we want to setuid to the mapped container UID.
A friendly reminder that this issue had no activity for 120 days.
This issue has been closed due to lack of activity.
Re-open? Seems like this would be good to have fixed at some point. Thanks.
Description
While in rootless mode, the container UID is the same as the UID of running runsc, in the container we can't access the file belongs to the host UID.
I want to know how to achieve this. So that in rootless mode, the unprivileged user can access the host file with the same UID.
Steps to reproduce
using following OCI spec
`test@test-VirtualBox:~/test$ id
uid=1000(test) gid=1000(test) groups=1000(test),4(adm),24(cdrom),27(sudo),30(dip),46(plugdev),122(lpadmin),135(lxd),136(sambashare)
test@test-VirtualBox:~/test$ ./runsc -rootless -root /home/test -ignore-cgroups --network host run abc
touch /home/test/bb
touch: cannot touch '/home/test/bb': Permission denied
`
`{
}
`
runsc version
No response
docker version (if using docker)
No response
uname
No response
kubectl (if using Kubernetes)
No response
repo state (if built from source)
No response
runsc debug logs (if available)
No response