openwrt / packages

Community maintained packages for OpenWrt. Documentation for submitting pull requests is in CONTRIBUTING.md

Rootless docker on OpenWrt 21.02.3 #18988

Open gheist opened 2 years ago

gheist commented 2 years ago

Maintainer: @G-M0N3Y-2503
Environment: aarch64 Cortex-A53, OpenWrt 21.02.3

Description: Trying to run rootless docker on the latest stable release. The feature is intended to allow non-privileged users to run Docker containers. When trying to run rootless docker the following error is observed:

toor@OpenWrt:~$ docker run -it --rm arm64v8/busybox
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "proc" to rootfs at "/proc": mount proc:/proc (via /proc/self/fd/7), flags: 0xe: operation not permitted: unknown.

Dockerd is run as follows:
rootlesskit '--net=lxc-user-nic' '--mtu=1500' '--slirp4netns-sandbox=auto' '--slirp4netns-seccomp=auto' --disable-host-loopback '--port-driver=builtin' '--copy-up=/etc' '--copy-up=/run' '--propagation=rslave' '--lxc-user-nic-binary=/usr/lib/lxc/lxc-user-nic' --lxc-user-nic-bridge br-lan --debug --cgroupns /home/toor/bin/dockerd-rootless.sh --debug '--iptables=false'

Strace of dockerd shows an attempt to mount procfs onto the container's /proc directory, which is opened first and then referenced through procfs (/proc/self/fd/7). The mount fails, despite running as "fake root":

7885 mount("/home/toor/.local/share/docker/vfs/dir/28e1713fe7e5bb5cbdee5268a23bb19a41c2a7a699e408a147987e62a309e119", "/home/toor/.local/share/docker
7885 <... mount resumed>) = 0
7885 newfstatat(AT_FDCWD, "/home/toor/.local/share/docker/vfs/dir/28e1713fe7e5bb5cbdee5268a23bb19a41c2a7a699e408a147987e62a309e119/proc", <unfinishe
7885 <... newfstatat resumed>{st_mode=S_IFDIR|0755, st_size=4096, ...}, AT_SYMLINK_NOFOLLOW) = 0
7885 newfstatat(AT_FDCWD, "/home/toor/.local/share/docker/vfs/dir/28e1713fe7e5bb5cbdee5268a23bb19a41c2a7a699e408a147987e62a309e119/proc", <unfinishe
7885 <... newfstatat resumed>{st_mode=S_IFDIR|0755, st_size=4096, ...}, AT_SYMLINK_NOFOLLOW) = 0
7885 newfstatat(AT_FDCWD, "/home/toor/.local/share/docker/vfs/dir/28e1713fe7e5bb5cbdee5268a23bb19a41c2a7a699e408a147987e62a309e119/proc", <unfinishe
7885 <... newfstatat resumed>{st_mode=S_IFDIR|0755, st_size=4096, ...}, 0) = 0
7885 newfstatat(AT_FDCWD, "/home/toor/.local/share/docker/vfs/dir/28e1713fe7e5bb5cbdee5268a23bb19a41c2a7a699e408a147987e62a309e119/proc", <unfinishe
7885 <... newfstatat resumed>{st_mode=S_IFDIR|0755, st_size=4096, ...}, AT_SYMLINK_NOFOLLOW) = 0
7885 futex(0xcda550, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
7885 <... futex resumed>) = 0
7885 openat(AT_FDCWD, "/home/toor/.local/share/docker/vfs/dir/28e1713fe7e5bb5cbdee5268a23bb19a41c2a7a699e408a147987e62a309e119/proc", O_RDONLY|O_CLOE
7885 <... openat resumed>) = 7
7885 epoll_ctl(8, EPOLL_CTL_ADD, 7, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=1746532184, u64=547207378776}} <unfinished ...>
7885 <... epoll_ctl resumed>) = -1 EBADF (Bad file descriptor)
7885 readlinkat(AT_FDCWD, "/proc/self/fd/7", <unfinished ...>
7885 <... readlinkat resumed>"/home/toor/.local/share/docker/v"..., 128) = 108
7885 mount("proc", "/proc/self/fd/7", "proc", MS_NOSUID|MS_NODEV|MS_NOEXEC, NULL <unfinished ...>
7885 <... mount resumed>) = -1 EPERM (Operation not permitted)
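The `flags: 0xe` value in the runc error and the `MS_NOSUID|MS_NODEV|MS_NOEXEC` flags in the strace output describe the same mount call. A minimal sketch of the decoding, using the flag values from the Linux `<linux/mount.h>` header (the helper name `decode_mount_flags` is made up for this illustration):

```python
# Mount flag bits as defined in the Linux UAPI header <linux/mount.h>.
# Only the four lowest bits are listed; that is enough for this issue.
MS_FLAGS = {
    0x1: "MS_RDONLY",
    0x2: "MS_NOSUID",
    0x4: "MS_NODEV",
    0x8: "MS_NOEXEC",
}


def decode_mount_flags(flags):
    """Return the names of the known flag bits set in `flags`.

    Any remaining bits are reported as a single hex string so that
    nothing is silently dropped.
    """
    names = [name for bit, name in MS_FLAGS.items() if flags & bit]
    unknown = flags & ~sum(MS_FLAGS)  # sum of the keys is the known-bit mask
    if unknown:
        names.append(hex(unknown))
    return names


# The value reported by runc in the error message above.
print("|".join(decode_mount_flags(0xE)))
```

So the failing call is an ordinary procfs mount with the usual restrictive flags; nothing unusual is being requested by runc.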

The issue is likely related to/identical to https://github.com/openwrt/packages/issues/15096

A similar issue has been observed when running docker within lxc, but it appears to be related to the apparmor configuration applied by lxc, and it was resolved by the following commit: https://github.com/lxc/lxd/commit/546e2a60809a108a1f505b99c6edbda52b12c739 In our case apparmor (or SELinux) is disabled in the kernel; it is the kernel itself that refuses the mount even though the user has root uid.
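For context on "root uid" here: inside a user namespace, uid 0 is only a mapping onto an unprivileged host uid, which is why the kernel can still apply extra checks to the mount. A minimal sketch of reading that mapping from the `/proc/<pid>/uid_map` format (three columns: id inside the namespace, id outside, range length). The sample values and helper names are hypothetical and were not captured from the system in this issue:

```python
def parse_id_map(text):
    """Parse uid_map/gid_map content into (inside, outside, length) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:
            entries.append(tuple(int(f) for f in fields))
    return entries


def outside_id(inside_id, entries):
    """Translate an id inside the namespace to the id seen outside it.

    Returns None when the id is not covered by any mapping range.
    """
    for inside, outside, length in entries:
        if inside <= inside_id < inside + length:
            return outside + (inside_id - inside)
    return None


# Hypothetical mapping: namespace root is host uid 1000, and the
# remaining ids come from a subordinate uid range.
SAMPLE_UID_MAP = "0 1000 1\n1 100000 65536\n"

print(outside_id(0, parse_id_map(SAMPLE_UID_MAP)))
```

On a live system the input would come from `/proc/self/uid_map`, read from within the rootlesskit child namespace.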

Environment

Raspberry Pi 3 with the latest stable OpenWrt release, kernel 5.4.188, running cgroups v2

root@OpenWrt:~# uname -a
Linux OpenWrt 5.4.188 #0 SMP Sat Apr 16 12:59:34 2022 aarch64 GNU/Linux
root@OpenWrt:~# cat /etc/openwrt_release
DISTRIB_ID='OpenWrt'
DISTRIB_RELEASE='21.02.3'
DISTRIB_REVISION='r16554-1d4dea6d4f'
DISTRIB_TARGET='bcm27xx/bcm2710'
DISTRIB_ARCH='aarch64_cortex-a53'
DISTRIB_DESCRIPTION='OpenWrt 21.02.3 r16554-1d4dea6d4f'
DISTRIB_TAINTS=''

Rootless docker is installed as follows:
curl -fsSL https://get.docker.com/rootless | SKIP_IPTABLES=1 sh

Docker uses rootlesskit (https://github.com/rootless-containers/rootlesskit) for unprivileged execution, relying on user, network and mount namespace capabilities. We are using the setuid lxc-user-nic option for network namespacing and the cgroup namespace (--cgroupns flag) for cgroups isolation.

Rootless dockerd output:
toor@OpenWrt:~$ dockerd-rootless.sh --debug --iptables=false

DEBU[2022-07-20T08:59:30.982000763Z] Calling HEAD /_ping
DEBU[2022-07-20T08:59:30.998994683Z] Calling POST /v1.41/containers/create
DEBU[2022-07-20T08:59:31.001166598Z] form data: {"AttachStderr":true,"AttachStdin":true,"AttachStdout":true,"Cmd":null,"Domainname":"","Entrypoint":null,"Env":null,"HostConfig":{"AutoRemove":true,"Binds":null,"BlkioDeviceReadBps":null,"BlkioDeviceReadIOps":null,"BlkioDeviceWriteBps":null,"BlkioDeviceWriteIOps":null,"BlkioWeight":0,"BlkioWeightDevice":[],"CapAdd":null,"CapDrop":null,"Cgroup":"","CgroupParent":"","CgroupnsMode":"","ConsoleSize":[0,0],"ContainerIDFile":"","CpuCount":0,"CpuPercent":0,"CpuPeriod":0,"CpuQuota":0,"CpuRealtimePeriod":0,"CpuRealtimeRuntime":0,"CpuShares":0,"CpusetCpus":"","CpusetMems":"","DeviceCgroupRules":null,"DeviceRequests":null,"Devices":[],"Dns":[],"DnsOptions":[],"DnsSearch":[],"ExtraHosts":null,"GroupAdd":null,"IOMaximumBandwidth":0,"IOMaximumIOps":0,"IpcMode":"","Isolation":"","KernelMemory":0,"KernelMemoryTCP":0,"Links":null,"LogConfig":{"Config":{},"Type":""},"MaskedPaths":null,"Memory":0,"MemoryReservation":0,"MemorySwap":0,"MemorySwappiness":-1,"NanoCpus":0,"NetworkMode":"default","OomKillDisable":false,"OomScoreAdj":0,"PidMode":"","PidsLimit":0,"PortBindings":{},"Privileged":false,"PublishAllPorts":false,"ReadonlyPaths":null,"ReadonlyRootfs":false,"RestartPolicy":{"MaximumRetryCount":0,"Name":"no"},"SecurityOpt":null,"ShmSize":0,"UTSMode":"","Ulimits":null,"UsernsMode":"","VolumeDriver":"","VolumesFrom":null},"Hostname":"","Image":"arm64v8/busybox","Labels":{},"NetworkingConfig":{"EndpointsConfig":{}},"OnBuild":null,"OpenStdin":true,"Platform":null,"StdinOnce":true,"Tty":true,"User":"","Volumes":{},"WorkingDir":""}
DEBU[2022-07-20T08:59:31.258955953Z] container mounted via layerStore: &{/home/toor/.local/share/docker/vfs/dir/996ec748b54cd518b14625554a3db436bac71432f414823971ff3288e18edc5b 0x3412e80 0x3412e80} container=90527e27a9f1fa917228278335b461ad6da1a6ad1caa6d03f80b3b2b5d662c74
DEBU[2022-07-20T08:59:31.275843519Z] Calling POST /v1.41/containers/90527e27a9f1fa917228278335b461ad6da1a6ad1caa6d03f80b3b2b5d662c74/attach?stderr=1&stdin=1&stdout=1&stream=1
DEBU[2022-07-20T08:59:31.276935493Z] attach: stdin: begin
DEBU[2022-07-20T08:59:31.277281272Z] attach: stdout: begin
DEBU[2022-07-20T08:59:31.277515802Z] attach: stderr: begin
DEBU[2022-07-20T08:59:31.279243084Z] Calling POST /v1.41/containers/90527e27a9f1fa917228278335b461ad6da1a6ad1caa6d03f80b3b2b5d662c74/wait?condition=removed
DEBU[2022-07-20T08:59:31.282873586Z] Calling POST /v1.41/containers/90527e27a9f1fa917228278335b461ad6da1a6ad1caa6d03f80b3b2b5d662c74/start
DEBU[2022-07-20T08:59:31.284651024Z] container mounted via layerStore: &{/home/toor/.local/share/docker/vfs/dir/996ec748b54cd518b14625554a3db436bac71432f414823971ff3288e18edc5b 0x3412e80 0x3412e80} container=90527e27a9f1fa917228278335b461ad6da1a6ad1caa6d03f80b3b2b5d662c74
DEBU[2022-07-20T08:59:31.287078668Z] Assigning addresses for endpoint serene_sanderson's interface on network bridge
DEBU[2022-07-20T08:59:31.287257365Z] RequestAddress(LocalDefault/172.17.0.0/16, , map[])
DEBU[2022-07-20T08:59:31.287416218Z] Request address PoolID:172.17.0.0/16 App: ipam/default/data, ID: LocalDefault/172.17.0.0/16, DBIndex: 0x0, Bits: 65536, Unselected: 65533, Sequence: (0xc0000000, 1)->(0x0, 2046)->(0x1, 1)->end Curr:0 Serial:false PrefAddress:
DEBU[2022-07-20T08:59:31.338142041Z] Assigning addresses for endpoint serene_sanderson's interface on network bridge
DEBU[2022-07-20T08:59:31.363306334Z] Programming external connectivity on endpoint serene_sanderson (41a7ecc3abfb2c501c912310683df1c8c2508ea7825039a41c0c7446b4bb0d4d)
DEBU[2022-07-20T08:59:31.369708435Z] EnableService 90527e27a9f1fa917228278335b461ad6da1a6ad1caa6d03f80b3b2b5d662c74 START
DEBU[2022-07-20T08:59:31.370307182Z] EnableService 90527e27a9f1fa917228278335b461ad6da1a6ad1caa6d03f80b3b2b5d662c74 DONE
DEBU[2022-07-20T08:59:31.385126426Z] bundle dir created bundle=/home/toor/.docker/run/docker/containerd/90527e27a9f1fa917228278335b461ad6da1a6ad1caa6d03f80b3b2b5d662c74 module=libcontainerd namespace=moby root=/home/toor/.local/share/docker/vfs/dir/996ec748b54cd518b14625554a3db436bac71432f414823971ff3288e18edc5b
DEBU[2022-07-20T08:59:31.421983782Z] event published ns=moby topic=/containers/create type=containerd.events.ContainerCreate
time="2022-07-20T08:59:31.509465711Z" level=info msg="loading plugin \"io.containerd.event.v1.publisher\"..." runtime=io.containerd.runc.v2 type=io.containerd.event.v1
time="2022-07-20T08:59:31.509873000Z" level=info msg="loading plugin \"io.containerd.internal.v1.shutdown\"..." runtime=io.containerd.runc.v2 type=io.containerd.internal.v1
time="2022-07-20T08:59:31.510076541Z" level=info msg="loading plugin \"io.containerd.ttrpc.v1.task\"..." runtime=io.containerd.runc.v2 type=io.containerd.ttrpc.v1
time="2022-07-20T08:59:31.511866427Z" level=info msg="starting signal loop" namespace=moby path=/home/toor/.docker/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/90527e27a9f1fa917228278335b461ad6da1a6ad1caa6d03f80b3b2b5d662c74 pid=7524 runtime=io.containerd.runc.v2
DEBU[2022-07-20T08:59:31.642375940Z] failed to delete task error="rpc error: code = NotFound desc = container not created: not found" id=90527e27a9f1fa917228278335b461ad6da1a6ad1caa6d03f80b3b2b5d662c74
INFO[2022-07-20T08:59:31.643772286Z] shim disconnected id=90527e27a9f1fa917228278335b461ad6da1a6ad1caa6d03f80b3b2b5d662c74
WARN[2022-07-20T08:59:31.644223326Z] cleaning up after shim disconnected id=90527e27a9f1fa917228278335b461ad6da1a6ad1caa6d03f80b3b2b5d662c74 namespace=moby
INFO[2022-07-20T08:59:31.644330252Z] cleaning up dead shim
WARN[2022-07-20T08:59:31.675619512Z] cleanup warnings time="2022-07-20T08:59:31Z" level=info msg="starting signal loop" namespace=moby pid=7550 runtime=io.containerd.runc.v2 time="2022-07-20T08:59:31Z" level=warning msg="failed to read init pid file" error="open /home/toor/.docker/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/90527e27a9f1fa917228278335b461ad6da1a6ad1caa6d03f80b3b2b5d662c74/init.pid: no such file or directory" runtime=io.containerd.runc.v2
ERRO[2022-07-20T08:59:31.678005385Z] copy shim log error="read /proc/self/fd/13: file already closed"
ERRO[2022-07-20T08:59:31.680066989Z] stream copy error: reading from a closed fifo
DEBU[2022-07-20T08:59:31.681207035Z] attach: stdout: end
DEBU[2022-07-20T08:59:31.681238024Z] attach: stderr: end
DEBU[2022-07-20T08:59:31.681380523Z] attach: stdin: end
DEBU[2022-07-20T08:59:31.681685053Z] attach done
DEBU[2022-07-20T08:59:31.706578670Z] event published ns=moby topic=/containers/delete type=containerd.events.ContainerDelete
DEBU[2022-07-20T08:59:31.720434273Z] Revoking external connectivity on endpoint serene_sanderson (41a7ecc3abfb2c501c912310683df1c8c2508ea7825039a41c0c7446b4bb0d4d)
DEBU[2022-07-20T08:59:31.811405247Z] Releasing addresses for endpoint serene_sanderson's interface on network bridge
DEBU[2022-07-20T08:59:31.811680922Z] ReleaseAddress(LocalDefault/172.17.0.0/16, 172.17.0.2)
DEBU[2022-07-20T08:59:31.811912900Z] Released address PoolID:LocalDefault/172.17.0.0/16, Address:172.17.0.2 Sequence:App: ipam/default/data, ID: LocalDefault/172.17.0.0/16, DBIndex: 0x0, Bits: 65536, Unselected: 65532, Sequence: (0xe0000000, 1)->(0x0, 2046)->(0x1, 1)->end Curr:3
DEBU[2022-07-20T08:59:31.817425319Z] garbage collected d=5.062837ms
ERRO[2022-07-20T08:59:31.829803430Z] 90527e27a9f1fa917228278335b461ad6da1a6ad1caa6d03f80b3b2b5d662c74 cleanup: failed to delete container from containerd: no such container
ERRO[2022-07-20T08:59:31.876182922Z] Handler for POST /v1.41/containers/90527e27a9f1fa917228278335b461ad6da1a6ad1caa6d03f80b3b2b5d662c74/start returned error: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "proc" to rootfs at "/proc": mount proc:/proc (via /proc/self/fd/7), flags: 0xe: operation not permitted: unknown
DEBU[2022-07-20T08:59:31.878696607Z] Closing buffered stdin pipe
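When a pasted dockerd log loses its line breaks, the records can be recovered by splitting on the `LEVEL[timestamp]` prefix that every dockerd debug line above starts with. A rough sketch, assuming only that prefix format; the function names are made up for this illustration, and the nested containerd `time="..."` records are deliberately left attached to the record that contains them:

```python
import re

# Zero-width match in front of every "DEBU[2022-..." style prefix.
# Splitting on an empty match requires Python 3.7 or newer.
RECORD_START = re.compile(r"(?=\b(?:DEBU|INFO|WARN|ERRO)\[\d{4}-\d{2}-\d{2}T)")


def split_records(text):
    """Split fused dockerd log text into one string per record."""
    return [part.strip() for part in RECORD_START.split(text) if part.strip()]


def by_level(records, level):
    """Keep only the records that start with the given level tag."""
    return [record for record in records if record.startswith(level + "[")]


# Two records from the log above, fused onto one line as in the paste.
FUSED = (
    "ERRO[2022-07-20T08:59:31.680066989Z] stream copy error: "
    "reading from a closed fifo "
    "DEBU[2022-07-20T08:59:31.681207035Z] attach: stdout: end"
)

for record in by_level(split_records(FUSED), "ERRO"):
    print(record)
```

Filtering for `ERRO` in the full log leaves four records, the last of which is the procfs mount failure.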

G-M0N3Y-2503 commented 2 years ago

The last time I looked into rootlesskit, I seem to have concluded that it depended on systemd. I don't recall why I thought that, but if that is still the case, supporting it might be an uphill battle, since OpenWrt uses procd instead of systemd.

Also, not sure if it is relevant to this issue, but it looks like https://get.docker.com/rootless downloads a binary, so maybe the binary needs to be compiled with the OpenWrt build system too.

gheist commented 2 years ago

I didn't see any direct or claimed dependency on systemd. On systemd-based systems (checked on Ubuntu 18.04 LTS), systemd invokes rootlesskit and runs it as a service; otherwise rootlesskit can be run manually. I was able to get dockerd started, mount the container and set up container networking. However, runc execution of the container image fails because the OpenWrt kernel rejects the procfs mount in runc init. It happens with both the vfs and fuse-overlayfs storage drivers.

7885 mount("proc", "/proc/self/fd/7", "proc", MS_NOSUID|MS_NODEV|MS_NOEXEC, NULL <unfinished ...>
7885 <... mount resumed>) = -1 EPERM (Operation not permitted)

On start, rootlesskit tries to mount sysfs and is denied, which may be the reason that the kernel denies the procfs mount later:
WARN[0000] The host root filesystem is mounted as "". Setting child propagation to "rslave" is not supported.
WARN[0000] failed to mount sysfs, falling back to read-only mount: operation not permitted
WARN[0000] failed to mount sysfs: operation not permitted

It is not fully clear to me why the 5.4.188 kernel under OpenWrt denies sysfs and procfs mounts. Maybe the sysfs mount failure in rootlesskit causes the procfs mount failure in containerd downstream.

Per https://lists.linuxfoundation.org/pipermail/containers/2018-April/038840.html (not sure if this reflects the current kernel mount handling logic with user namespaces): _Since Linux v4.2 with commit 1b852bceb0d1 ("mnt: Refactor the logic for mounting sysfs and proc in a user namespace"), new mounts of proc or sysfs in non init userns are only allowed when there is at least one fully-visible proc or sysfs mount._
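The "fully-visible" condition quoted above can be roughly approximated from `/proc/self/mountinfo`. The sketch below is a simplified heuristic and not the kernel's actual check: it looks for mounts of the given filesystem type whose root is the filesystem root, and lists any other mounts placed underneath them that could mask part of the tree. The function names and the extra `/proc/kcore` line are hypothetical and only there for illustration:

```python
def parse_mountinfo(text):
    """Parse mountinfo text into dicts with root, mount_point and fstype."""
    mounts = []
    for line in text.strip().splitlines():
        left, sep, right = line.partition(" - ")
        fields = left.split()
        if not sep or len(fields) < 6:
            continue  # skip blank or malformed lines
        mounts.append({
            "root": fields[3],
            "mount_point": fields[4],
            "fstype": right.split()[0],
        })
    return mounts


def full_mounts(text, fstype):
    """Return (mount_point, overmounts) for whole-filesystem mounts of fstype.

    Heuristic only: an empty overmount list suggests, but does not prove,
    that the kernel would treat the mount as fully visible.
    """
    mounts = parse_mountinfo(text)
    result = []
    for mount in mounts:
        if mount["fstype"] != fstype or mount["root"] != "/":
            continue
        prefix = mount["mount_point"].rstrip("/") + "/"
        over = [other["mount_point"] for other in mounts
                if other is not mount and other["mount_point"].startswith(prefix)]
        result.append((mount["mount_point"], over))
    return result


# The OpenWrt mountinfo lines reported in this issue.
OPENWRT = """\
13 1 179:2 / / rw,noatime - ext4 /dev/root rw
14 13 0:5 / /proc rw,nosuid,nodev,noexec,noatime - proc proc rw
15 13 0:14 / /sys rw,nosuid,nodev,noexec,noatime - sysfs sysfs rw
"""

# Hypothetical extra line showing what a masking mount would look like.
MASKED = OPENWRT + "16 14 0:20 / /proc/kcore rw - tmpfs tmpfs rw\n"

print(full_mounts(OPENWRT, "proc"))
print(full_mounts(MASKED, "proc"))
```

Note that these lines were taken in the host mount namespace. The check that matters is made against the mount namespace of the process calling mount, so the same listing would need to be captured from inside the rootlesskit child to be conclusive.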

I'll try to set up rootlesskit as a non-privileged procd service and see if that makes a difference. I'm not sure that compiling rootlesskit through the OpenWrt build process would make a difference, but ultimately it would have to be an OpenWrt package for sure.

One key difference between Ubuntu (5.4.0-122) and OpenWrt (5.4.188) is that rootfs, sysfs and proc in Ubuntu are all mounted as shared:

Ubuntu:
25 30 0:23 / /sys rw,nosuid,nodev,noexec,relatime shared:7 - sysfs sysfs rw
26 30 0:5 / /proc rw,nosuid,nodev,noexec,relatime shared:14 - proc proc rw
30 1 8:7 / / rw,relatime shared:1 - ext4 /dev/sda7 rw,errors=remount-ro

OpenWrt:
13 1 179:2 / / rw,noatime - ext4 /dev/root rw
14 13 0:5 / /proc rw,nosuid,nodev,noexec,noatime - proc proc rw
15 13 0:14 / /sys rw,nosuid,nodev,noexec,noatime - sysfs sysfs rw

Can these mounts be made shared in OpenWrt as well, and what are the possible implications?
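The propagation difference between the two systems can be read directly from the optional fields of mountinfo, which sit between the mount options and the ` - ` separator. Tags such as `shared:N` or `master:N` appear there, and a mount with no such tag is private. A small sketch using the lines reported in this issue; the function name is made up for this illustration:

```python
def propagation(text):
    """Map each mount point to its propagation type from mountinfo text."""
    result = {}
    for line in text.strip().splitlines():
        left, sep, _ = line.partition(" - ")
        fields = left.split()
        if not sep or len(fields) < 6:
            continue  # skip blank or malformed lines
        # Fields 0-5 are fixed; anything after them is an optional tag.
        tags = [field.split(":")[0] for field in fields[6:]]
        if "shared" in tags:
            kind = "shared"
        elif "master" in tags:
            kind = "slave"
        elif "unbindable" in tags:
            kind = "unbindable"
        else:
            kind = "private"
        result[fields[4]] = kind
    return result


UBUNTU = """\
25 30 0:23 / /sys rw,nosuid,nodev,noexec,relatime shared:7 - sysfs sysfs rw
26 30 0:5 / /proc rw,nosuid,nodev,noexec,relatime shared:14 - proc proc rw
30 1 8:7 / / rw,relatime shared:1 - ext4 /dev/sda7 rw,errors=remount-ro
"""

OPENWRT = """\
13 1 179:2 / / rw,noatime - ext4 /dev/root rw
14 13 0:5 / /proc rw,nosuid,nodev,noexec,noatime - proc proc rw
15 13 0:14 / /sys rw,nosuid,nodev,noexec,noatime - sysfs sysfs rw
"""

print(propagation(UBUNTU))
print(propagation(OPENWRT))
```

The private result for `/` on OpenWrt appears consistent with the rootlesskit warning that the host root filesystem is mounted as "". With util-linux `mount`, propagation is normally changed with `mount --make-rshared /`; whether the `mount` shipped on OpenWrt accepts that option, and whether it affects the procfs denial at all, has not been tested here.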