nestybox / sysbox

An open-source, next-generation "runc" that empowers rootless containers to run workloads such as Systemd, Docker, Kubernetes, just like VMs.
Apache License 2.0

--net <any network> causes "open /etc/resolv.conf: permission denied: unknown" #696

Open SkyperTHC opened 1 year ago

SkyperTHC commented 1 year ago

It works fine without --net, so it should work with --net as well.

To reproduce:

# docker run --rm --runtime=sysbox-runc --net testnet alpine
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: container_linux.go:424: starting container process caused: process_linux.go:607: container init caused: switching Docker DNS: rootfs_linux.go:1409: writing /etc/resolv.conf caused: open /etc/resolv.conf: permission denied: unknown.

Docker version 23.0.5, build bc4487a

sysbox-runc edition: Community Edition (CE) version: 0.6.1 commit: 278997aab055ad6eec9e48a555b90eef877596b7 built at: Sat Apr 8 06:08:15 UTC 2023 built by: Rodny Molina oci-specs: 1.0.2-dev

# docker run --rm --runtime=sysbox-runc  alpine  ls -al /etc/resolv.conf
-rw-r--r--    1 nobody   nobody         607 May  9 19:23 /etc/resolv.conf

rodnymolina commented 1 year ago

@SkyperTHC, as you can see, the "permission denied" error (EACCES) occurs when opening /etc/resolv.conf for writing, which fails because of the unexpected UID/GID (nobody:nobody) on this file.

Which kernel version and distro release are you running? You shouldn't see this if you have the shiftfs kernel module installed (for kernels < 5.12) or if ID-mapped mounts are available (kernel >= 5.12). See more details here.
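As a rough aid for answering the question above, the following shell sketch compares a kernel release string against the 5.12 threshold. The names kernel_at_least and uid_shift_hint are hypothetical helpers, not part of Sysbox, and the result is only a hint: a loaded shiftfs module does not by itself prove that shiftfs works on a given filesystem.

```shell
# Hypothetical helpers (not part of Sysbox); POSIX sh.

# kernel_at_least RELEASE MAJOR.MINOR
# Succeeds when RELEASE (e.g. "5.4.0-146-generic") is >= MAJOR.MINOR.
# Assumes RELEASE starts with a numeric MAJOR.MINOR prefix.
kernel_at_least() (
    have=${1%%-*}              # "5.4.0-146-generic" -> "5.4.0"
    have_major=${have%%.*}     # "5"
    rest=${have#*.}            # "4.0"
    have_minor=${rest%%.*}     # "4"
    want_major=${2%%.*}
    want_minor=${2#*.}
    [ "$have_major" -gt "$want_major" ] ||
        { [ "$have_major" -eq "$want_major" ] &&
          [ "$have_minor" -ge "$want_minor" ]; }
)

# uid_shift_hint [RELEASE]
# Prints which UID-shifting mechanism is likely available on this host.
uid_shift_hint() {
    if kernel_at_least "${1:-$(uname -r)}" 5.12; then
        echo "idmapped-mounts"
    elif grep -q '^shiftfs ' /proc/modules 2>/dev/null; then
        echo "shiftfs"
    else
        echo "none"
    fi
}
```

Calling uid_shift_hint with no argument uses the output of uname -r.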

SkyperTHC commented 1 year ago

@rodnymolina

shiftfs is loaded. No luck.

# lsmod | grep shiftfs
shiftfs                28672  0
# uname -r
5.4.0-146-generic
# docker run --rm --runtime=sysbox-runc --net sf-guest alpine
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: container_linux.go:424: starting container process caused: process_linux.go:607: container init caused: switching Docker DNS: rootfs_linux.go:1409: writing /etc/resolv.conf caused: open /etc/resolv.conf: permission denied: unknown.

I think the bug is related to how sysbox handles overlays. For example, mounting /etc/resolv.conf:ro takes you on another ride where /etc/resolv.conf is overwritten (which should not be possible when :ro is used).

# echo "nameserver 127.0.0.11" >/tmp/r.conf
# chmod 777 /tmp/r.conf
# cat /tmp/r.conf
nameserver 127.0.0.11
# docker run --rm --runtime=sysbox-runc --net sf-guest -v /tmp/r.conf:/etc/resolv.conf:ro alpine
# cat /tmp/r.conf
nameserver 10.11.0.1

=> Oops, who changed 127.0.0.11 to 10.11.0.1 when it is supposed to be read-only mounted from host?

Docker treats resolv.conf and 3 other files as special files (that's understood; any root-fs restriction is bypassed) and thus we need to mount them as read-only so that no attacker within a container can pop the docker-daemon by just flooding gigabytes to /etc/resolv.conf.
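The overwrite of /tmp/r.conf shown above can be checked mechanically by checksumming the host file before and after a command runs. This is a minimal sketch; changed_by is a hypothetical helper name, and it only detects that the content changed, not who changed it.

```shell
# Hypothetical helper; POSIX sh.

# changed_by FILE CMD [ARGS...]
# Runs CMD and succeeds if FILE's content differs afterwards.
# The exit status of CMD itself is ignored.
changed_by() (
    file=$1
    shift
    before=$(cksum < "$file")
    "$@" >/dev/null 2>&1
    after=$(cksum < "$file")
    [ "$before" != "$after" ]
)
```

For the case in this report, the call would look like: changed_by /tmp/r.conf docker run --rm --runtime=sysbox-runc --net sf-guest -v /tmp/r.conf:/etc/resolv.conf:ro alpine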

ctalledo commented 1 year ago

Hi @SkyperTHC, thanks for reporting the issues. Let's take one at a time.

First, for the permissions on /etc/resolv.conf, can you please show the output of mount | grep resolv.conf from inside the container?

On my Ubuntu Kinetic host (kernel 5.19), here's what I see:

$ docker run --runtime=sysbox-runc -it --rm --net=some-network alpine
/ # ls -l /etc/resolv.conf 
-rw-r--r--    1 root     root            80 May 10 17:24 /etc/resolv.conf
/ # mount | grep resolv.conf
/dev/nvme0n1p5 on /etc/resolv.conf type ext4 (rw,relatime,idmapped,errors=remount-ro)

Can you also indicate your kernel version and the underlying filesystem where /var/lib/docker lives (ext4, xfs, etc.)?

For the read-only mount of /etc/resolv.conf, the mount is still read-only; you can verify this by trying to write to /etc/resolv.conf from within the container (it will fail with "read-only filesystem"):

$ echo "nameserver 127.0.0.11" >/tmp/r.conf

$ docker run --runtime=sysbox-runc -it --rm --net some-network -v /tmp/r.conf:/etc/resolv.conf:ro alpine
/ # echo data > /etc/resolv.conf 
/bin/sh: can't create /etc/resolv.conf: Read-only file system

However, you are correct in noticing that Sysbox changes the /etc/resolv.conf from what was mounted:

/ # cat /etc/resolv.conf 
nameserver 172.25.0.1

Sysbox does this by default in order to avoid a DNS resolution problem that occurs when running Docker-in-Docker (i.e., running Docker inside a Sysbox container) on user-defined networks. We call it "dns-aliasing" and it's enabled by default in Sysbox as otherwise running Docker inside Sysbox containers (very common) will fail with DNS resolution issues.

You can turn it off by passing the --alias-dns=false flag in the "sysbox-mgr" systemd service (/lib/systemd/system/sysbox-mgr.service) and then restarting Sysbox:

$ cat sysbox-mgr.service
...
ExecStart=/usr/bin/sysbox-mgr --alias-dns=false
...

$ sudo systemctl restart sysbox
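To confirm the edit took effect before restarting, one option is to check the unit file for the flag. The sketch below assumes the flag appears on the ExecStart line exactly as shown above; alias_dns_disabled is a hypothetical helper name.

```shell
# Hypothetical helper; POSIX sh.

# alias_dns_disabled UNIT_FILE
# Succeeds if an ExecStart line in UNIT_FILE passes --alias-dns=false.
alias_dns_disabled() {
    grep -Eq '^[[:space:]]*ExecStart=.*--alias-dns=false([[:space:]]|$)' "$1"
}
```

Example: alias_dns_disabled /lib/systemd/system/sysbox-mgr.service && echo "dns-aliasing is turned off"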

Hope that helps!

SkyperTHC commented 1 year ago

Hi,

versions and mount points:

# grep VERSION= /etc/os-release
VERSION="20.04.4 LTS (Focal Fossa)"
# uname -r
5.4.0-146-generic
# lsmod | grep shiftfs
shiftfs                28672  0
# ls -ald /var/lib/docker
lrwxrwxrwx 1 root root 10 Mar 21 15:01 /var/lib/docker -> /sf/docker
# mount | grep -F /sf/docker
/dev/loop7 on /sf/docker type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,prjquota)
# docker --version
Docker version 23.0.5, build bc4487a
# docker network create testnet

I cannot start a container with --net=testnet (my initial problem) and thus cannot run the check you asked for:

# docker run --rm --runtime=sysbox-runc --net testnet alpine ash -c 'ls -al /etc/resolv.conf; mount | grep -F resolv.conf'
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: container_linux.go:424: starting container process caused: process_linux.go:607: container init caused: switching Docker DNS: rootfs_linux.go:1409: writing /etc/resolv.conf caused: open /etc/resolv.conf: permission denied: unknown.

Instead I can do:

# docker run --rm --runtime=sysbox-runc  alpine ash -c 'ls -al /etc/resolv.conf; mount | grep -F resolv.conf'
-rw-r--r--    1 nobody   nobody         607 May 11 08:00 /etc/resolv.conf
/dev/loop7 on /etc/resolv.conf type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,prjquota)

And I can do:

# echo "nameserver 127.0.0.11" >/tmp/r.conf
# chmod 777 /tmp/r.conf
# docker run --rm --runtime=sysbox-runc --net testnet -v /tmp/r.conf:/etc/resolv.conf:ro alpine ash -c 'ls -al /etc/resolv.conf; mount | grep -F resolv.conf'
-rwxrwxrwx    1 nobody   nobody          22 May 11 08:01 /etc/resolv.conf
/dev/sda2 on /etc/resolv.conf type ext4 (ro,relatime)
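The mount lines in this thread differ mainly in their option lists (ro, rw, idmapped). A small sketch for testing a single option in `mount` output follows; mount_has_option is a hypothetical helper name, and it matches whole options only, so errors=remount-ro is not mistaken for ro.

```shell
# Hypothetical helper; POSIX sh.

# mount_has_option TARGET OPTION
# Reads `mount` output on stdin and succeeds if the entry mounted on
# TARGET lists OPTION as a complete, comma-separated option.
mount_has_option() (
    target=$1
    option=$2
    while IFS= read -r line; do
        case $line in
            *" on $target type "*) ;;
            *) continue ;;
        esac
        opts=${line##*\(}      # keep the text after the last "("
        opts=${opts%\)*}       # drop the trailing ")"
        case ",$opts," in
            *",$option,"*) return 0 ;;
        esac
    done
    return 1
)
```

Inside a container this could be used as: mount | mount_has_option /etc/resolv.conf idmapped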

I appreciate your answer regarding the second problem: /etc/resolv.conf is overwritten by sysbox (even if mounted read-only) but is thereafter read-only from within the container. Counter-intuitive, but I see why you do it like that and why it works.

UPDATE: It works fine on a t2.small instance with Ubuntu 22.04.2 LTS and kernel 5.15.0-1031-aws.

ctalledo commented 1 year ago

Hi @SkyperTHC,

I see what's going on: shiftfs does not work on top of XFS, so Sysbox was not able to mount shiftfs on the container's /etc/resolv.conf. This causes that file to show up with nobody:nobody ownership inside the container, which in turn prevents Sysbox from modifying the file.

There are a few solutions / work-arounds:

1) Try a machine with a newer Ubuntu distro (e.g. Ubuntu Jammy or Kinetic). These come with Linux kernel 5.12+, which brings in a feature called "ID-mapped mounts" that replaces shiftfs.

or

2) Upgrade your Ubuntu Focal kernel from 5.4 to 5.12 or above. This way you get ID-mapped mounts in your Ubuntu Focal host.

or

3) Place /var/lib/docker (and /var/lib/sysbox) on an EXT4 partition.
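Before choosing among these options, it may help to confirm which filesystem actually backs /var/lib/docker. The sketch below picks the longest matching mount point from a table in /proc/mounts format; fstype_for is a hypothetical helper name, and it assumes the path has already been resolved (for example with readlink -f) and that mount points contain no spaces.

```shell
# Hypothetical helper; POSIX sh.

# fstype_for PATH
# Reads a mount table in /proc/mounts format on stdin and prints the
# filesystem type of the mount that contains PATH (longest prefix wins).
fstype_for() (
    path=$1
    best=''
    best_type=''
    while read -r _dev mnt type _rest; do
        case "$path/" in
            "${mnt%/}/"*)
                if [ "${#mnt}" -ge "${#best}" ]; then
                    best=$mnt
                    best_type=$type
                fi
                ;;
        esac
    done
    [ -n "$best_type" ] && printf '%s\n' "$best_type"
)
```

Example: fstype_for "$(readlink -f /var/lib/docker)" < /proc/mounts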

I think we could also make an improvement in Sysbox so that even if /etc/resolv.conf has the nobody:nobody ownership inside the container, Sysbox can reconfigure it. That would make things work in your case too.

UPDATE: It works fine on a t2.small instance with Ubuntu 22.04.2 LTS and kernel 5.15.0-1031-aws.

Makes sense, 5.15 brings in ID-mapped mounts as described above :)

Hope this helps!