oci-playground / freebsd-podman-testing

40 stars 5 forks source link

Failure: "operation not permitted" inside jail #18

Open LennertVA opened 1 day ago

LennertVA commented 1 day ago

Trying some basic podman things in a FreeBSD 14.1-RELEASE jail. Permissions have been given to the jail:

children.max = 20;
allow.mount;
allow.mount.devfs;
allow.mount.procfs;
allow.mount.linprocfs;
allow.mount.linsysfs;
allow.mount.zfs;
allow.mount.nullfs;
allow.mount.tmpfs;
allow.mount.fdescfs;
allow.raw_sockets;
allow.socket_af;
allow.sysvipc;
allow.chflags;
enforce_statfs=1;
devfs_ruleset=4;

exec.created+="zfs jail web-podman zjails/podman";
exec.release+="zfs unjail web-podman zjails/podman"; 

Trying a basic image:

root@web-podman:~ # podman --log-level trace run --rm --platform linux/x86_64 docker.io/debian:latest NAME=FreeBSD cat /etc/debian-version

This fails due to "something" being not permitted. Unfortunately, even "trace" log level doesn't say anything about what is failing or not permitted.

DEBU[0001] Downloading /v2/library/debian/blobs/sha256:7d98d813d54f6207a57721008a4081378343ad8f1b2db66c121406019171805b
DEBU[0001] GET [https://registry-1.docker.io/v2/lib...721008a4081378343ad8f1b2db66c121406019171805b](https://registry-1.docker.io/v2/library/debian/blobs/sha256:7d98d813d54f6207a57721008a4081378343ad8f1b2db66c121406019171805b)
Copying blob 7d98d813d54f [--------------------------------------] 0.0b / 47.3MiB (skipped: 0.0b = 0.00%)
Copying blob 7d98d813d54f [--------------------------------------] 0.0b / 47.3MiB | 0.0 b/s
Copying blob 7d98d813d54f done |
DEBU[0002] ID:0109e853-5493-4579-9e53-73918cb992e8 START /sbin/zfs create -o mountpoint=legacy zjails/podman/ef5f5ddeb0a6492f959cfdcfc6b0a3518e0a120db92e53ccb8225ee481e7a4a1 storage-driver=Copying blob 7d98d813d54f done |
DEBU[0002] ID:0109e853-5493-4579-9e53-73918cb992e8 FINISH storage-driver=zfs
DEBU[0002] ID:be6f9630-ec7e-4583-966a-4b03aad23caf START /sbin/zfs list -Hp -o name,origin,used,available,mountpoint,compression,type,volsize,quota,referenced,written,logicalused,usedbydataset zjails/podman/ef5f5ddeb0a6492f959cfdcfc6b0a3518e0a120db92e53ccb8225ee481e7a4a1 storage-driver=zfs
DEBU[0002] ID:be6f9630-ec7e-4583-966a-4b03aad23caf FINISH storage-driver=zfs
DEBU[0002] mount("zjails/podman/ef5f5ddeb0a6492f959cfdcfc6b0a3518e0a120db92e53ccb8225ee481e7a4a1", "/var/db/containers/storage/zfs/graph/ef5f5ddeb0a6492f959cfdcfc6b0a3518e0a120db92e53ccb8225ee481e7a4a1", "") storage-driver=zfs
DEBU[0002] Start untar layer
ERRO[0002] While applying layer: ApplyLayer stdout: stderr: operation not permitted exit status 1
DEBU[0002] unmount("/var/db/containers/storage/zfs/graph/ef5f5ddeb0a6492f959cfdcfc6b0a3518e0a120db92e53ccb8225ee481e7a4a1") storage-driver=zfs
DEBU[0002] ID:d1148784-cf32-424b-8f16-3c3b01d9d507 START /sbin/zfs destroy -r zjails/podman/ef5f5ddeb0a6492f959cfdcfc6b0a3518e0a120db92e53ccb8225ee481e7a4a1 storage-driver=zfs
Copying blob 7d98d813d54f done |
DEBU[0002] Error pulling candidate docker.io/library/debian:latest: copying system image from manifest list: writing blob: adding layer with blob "sha256:7d98d813d54f6207a57721008a4081378343ad8f1b2db66c121406019171805b": ApplyLayer stdout: stderr: operation not permitted exit status 1
Error: copying system image from manifest list: writing blob: adding layer with blob "sha256:7d98d813d54f6207a57721008a4081378343ad8f1b2db66c121406019171805b": ApplyLayer stdout: stderr: operation not permitted exit status 1
DEBU[0002] Shutting down engines

Reading/writing/creating things in the ZFS dataset appears to be working fine from within the jail commandline. Kind of stumped here, any pointers would be greatly appreciated.

LennertVA commented 1 day ago

The last "useful" log line "Start untar layer" seems to come from here:

https://github.com/containers/storage/blob/a42802ecedf6bab79d4c7f2fa7d74f95b1a8c91a/drivers/fsdiff.go#L191

Which would seem to call this function:

https://github.com/containers/storage/blob/a42802ecedf6bab79d4c7f2fa7d74f95b1a8c91a/pkg/archive/diff.go#L258

And this function does... an untar, basically, from what I can tell. There's no way to go drill deeper on what exactly there fails but there does not appear to be anything overly strange or difficult in there.

LennertVA commented 1 day ago

Okay, this turns out to be a red herring. The actual issue appears to be the line before, which for some moronic reason is not being logged as failed:

DEBU[0002] mount("zjails/podman/ef5f5ddeb0a6492f959cfdcfc6b0a3518e0a120db92e53ccb8225ee481e7a4a1", "/var/db/containers/storage/zfs/graph/ef5f5ddeb0a6492f959cfdcfc6b0a3518e0a120db92e53ccb8225ee481e7a4a1", "") storage-driver=zfs

It is shown in the lines above:

/sbin/zfs create -o mountpoint=legacy

... that "legacy" mounts are being forced. And this fails:

root@web-podman:~ # zfs create zjails/podman/testtesttest
root@web-podman:~ # zfs mount zjails/podman/testtesttest
cannot mount 'zjails/podman/testtesttest': filesystem already mounted
root@web-podman:~ # zfs umount zjails/podman/testtesttest
root@web-podman:~ # mkdir /tmp/testtesttest
root@web-podman:~ # mount zjails/podman/testtesttest /tmp/testtesttest
mount: zjails/podman/testtesttest: Operation not permitted
root@web-podman:~ # zfs destroy zjails/podman/testtesttest
root@web-podman:~ #

So this is not a problem of podman per se - but it is a problem caused by the storage driver using legacy mount instead of the standard zfs mount.

This behaviour was introduced a long time ago (ping @Mic92):

https://github.com/containers/storage/commit/11e9167a6b45fdc134ee43e89abefd34a85cf624

So the questions become then:

  1. Can we force the podman storage ZFS driver to use "proper" mounts again instead of "legacy" (doubtful), or
  2. Can we configure the jail in a way to allow "legacy" mounts to be executed?
Mic92 commented 1 day ago

What might work is setting mountpoint=none and zfsutil as the mount option. At least on Linux this works, don't know much about freebsd.

LennertVA commented 1 day ago

What might work is setting mountpoint=none and zfsutil as the mount option. At least on Linux this works, don't know much about freebsd.

Aha - any hints on where that could be set (or would that require a rebuild)? Or do you mean using 'overlay' instead of 'zfs' and then specifying mountopt and mount_program?

Mic92 commented 1 day ago

zfs set mountpoint=none yourdataset, mount -t zfs -o zfsutil ...

LennertVA commented 8 hours ago

zfs set mountpoint=none yourdataset, mount -t zfs -o zfsutil ...

Can confirm that this indeed works. Now to figure out how to make this work with podman...

LennertVA commented 5 hours ago

Not even sure about the mounting anymore now. I ran podman in truss:

# truss podman --log-level trace run --rm --platform linux/x86_64 docker.io/debian:latest NAME=FreeBSD cat /etc/debian-version 2>&1 | tee podman-truss.log

...and it appears that there is data being written to mount point at least:

39174 write(16,"etc/apt/apt.conf.d/01autoremove"...,512) = 512 (0x200)

There's a good number of these before things go bad, so that would at least appear to have passed the point where a failing mount would cause trouble. Unless it's all buffered or god knows and the error only bubbles up later.

The first occurrence of "not permitted" is after a good few files and dirs appear to have been created:

39190 kevent(4,0x0,0,{ 19,EVFILT_READ,EV_CLEAR,0,0x17,0xded50ccccc80001 },64,{ 0.067696287 }) = 1 (0x1)
39191 read(19,"operation not permitted",512)           = 23 (0x17)
39192 read(19,0x874508017,1001)                        = 0 (0x0)
39193 _umtx_op(0x87407ed50,UMTX_OP_WAKE_PRIVATE,0x1,0x0,0x0) = 0 (0x0)
39194 _umtx_op(0x87407ed50,UMTX_OP_WAIT_UINT_PRIVATE,0x0,0x18,0x0) = 0 (0x0)
39195 close(19)                                        = 0 (0x0)

Not even sure how relevant it is. But right after:

39204 kevent(4,0x0,0,{ },64,{ 0.000000000 })           = 0 (0x0)
39205 SIGNAL 13 (SIGPIPE) code=SI_KERNEL
39206 kevent(4,0x0,0,{ },64,{ 0.000000000 })           = 0 (0x0)
39207 nanosleep({ 0.000003000 })                       = 0 (0x0)
39208 _umtx_op(0x874620550,UMTX_OP_WAKE_PRIVATE,0x1,0x0,0x0) = 0 (0x0)
39209 _umtx_op(0x874620550,UMTX_OP_WAIT_UINT_PRIVATE,0x0,0x18,0x0) = 0 (0x0)
39210 sigprocmask(SIG_SETMASK,{ SIGHUP|SIGINT|SIGQUIT|SIGILL|SIGTRAP|SIGABRT|SIGEMT|SIGFPE|SIGKILL|SIGBUS|SIGSEGV|SIGSYS|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTI      N|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2 },0x0) = 0 (0x0)
39211 sigreturn(0x874893040)                           EJUSTRETURN
39212 close(16)                                        = 0 (0x0)
39213 read(12,"\M-i\M-K_\M^A\M^A\M-kv\M-NP\M-'p"...,32768) = 32768 (0x8000)
39214 read(12,"\M-{\M^\\\[\M^S\M-*\M-x\^O\M-Afs"...,32768) = 32768 (0x8000)
39215 nanosleep({ 0.010000000 })                       = 0 (0x0)
39216 thr_kill(137307,SIGURG)                          = 0 (0x0)
39217 SIGNAL 16 (SIGURG) code=SI_LWP pid=42476 uid=0
39218 _umtx_op(0x28df7f0,UMTX_OP_WAIT_UINT_PRIVATE,0x0,0x18,0x0) ERR#4 'Interrupted system call'
39219 sigprocmask(SIG_SETMASK,{ SIGHUP|SIGINT|SIGQUIT|SIGILL|SIGTRAP|SIGABRT|SIGEMT|SIGFPE|SIGKILL|SIGBUS|SIGSEGV|SIGSYS|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTI      N|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2 },0x0) = 0 (0x0)
39220 SIGNAL 20 (SIGCHLD) code=CLD_EXITED pid=45102 uid=0 status=1
39221 nanosleep({ 0.000020000 })                       = 0 (0x0)
39222 wait6(P_PID,45102,{ EXITED,val=1 },WNOWAIT|WEXITED,0x0,0x0) = 45102 (0xb02e)
39223 sigreturn(0x8740a9040)                           EJUSTRETURN
39224 wait4(45102,{ EXITED,val=1 },0x0,{ u=0.000000,s=0.016000,in=0,out=0 }) = 45102 (0xb02e)
39225 nanosleep({ 0.000020000 })                       = 0 (0x0)
39226 sigprocmask(SIG_SETMASK,{ SIGHUP|SIGINT|SIGQUIT|SIGILL|SIGTRAP|SIGABRT|SIGEMT|SIGFPE|SIGKILL|SIGBUS|SIGSEGV|SIGSYS|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTI      N|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2 },0x0) = 0 (0x0)
39227 thr_kill(137311,SIGURG)                          = 0 (0x0)
39228 SIGNAL 16 (SIGURG) code=SI_LWP pid=42476 uid=0
39229 sigreturn(0x87400f040)                           EJUSTRETURN
39230 nanosleep({ 0.000020000 })                       = 0 (0x0)
39231 sigprocmask(SIG_SETMASK,{ SIGHUP|SIGINT|SIGQUIT|SIGILL|SIGTRAP|SIGABRT|SIGEMT|SIGFPE|SIGKILL|SIGBUS|SIGSEGV|SIGSYS|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTI      N|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2 },0x0) = 0 (0x0)
39232 sigreturn(0x87411f040)                           EJUSTRETURN
39233 nanosleep({ 0.000020000 })                       = 0 (0x0)
39234 nanosleep({ 0.000020000 })                       = 0 (0x0)
39235 time="2024-10-24T10:54:08+02:00" level=error msg="While applying layer: ApplyLayer stdout:  stderr: operation not permitted exit status 1"

So it would appear then that something barfed during extraction/copy of the archive? Added the full truss log here if anyone wants to take a look.

podman-truss.log.zip