projectatomic / oci-systemd-hook

OCI hook to enable running systemd in a container
GNU General Public License v3.0

Make the /run/oci-systemd-hook-XXXXX directory MS_PRIVATE #98

Closed. rhatdan closed this 5 years ago.

rhatdan commented 6 years ago

We are leaking mount points into the shared mount space. By mounting the directory private, we can make our changes without having the mount points leak.

Signed-off-by: Daniel J Walsh dwalsh@redhat.com
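
For background, marking a mount MS_PRIVATE stops mount and unmount events beneath it from propagating to peers in its shared peer group. A minimal C sketch of the idea (not the hook's actual code; the helper name and path handling are hypothetical):

#include <stdio.h>
#include <sys/mount.h>

/* Bind-mount dir onto itself so it is a mount point, then mark
   that mount private so mounts created beneath it do not leak
   into the host's shared mount namespace. */
static int make_private(const char *dir)
{
        if (mount(dir, dir, NULL, MS_BIND, NULL) < 0) {
                perror("bind mount");
                return -1;
        }
        if (mount(NULL, dir, NULL, MS_PRIVATE, NULL) < 0) {
                perror("make private");
                return -1;
        }
        return 0;
}

int main(int argc, char **argv)
{
        return (argc == 2 && make_private(argv[1]) == 0) ? 0 : 1;
}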

rhatdan commented 6 years ago

@wking I think I fixed your issues.

wking commented 6 years ago

nit: PR subject has "ssystemd" when it should be "systemd".

rhatdan commented 6 years ago

I really need someone to test this to see if it fixes the issue people are seeing in podman.

aalba6675 commented 6 years ago

@rhatdan I hit the following error:

Jun 21 08:42:07 podman.localdomain oci-systemd-hook[13804]: systemdhook <error>: 6cab0d8c9dc8: pid not found in state: Success
Jun 21 08:42:07 podman.localdomain conmon[13831]: conmon 6cab0d8c9dc817e69fd0 <ninfo>: about to waitpid: 13832
Jun 21 08:42:07 podman.localdomain kernel: SELinux: mount invalid.  Same superblock, different security settings for (dev mqueue, type mqueue)
Jun 21 08:42:07 podman.localdomain oci-systemd-hook[13860]: systemdhook <debug>: 6cab0d8c9dc8: rootfs=/var/lib/containers/storage/overlay/3518f024987889575e3e80e887ff2a86daf4c9f927f8>
Jun 21 08:42:07 podman.localdomain oci-systemd-hook[13860]: systemdhook <debug>: 6cab0d8c9dc8: gidMappings not found in config
Jun 21 08:42:07 podman.localdomain oci-systemd-hook[13860]: systemdhook <debug>: 6cab0d8c9dc8: GID: 0
Jun 21 08:42:07 podman.localdomain oci-systemd-hook[13860]: systemdhook <debug>: 6cab0d8c9dc8: uidMappings not found in config
Jun 21 08:42:07 podman.localdomain oci-systemd-hook[13860]: systemdhook <debug>: 6cab0d8c9dc8: UID: 0
Jun 21 08:42:07 podman.localdomain oci-systemd-hook[13860]: systemdhook <error>: 6cab0d8c9dc8: Failed to remove /run/oci-systemd-hook.cGqlEG: Device or resource busy
Jun 21 08:42:07 podman.localdomain conmon[13831]: conmon 6cab0d8c9dc817e69fd0 <error>: Failed to create container: exit status 1
wking commented 6 years ago
> Jun 21 08:42:07 podman.localdomain oci-systemd-hook[13860]: systemdhook <error>: 6cab0d8c9dc8: Failed to remove /run/oci-systemd-hook.cGqlEG: Device or resource busy

I don't know what's going on with that (maybe we need an umount to unwind our new mount of tmp_dir onto tmp_dir?), but while reading the code I turned up an additional question. Would switching to the simpler guard I propose there help with debugging this breakage?

rhatdan commented 6 years ago

@wking I reworked it as you suggested. I am no longer creating the additional mount point. I believe we only need to make the tmpfs mounted on tmp_dir private, which this patch does, and then unmount it only when the code fails.
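
In outline, the described flow would look roughly like this (a hedged sketch, not the patch itself; the function name, tmpfs options, and failure handling are assumptions):

#include <stdio.h>
#include <sys/mount.h>

/* Sketch: mount a tmpfs on tmp_dir, mark that mount private, and
   unmount it only on the failure path. */
static int setup_tmp(const char *tmp_dir)
{
        if (mount("tmpfs", tmp_dir, "tmpfs", MS_NODEV | MS_NOSUID, "mode=755") < 0) {
                perror("tmpfs mount");
                return -1;
        }
        if (mount(NULL, tmp_dir, NULL, MS_PRIVATE, NULL) < 0) {
                perror("make private");
                umount(tmp_dir); /* unwind only when a later step fails */
                return -1;
        }
        return 0;
}

int main(int argc, char **argv)
{
        return (argc == 2 && setup_tmp(argv[1]) == 0) ? 0 : 1;
}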

rhatdan commented 6 years ago

@aalba6675 Could you try the latest to see if it works any better?

aalba6675 commented 6 years ago

@rhatdan Got the same error as with /tmp/ocitmp.XXXX, since /run here (F28) has shared propagation:

oci-systemd-hook[9252]: systemdhook <error>: 6cab0d8c9dc8: Failed to move mount /run/oci-systemd-hook.9dlu3E to /var/lib/containers/storage/overlay/3518f024987889575e3e80e887ff2a86daf4c9f927f8d756e58d4b902457c37b/merged/run: Invalid argument

When I set /run to private it works.
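
That behavior is consistent with mount(2): an MS_MOVE request fails with EINVAL when the source mount, or its parent mount, has shared propagation, and /run is shared by default on Fedora 28. An illustrative call mirroring the failing move in the log above (paths hypothetical):

#include <stdio.h>
#include <sys/mount.h>

int main(void)
{
        /* With a shared /run, moving a mount out of it fails with
           EINVAL; making /run private first allows the move. */
        if (mount("/run/oci-systemd-hook.XXXXXX", "/path/to/rootfs/merged/run",
                  NULL, MS_MOVE, NULL) < 0)
                perror("MS_MOVE"); /* EINVAL under shared /run */
        return 0;
}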

aalba6675 commented 6 years ago

Separate enquiry: I see

Jun 24 20:26:47 podman.com oci-systemd-hook[9494]: systemdhook <debug>: 6cab0d8c9dc8: Found cgroup
Jun 24 20:26:47 podman.com oci-systemd-hook[9494]: systemdhook <debug>: 6cab0d8c9dc8: PATH: /libpod_parent/libpod-6cab0d8c9dc817e69fd0c02a7657e9a83edab3903f25f50d758c1511283bbbf0/ctr
Jun 24 20:26:47 podman.com oci-systemd-hook[9494]: systemdhook <debug>: 6cab0d8c9dc8: SUBSYSTEM_PATH: /sys/fs/cgroup/systemd/libpod_parent/libpod-6cab0d8c9dc817e69fd0c02a7657e9a83edab3903f25f50d758c1511283bbbf0/ctr

Is this why the host sees a doubled path? In other words, is PATH concatenated to the end of SUBSYSTEM_PATH?
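
If that suspicion is right, the doubling would come from naively appending PATH to a SUBSYSTEM_PATH that already ends in PATH. A hypothetical illustration (the variable names are assumed, not the hook's code):

#include <stdio.h>

int main(void)
{
        const char *subsystem_path =
                "/sys/fs/cgroup/systemd/libpod_parent/libpod-<container_uuid>/ctr";
        const char *path = "/libpod_parent/libpod-<container_uuid>/ctr";
        char doubled[512];

        /* Produces the doubled
           .../libpod_parent/libpod-<container_uuid>/ctr/libpod_parent/libpod-<container_uuid>/ctr
           path seen on the host. */
        snprintf(doubled, sizeof(doubled), "%s%s", subsystem_path, path);
        puts(doubled);
        return 0;
}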

wking commented 6 years ago

> Your example shows you making a dir, but not mounting on it.

Oops, right. But later on you mount tmp_dir onto mount_dir, so maybe that needs cleanup code?

> We don't clean up the mount_dir since it is on a tmpfs /run.

Ah, (eventual) tmpfs cleanup makes sense. But mount_dir is under rootfs. Must rootfs always be under /run?

rhatdan commented 6 years ago

With the latest podman I am seeing no leaking. I updated this package anyway.

rhatdan commented 6 years ago

@wking @mrunalp @lsm5 PTAL. People are still reporting issues, even though I was not seeing them.

rhatdan commented 6 years ago

@aalba6675 @thoraxe PTAL

aalba6675 commented 6 years ago
  1. Still cannot shift the mount:
    oci-systemd-hook[23858]: systemdhook <error>: 0cebf3cae8d7: Failed to move mount /tmp/oci-systemd-hook.rD2ygS to /var/lib/containers/storage/overlay/4e9473a4456abeba9bd112c8760a6bee48a0e83ab80be5fce188d263fec614d7/merged/run: Invalid argument
  2. Leakage of one cgroup on the host, namely /sys/fs/cgroup/systemd/libpod_parent/libpod-0cebf3cae8d7e554b2647893c5032354b2d843e5052ff7b4a29c28c82ed167c1. This can be unmounted manually. The double-pathing libpod_parent/libpod-<container_uuid>/libpod_parent/libpod-<container_uuid> (containers with volume mounts) doesn't happen anymore!
rhatdan commented 6 years ago

@aalba6675 What is the exact Podman command you are seeing this with? And what is the Dockerfile you used to generate the image?

aalba6675 commented 6 years ago

@rhatdan - reproducer, oci-systemd-hook has #98 applied

# rpm -q oci-systemd-hook podman buildah
oci-systemd-hook-0.1.17-3.gitbd86a79.fc28.x86_64
podman-0.7.4-4.git80612fb.fc28.x86_64
buildah-1.3-1.git4888163.fc28.x86_64

Image:

CONT=$(buildah from centos:7)
buildah run $CONT yum -y install systemd openssh-server tmux rsync sudo
buildah run $CONT systemctl enable sshd
buildah run $CONT bash -c 'chpasswd <<< root:root_secure_password'
buildah commit $CONT c7test:1

Containers (with and w/o volume):

podman create --name test_ruby --entrypoint /sbin/init --stop-signal RTMIN+3 --network none c7test:1
podman create --name test_gold --entrypoint /sbin/init --stop-signal RTMIN+3 --network none -v /srv/docker/volumes/vagrant/home:/home:z c7test:1

Test 1: result: PASS

# force defaults for /tmp, container does not have host volumes
mount --make-shared /tmp
podman start test_ruby
# check if systemd is running
# podman exec test_ruby ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 01:45 ?        00:00:00 /sbin/init
root        19     1  0 01:45 ?        00:00:00 /usr/lib/systemd/systemd-journald
root        24     1  0 01:45 ?        00:00:00 /usr/sbin/sshd -D
root        26     1  0 01:45 ?        00:00:00 /usr/lib/systemd/systemd-logind
dbus        27     1  0 01:45 ?        00:00:00 /bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
root        30     0  0 01:45 ?        00:00:00 ps -ef

Test 2a: result: FAIL

mount --make-shared /tmp
podman start test_gold
systemdhook <error>: 06f83779973c: Failed to move mount /tmp/oci-systemd-hook.l2Cd83 to /var/lib/containers/storage/overlay/1f3182c51b31f2909fb8e369a0cc6ecff09150687984575c0907b43f9530d1c8/merged/run: Invalid argument

Test 2b: result: PASS??

mount --make-private /tmp
podman start test_gold
## container starts! but one cgroup still leaking...
## on host
# mount | grep libpod
cgroup on /sys/fs/cgroup/systemd/libpod_parent/libpod-06f83779973c0c88537c20bb28e9215998eaab7f146750539b0e05e612c3132d type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,xattr,name=systemd)

## yay! systemd is running in test_gold with host volumes!
# podman exec test_gold ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 01:53 ?        00:00:00 /sbin/init
root        18     1  0 01:53 ?        00:00:00 /usr/lib/systemd/systemd-journald
dbus        24     1  0 01:53 ?        00:00:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
root        26     1  0 01:53 ?        00:00:00 /usr/lib/systemd/systemd-logind
root        27     1  0 01:53 ?        00:00:00 /usr/sbin/sshd -D
root        30     0  0 01:53 ?        00:00:00 ps -ef

## 
[root@podhost187 ~]# podman stop test_gold
06f83779973c0c88537c20bb28e9215998eaab7f146750539b0e05e612c3132d
[root@podhost187 ~]# mount | grep libpod
cgroup on /sys/fs/cgroup/systemd/libpod_parent/libpod-06f83779973c0c88537c20bb28e9215998eaab7f146750539b0e05e612c3132d type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,xattr,name=systemd)

## this is the final cgroup leakage
aalba6675 commented 6 years ago

@rhatdan new reproducer with different paths with 0.8.2 (systemd as the default cgroup-manager):

# rpm -q podman
podman-0.8.2.1-1.gitf38eb4f.fc28.x86_64

podman run -t --name=systemd --env=container=podman --entrypoint=/sbin/init --stop-signal=RTMIN+3 -v /volumes/vagrant/home:/home:z fedora:28

# mount | grep libpod
cgroup on /sys/fs/cgroup/systemd/system.slice/libpod-82265d6d94512df4a1cfd244c4cdccdaad16356f1332f9a2ed6a13c0aae1f3c9.scope type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,xattr,name=systemd)

podman stop systemd
## the cgroup mount is leaked
# mount | grep libpod
cgroup on /sys/fs/cgroup/systemd/system.slice/libpod-82265d6d94512df4a1cfd244c4cdccdaad16356f1332f9a2ed6a13c0aae1f3c9.scope type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,xattr,name=systemd)
rhatdan commented 5 years ago

Since we have directly integrated systemd support into podman, I am going to close this PR.