projectatomic / oci-systemd-hook

OCI hook to enable running systemd in a container
GNU General Public License v3.0

Insane cgroup mounts in /sys/fs/cgroup/systemd/libpod_parent with podman+bind mounts #94

Open aalba6675 opened 6 years ago

aalba6675 commented 6 years ago
  1. Firstly, this issue is only triggered when using bind mounts with podman.
  2. When a bind mount is used, the cgroup mounts under /sys/fs/cgroup/systemd/libpod_parent nest recursively after a few stop/start cycles. In fact there are 86(!) mounts, as shown below, but only four unique mount paths. These directories don't exist and the cgroup cannot be unmounted. Furthermore, the third attempt to start the container fails, and after that no other container can be started (even those without bind mounts).
  3. @mheon in https://github.com/projectatomic/libpod/pull/507 suggests that the bug is here as these crazy mount paths are not created by podman.
  4. Crazy mounts: https://github.com/projectatomic/libpod/files/1991848/cgroup.zip
  5. Any pointers as to who/what is recursively adding to the cgroup mount path?

Reproducer:

# this kludge is necessary otherwise we cannot MS_MOVE the mount
mount --make-private /tmp
podman run --name bobby_silver -v /volumes/podman/home:/home:z --entrypoint /sbin/init fedora:28
podman stop bobby_silver
podman start bobby_silver
podman stop bobby_silver
podman start bobby_silver # third time fails

single: cgroup on /sys/fs/cgroup/systemd/libpod_parent/libpod-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0/ctr

doubled:

cgroup on /sys/fs/cgroup/systemd/libpod_parent/libpod-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0/ctr/libpod_parent/libpod-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0/ctr type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,xattr,name=systemd)

tripled: cgroup on /sys/fs/cgroup/systemd/libpod_parent/libpod-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0/ctr/libpod_parent/libpod-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0/ctr/libpod_parent/libpod-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0/ctr type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,xattr,name=systemd)

quadrupled: cgroup on /sys/fs/cgroup/systemd/libpod_parent/libpod-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0/ctr/libpod_parent/libpod-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0/ctr/libpod_parent/libpod-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0/ctr/libpod_parent/libpod-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0/ctr type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,xattr,name=systemd)
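
To see how often each path is mounted and how deep the nesting goes, the `mount` output can be summarized. The following is a minimal sketch, assuming the `cgroup on PATH type cgroup (...)` line format shown above; the function name `summarize_libpod_mounts` is made up for illustration and is not part of podman or oci-systemd-hook.

```shell
#!/bin/sh
# Hypothetical helper: reads `mount`-style lines on stdin and prints
# "<count> <depth> <mount point>" for every libpod cgroup mount point.
# depth = number of times "libpod_parent" occurs in the path.
summarize_libpod_mounts() {
    awk '$1 == "cgroup" && $2 == "on" && $3 ~ /libpod_parent/ {
             count[$3]++
         }
         END {
             for (path in count) {
                 tmp = path
                 # gsub returns the number of replacements it made
                 depth = gsub(/libpod_parent/, "&", tmp)
                 print count[path], depth, path
             }
         }' | sort -k2,2n -k3,3
}

# Usage on the host (reads the live mount table):
#   mount -t cgroup | summarize_libpod_mounts
```

A depth of 1 corresponds to the "single" path above, 2 to "doubled", and so on; the first column shows how many times that exact path is mounted.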

mheon commented 6 years ago

The first potential problem here is that we're mounting libpod's cgroupfs CGroups into systemd's CGroup hierarchy. That strikes me as wrong (and potentially dangerous). Systemd CGroup support in libpod is still very much in progress, but once it is complete we might want to consider making it mandatory for oci-systemd-hook (it will become the default once it's ready, to match CRI-O's current default).

aalba6675 commented 6 years ago

Unrelated to cleanup: there also seems to be a bug at startup. The host sees three mounts, two on the single path and one on the doubled path (this is specific to containers created with -v).

cgroup on /sys/fs/cgroup/systemd/libpod_parent/libpod-9ffaff1cdcab235dc1dabdb25d6d1e209f044957b02b533874e0aaf17c0200db/ctr type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,xattr,name=systemd)
cgroup on /sys/fs/cgroup/systemd/libpod_parent/libpod-9ffaff1cdcab235dc1dabdb25d6d1e209f044957b02b533874e0aaf17c0200db/ctr type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,xattr,name=systemd)
cgroup on /sys/fs/cgroup/systemd/libpod_parent/libpod-9ffaff1cdcab235dc1dabdb25d6d1e209f044957b02b533874e0aaf17c0200db/ctr/libpod_parent/libpod-9ffaff1cdcab235dc1dabdb25d6d1e209f044957b02b533874e0aaf17c0200db/ctr type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,xattr,name=systemd)

After the container is stopped, if these paths are manually unmounted, the container can be restarted indefinitely, i.e., it doesn't trap itself in tripled/quadrupled paths.
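
The manual unmount can be scripted roughly as follows. This is only a sketch under assumptions: it relies on the `mount` line format shown above, the function name `stale_mounts_deepest_first` is hypothetical, and it prints the `umount` commands rather than running them.

```shell
#!/bin/sh
# Hypothetical helper: reads `mount`-style lines on stdin and prints the
# mount points that mention the given container ID, longest path first,
# so nested mounts are listed before the paths that contain them.
# Duplicates are kept on purpose: a path mounted twice needs two umounts.
stale_mounts_deepest_first() {
    ctr_id=$1
    awk -v id="libpod-$ctr_id" \
        '$2 == "on" && index($3, id) { print length($3), $3 }' |
        sort -k1,1nr |
        cut -d' ' -f2-
}

# Dry run: prints the umount commands instead of executing them.
# Usage (as root, after `podman stop`), with CTR_ID set to the full ID:
#   mount -t cgroup | stale_mounts_deepest_first "$CTR_ID" |
#       while read -r path; do echo umount "$path"; done
```

Dropping the `echo` would perform the unmounts; ordering by path length is a simple way to make sure the doubled path is handled before the single path it sits under.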

This may explain the tripled/quadrupled paths: when the container is restarted without proper cleanup, it just keeps adding to the cgroup path ad infinitum...

aalba6675 commented 6 years ago

The weird thing is that containers that do not use -v see the single/single/double mounts themselves, while the host does not see them. When the container exits, these mounts are cleaned up gracefully.

This container does not use -v; it sees:

cgroup on /sys/fs/cgroup/systemd/libpod_parent/libpod-9ffaff1cdcab235dc1dabdb25d6d1e209f044957b02b533874e0aaf17c0200db/ctr type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,xattr,name=systemd)
cgroup on /sys/fs/cgroup/systemd/libpod_parent/libpod-9ffaff1cdcab235dc1dabdb25d6d1e209f044957b02b533874e0aaf17c0200db/ctr type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,xattr,name=systemd)
cgroup on /sys/fs/cgroup/systemd/libpod_parent/libpod-9ffaff1cdcab235dc1dabdb25d6d1e209f044957b02b533874e0aaf17c0200db/ctr/libpod_parent/libpod-9ffaff1cdcab235dc1dabdb25d6d1e209f044957b02b533874e0aaf17c0200db/ctr type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,xattr,name=systemd)

The host sees nothing.

mheon commented 6 years ago

@rhatdan This definitely sounds like a separate bug from Podman not firing the cleanup hooks (unless we want a duplicated CGroup path for some reason?).

I'm betting that mount propagation is the reason for only seeing this with volume mounts - we probably need to change the mount propagation we use when mounting volumes into the container.
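
One way to check the propagation guess is to look at the optional fields in /proc/self/mountinfo, which carry tags such as `shared:N` and `master:N`. Below is a minimal sketch; the function name `show_propagation` is hypothetical, and it assumes mount points without spaces.

```shell
#!/bin/sh
# Hypothetical helper: reads /proc/<pid>/mountinfo lines on stdin and prints
# "<mount point> <propagation>". The optional fields start at field 7 and
# end at the "-" separator; a mount with none of them is private.
show_propagation() {
    awk '{
        prop = ""
        for (i = 7; i <= NF && $i != "-"; i++)
            prop = prop (prop == "" ? "" : ",") $i
        print $5, (prop == "" ? "private" : prop)
    }'
}

# Usage:
#   show_propagation < /proc/self/mountinfo | grep '^/sys/fs/cgroup'
```

Running it once on the host and once inside a container's mount namespace would show whether the cgroup mounts belong to a shared peer group, which is the condition under which mounts can propagate between the two.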

aalba6675 commented 6 years ago

Normally a non--v container sees only one single-path mount; the single/single/double mounts appear only when a -v container is started after it.

Non--v container is running:

# this is a non-`-v` container
cgroup on /sys/fs/cgroup/systemd/libpod_parent/libpod-9075fe81752a0a9383e587ba9af6de76d546cfec3f3d23683d1de165c69ed96f/ctr type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,xattr,name=systemd)

Start a -v container, then check again. Three mounts of the -v container have propagated into the non--v container:

## three of these mounts are from another container
cgroup on /sys/fs/cgroup/systemd/libpod_parent/libpod-9075fe81752a0a9383e587ba9af6de76d546cfec3f3d23683d1de165c69ed96f/ctr type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,xattr,name=systemd)
cgroup on /sys/fs/cgroup/systemd/libpod_parent/libpod-9ffaff1cdcab235dc1dabdb25d6d1e209f044957b02b533874e0aaf17c0200db/ctr type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,xattr,name=systemd)
cgroup on /sys/fs/cgroup/systemd/libpod_parent/libpod-9ffaff1cdcab235dc1dabdb25d6d1e209f044957b02b533874e0aaf17c0200db/ctr type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,xattr,name=systemd)
cgroup on /sys/fs/cgroup/systemd/libpod_parent/libpod-9ffaff1cdcab235dc1dabdb25d6d1e209f044957b02b533874e0aaf17c0200db/ctr/libpod_parent/libpod-9ffaff1cdcab235dc1dabdb25d6d1e209f044957b02b533874e0aaf17c0200db/ctr type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,xattr,name=systemd)
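
To make it easier to tell which container each mount belongs to, the listing can be grouped by container ID. A minimal sketch, assuming the `mount` line format above; `count_mounts_per_container` is a made-up name used only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads `mount`-style lines on stdin and prints
# "<count> <container ID>", using the first libpod-<hex> component of
# each mount point as the container ID.
count_mounts_per_container() {
    awk '$2 == "on" && match($3, /libpod-[0-9a-f]+/) {
             # skip the 7-character "libpod-" prefix
             id = substr($3, RSTART + 7, RLENGTH - 7)
             count[id]++
         }
         END { for (id in count) print count[id], id }' | sort -k2,2
}

# Usage inside the container being inspected:
#   mount -t cgroup | count_mounts_per_container
```

For the listing above this would report one mount for the 9075fe... container and three for the 9ffaff... container.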

aalba6675 commented 6 years ago

Filed as https://github.com/projectatomic/oci-systemd-hook/issues/95