systemd / systemd

The systemd System and Service Manager
https://systemd.io
GNU General Public License v2.0

cgroup limit setting errors can be swallowed #22211

Open bobrik opened 2 years ago

bobrik commented 2 years ago

systemd version the issue has been seen with

systemd 249 (249.4-1-cloudflare-2021.9.0)
+PAM +AUDIT +SELINUX +APPARMOR +IMA +SMACK +SECCOMP +GCRYPT +GNUTLS -OPENSSL +ACL +BLKID +CURL +ELFUTILS -FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP -LIBFDISK +PCRE2 -PWQUALITY -P11KIT -QRENCODE +BZIP2 +LZ4 +XZ +ZLIB +ZSTD -XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified

Used distribution

Debian Buster

Linux kernel version used (uname -a)

5.10.75, but the issue applies to all kernel versions.

Expected behaviour you didn't see

If I set MemoryMax=15G, I expect both the kernel and systemd to agree on the current value.

Unexpected behaviour you saw

In reality I see:

    Memory: 21.8G (max: 15.0G available: 0B)

    $ cat /sys/fs/cgroup/memory/blah.slice/memory.limit_in_bytes
    32436649984

    Jan 21 23:02:52 foo systemd[1]: blah.slice: Failed to set 'memory.limit_in_bytes' attribute on '/blah.slice' to '16106127360': Device or resource busy
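A quick way to spot this drift from userspace is to compare the requested MemoryMax with what the kernel actually enforces. Here is a rough Python sketch. `parse_size` and `check_limit` are hypothetical helpers, and it assumes the cgroup v1 memory controller path from the output above. It also confirms that 15G (systemd parses memory sizes with base-1024 suffixes) is exactly the 16106127360 bytes in the log line.

```python
from __future__ import annotations

from pathlib import Path
from typing import Optional, Tuple

# systemd parses K/M/G/T memory suffixes with base 1024, so "15G" means 15 GiB.
_SUFFIXES = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(value: str) -> int:
    """Convert a MemoryMax=-style size like '15G' to bytes (integers only, for brevity)."""
    value = value.strip()
    if value and value[-1] in _SUFFIXES:
        return int(value[:-1]) * _SUFFIXES[value[-1]]
    return int(value)


def check_limit(requested: str, kernel_value: str) -> Optional[Tuple[int, int]]:
    """Return (requested_bytes, kernel_bytes) when they disagree, else None."""
    wanted = parse_size(requested)
    actual = int(kernel_value.strip())
    return None if wanted == actual else (wanted, actual)


if __name__ == "__main__":
    # cgroup v1 path from this report (assumption). On cgroup v2 the file is
    # memory.max, which may contain the literal "max" and needs extra handling.
    limit_file = Path("/sys/fs/cgroup/memory/blah.slice/memory.limit_in_bytes")
    if limit_file.exists():
        mismatch = check_limit("15G", limit_file.read_text())
        if mismatch:
            print(f"systemd asked for {mismatch[0]} bytes, kernel enforces {mismatch[1]}")
```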

Steps to reproduce the problem

  1. Make a service use X amount of memory.
  2. Ask systemd to set memory limit below X.
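The two steps can be scripted along these lines. This sketch only builds the command lines. It assumes root, the blah.slice name from above, and `stress-ng` as a stand-in memory hog (any program that keeps enough memory resident will do). `repro_commands` and the `memhog` unit name are hypothetical.

```python
from __future__ import annotations

import shlex


def repro_commands(slice_name: str, hog_bytes: str, new_limit: str) -> list[list[str]]:
    """Build the commands for the two reproduction steps (hypothetical helper)."""
    return [
        # Step 1: transient service in the slice that keeps hog_bytes resident.
        # stress-ng is an assumption; --vm-keep rewrites the same mapping
        # instead of repeatedly unmapping and remapping it.
        ["systemd-run", f"--slice={slice_name}", "--unit=memhog",
         "stress-ng", "--vm", "1", "--vm-bytes", hog_bytes, "--vm-keep"],
        # Step 2: ask systemd to lower the limit below the current usage.
        ["systemctl", "set-property", slice_name, f"MemoryMax={new_limit}"],
    ]


if __name__ == "__main__":
    # Print the commands so they can be inspected before running them as root.
    for cmd in repro_commands("blah.slice", "20g", "15G"):
        print(shlex.join(cmd))
```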

The kernel does not allow setting the limit below the current usage (see the kernel's page_counter code).

I think systemd shouldn't show 15G as the limit when it hasn't actually been applied, as in my example. It might also be good to retry enforcing the limit on every reload until it succeeds. I'm not sure whether systemctl output should include some sort of warning.
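The suggestion above could be modelled roughly like this: keep the requested limit separate from the one the kernel accepted, show the difference in status output, and retry on each reload. This is a hypothetical Python sketch, not systemd's C implementation. `MemoryLimitState` and the `write_limit` callback (standing in for writing the cgroup attribute) are made up for illustration.

```python
from __future__ import annotations

import errno
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class MemoryLimitState:
    """Hypothetical per-unit bookkeeping: requested vs. actually applied limit."""
    requested: int
    applied: Optional[int] = None  # stays None until the kernel accepts a value
    last_error: Optional[str] = None

    def try_apply(self, write_limit: Callable[[int], None]) -> bool:
        """Push the requested limit; meant to be called on set-property and on every reload."""
        if self.applied == self.requested:
            return True
        try:
            write_limit(self.requested)
        except OSError as e:
            if e.errno != errno.EBUSY:
                raise
            # Kernel could not get usage under the new limit; leave `applied` unchanged.
            self.last_error = "Device or resource busy"
            return False
        self.applied = self.requested
        self.last_error = None
        return True

    def status(self) -> str:
        """Render a status fragment that does not claim a failed limit is in effect."""
        gib = self.requested / 1024**3
        if self.applied == self.requested:
            return f"max: {gib:.1f}G"
        return f"max: {gib:.1f}G (not applied: {self.last_error or 'pending'})"
```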

Werkov commented 2 years ago

Thanks for reporting.

(Actually, the kernel does allow shrinking the limit below current usage; the page_counter link isn't the whole truth. The kernel first tries to reclaim memory, and in your case reclaim was too hard or outright impossible, so it gave up with EBUSY.) I can see how systemd's dynamic property updates could be improved in this regard (best to simplest):

FTR, systemctl daemon-reload already re-applies the cgroup settings (once). OTOH, it's not a good idea to base the solution on this, since it would essentially turn PID 1 into a reclaim worker on behalf of the shrunk unit.
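To illustrate the point about shrinking: the kernel does not reject a lower limit outright. It reclaims between attempts and only returns EBUSY when usage still can't get under the new limit. Below is a toy Python model of that control flow. `PageCounter`, `resize_limit`, the `reclaim` callback and the retry count are illustrative assumptions, not the kernel's actual code or constants.

```python
from __future__ import annotations

import errno
from dataclasses import dataclass
from typing import Callable


@dataclass
class PageCounter:
    """Toy stand-in for the kernel's page_counter: usage and limit, in pages."""
    usage: int
    limit: int

    def set_limit(self, new_limit: int) -> bool:
        """Refuse the new limit if usage already exceeds it, otherwise store it."""
        if self.usage > new_limit:
            return False
        self.limit = new_limit
        return True


def resize_limit(counter: PageCounter, new_limit: int,
                 reclaim: Callable[[int], int], retries: int = 5) -> int:
    """Try to lower the limit, reclaiming between attempts; return 0 or -EBUSY.

    `reclaim(n)` is a hypothetical callback that tries to free n pages and
    returns how many it actually freed.
    """
    attempts = 0
    while not counter.set_limit(new_limit):
        if attempts == retries:
            return -errno.EBUSY  # reclaim made progress but not enough
        freed = reclaim(counter.usage - new_limit)
        if freed == 0:
            return -errno.EBUSY  # nothing reclaimable, e.g. anonymous memory without swap
        counter.usage -= freed
        attempts += 1
    return 0
```

In this model the caller waits through every reclaim attempt, which is roughly why looping on it from PID 1 is unattractive.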