Closed: chivalryq closed this issue 3 months ago.
Hi; I can't seem to reproduce this, at least on GKE.
gVisor doesn't do memory limiting by itself; instead, it relies on the host Linux kernel to do this. The limit is set up here as part of container startup, which eventually ends up here to control memory. This way, a single limit covers the combined memory usage of the gVisor kernel and the processes running inside it. If that combined usage goes over the limit, the sandbox should be killed by the Linux OOM killer, and this should be visible in dmesg on the machine.
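If you want to confirm whether the OOM killer fired on a node, the following is a rough sketch of what to look at (this assumes a cgroup v2 host; the pod cgroup path below is a placeholder, not a path from this issue):

```sh
# Look for OOM-killer activity on the host.
sudo dmesg | grep -iE 'out of memory|oom-kill|killed process'

# Inspect the memory limit and current usage for the pod's cgroup.
# Replace the placeholder path with the pod's real cgroup directory.
POD_CGROUP=/sys/fs/cgroup/kubepods.slice/kubepods-pod<POD_UID>.slice  # placeholder
cat "$POD_CGROUP/memory.max"
cat "$POD_CGROUP/memory.current"
```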
The enforcement mechanism depends on many moving parts, so I suggest checking all of them.
- Cgroups need to be mounted on the host (at /sys/fs/cgroup).
- runsc's --ignore-cgroups flag is not specified.
- If using runsc's --systemd-cgroup, make sure you have systemd >= v244.
- Linux.CgroupsPath may need to be set properly in the OCI spec. It is probably incorrect (but need debug logs to check).
- The dev.gvisor.spec.cgroup-parent annotation can also be used to set the cgroups path (this would show up in debug logs).

If all of this is in place, please provide runsc debug logs, details on how you installed gVisor within the Kubernetes cluster (runsc flags etc.), systemd version (systemd --version), cgroup version (output of cat /proc/mounts), and which cgroup controllers are enabled (cat /sys/fs/cgroup/cgroup.controllers).
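For reference, the version/cgroup information above can be collected in one go (the grep just narrows the /proc/mounts output to cgroup mounts):

```sh
# systemd version
systemd --version

# cgroup version: look for cgroup (v1) vs cgroup2 mounts
grep cgroup /proc/mounts

# enabled cgroup controllers (cgroup v2)
cat /sys/fs/cgroup/cgroup.controllers
```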
Also please check #10371 which was filed recently after this issue and looks quite similar.
@EtiennePerot Thanks for replying! We have found the problem thanks to @charlie0129.
It turns out that we didn't configure gVisor to use systemd-cgroup, which is the cgroup manager in our cluster. After adding systemd-cgroup and upgrading gVisor to the latest version, the OOM pod is properly killed by Linux. If I understand correctly, the default option is to use cgroupfs, which is not the mainstream choice. Would it be better to move to systemd-cgroup as a default?
But I can't find any related documentation or FAQ about the cgroup manager. Forgive me if I missed it; if there truly isn't any, it would be kind to mention this somewhere in the documentation.
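For anyone else hitting this, one way to pass the flag, as a sketch: it assumes gVisor is installed through containerd with containerd-shim-runsc-v1, and that containerd's ConfigPath option for the runsc runtime points at the file below. The file path is illustrative, not something prescribed by gVisor:

```sh
# Pass --systemd-cgroup to runsc via the containerd shim's runsc config file.
# Adjust the path to wherever your ConfigPath option points.
cat <<'EOF' | sudo tee /etc/containerd/runsc.toml
[runsc_config]
  systemd-cgroup = "true"
EOF

# Restart containerd so the shim picks up the new flag.
sudo systemctl restart containerd
```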
Would it be better to move to systemd-cgroup as a default?
See the discussion on https://github.com/google/gvisor/issues/10371 about this. Apparently runc's default behavior is also systemd-cgroup=false, and runsc needs to match runc's behavior in order to remain a drop-in replacement for it. But +1 on the need for documentation.
Description
I'm building a sandbox service with gVisor, but a Python process seems to be able to allocate unlimited memory, while a bash script trying to allocate unlimited memory is marked Error in the Pod status.
Steps to reproduce
I got the result above. The memory usage is ~62GiB in my pod because I'm trying to investigate why it makes our machine go OOM; my pod attempts to allocate ~100GiB of memory.
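(The original allocation script isn't included here; as an illustration only, the kind of test involved looks roughly like this, run inside the gVisor pod:)

```sh
# Illustrative memory hog, not the exact script from this report:
# keep allocating 256MiB chunks until something (hopefully the OOM killer) stops it.
python3 -c '
chunks = []
while True:
    chunks.append(bytearray(256 * 1024 * 1024))
'
```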
runsc version
docker version (if using docker)
No response
uname
Linux 3090-k8s-node029 5.15.0-69-generic #76-Ubuntu SMP Fri Mar 17 17:19:29 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
kubectl (if using Kubernetes)
repo state (if built from source)
No response
runsc debug logs (if available)