nestybox / sysbox

An open-source, next-generation "runc" that empowers rootless containers to run workloads such as Systemd, Docker, Kubernetes, just like VMs.
Apache License 2.0

sysbox-runc initialization issues when running over kernels lacking swap-mem-limit feature #234

Closed. rodnymolina closed this issue 3 years ago.

rodnymolina commented 3 years ago

sysbox-runc consistently hangs while running the tests/cgroup/cgroup.bats:test_cgroup_memory() test case. The problem reproduces on this setup:

$ uname -r
5.4.0-1029-gcp

$ lsb_release -d
Description:    Ubuntu 18.04.5 LTS
$

The sys container is able to register with sysbox-fs, so the problem starts at a very late stage of the initialization cycle. I used the debugger to step through sysbox-runc's initialization logic for both the parent and its child processes and didn't notice anything abnormal. In fact, the hang is not observed until sysbox-runc's parent process is almost done with container initialization, and by then its grandchild process has already reached the exec() call that completes its initialization.

(dlv) frame 10
> syscall.Syscall() /usr/local/go/src/syscall/asm_linux_amd64.s:27 (PC: 0x4eaf70)
Frame 10: ./sysbox-runc/libcontainer/process_linux.go:161 (PC: d705ae)
   156:     }
   157:     if err := utils.WriteJSON(p.messageSockPair.parent, p.config); err != nil {
   158:         return newSystemErrorWithCause(err, "writing config to pipe")
   159:     }
   160:
=> 161:     ierr := parseSync(p.messageSockPair.parent, func(sync *syncT) error {         <<<===== stuck here!!!!
   162:         switch sync.Type {
   163:         case procReady:
   164:             // This shouldn't happen.
   165:             panic("unexpected procReady in setns")
   166:
(dlv)
> github.com/opencontainers/runc/libcontainer.(*linuxStandardInit).Init() ./sysbox-runc/libcontainer/standard_init_linux.go:314 (PC: 0xd8bfe0)
   309:     s.Status = specs.StateCreated
   310:     if err := l.config.Config.Hooks[configs.StartContainer].RunHooks(s); err != nil {
   311:         return err
   312:     }
   313:
=> 314:     if err := unix.Exec(name, l.config.Args[0:], os.Environ()); err != nil {
   315:         return newSystemErrorWithCause(err, "exec user process")
   316:     }
   317:     return nil
   318: }
(dlv) p l.config.Args[0:]
[]string len: 3, cap: 3, [
    "tail",
    "-f",
    "/dev/null",
]
(dlv) n                                                <<<<=== stuck here!!!!

The problem appears to be related to this kernel's lack of the swap memory limit feature (see the warning in the reproduction output further below):

$ ls -lrt /sys/fs/cgroup/memory/memory.limit_in_bytes
-rw-r--r-- 1 root root 0 Mar  5 22:38 /sys/fs/cgroup/memory/memory.limit_in_bytes
$

$ sudo ls -lrt /sys/fs/cgroup/memory/memory.memsw.limit_in_bytes
ls: cannot access '/sys/fs/cgroup/memory/memory.memsw.limit_in_bytes': No such file or directory
$

The problem can be easily reproduced by spawning a sys container with this command:

# docker run --rm --memory="16M" --oom-kill-disable ghcr.io/nestybox/alpine-docker-dbg:latest sh
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
#

If the problem turns out to be related to the absence of this kernel feature, sysbox-runc should detect this scenario and return a friendly message to the user. In either case, we should always return a prompt to the user and avoid getting stuck.
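
For illustration only, here is a minimal Go sketch of the kind of detection suggested above. It probes for the cgroup v1 file memory.memsw.limit_in_bytes, the same file the ls command above could not find. The function name swapLimitWarning and the message text are hypothetical and are not taken from sysbox-runc.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// memCgroupRoot is the cgroup v1 memory controller path used in this thread.
const memCgroupRoot = "/sys/fs/cgroup/memory"

// swapLimitWarning (hypothetical helper) returns a non-empty warning when the
// swap-limit control file is absent under cgroupDir, and an empty string when
// it is present. Any stat error other than "not exist" is returned as-is.
func swapLimitWarning(cgroupDir string) (string, error) {
	_, err := os.Stat(filepath.Join(cgroupDir, "memory.memsw.limit_in_bytes"))
	if err == nil {
		return "", nil
	}
	if errors.Is(err, os.ErrNotExist) {
		return "kernel does not support swap limits or the memory cgroup is not mounted; memory is limited without swap", nil
	}
	return "", err
}

func main() {
	warning, err := swapLimitWarning(memCgroupRoot)
	if err != nil {
		fmt.Fprintln(os.Stderr, "could not probe swap limit support:", err)
		return
	}
	if warning != "" {
		fmt.Println("WARNING:", warning)
		return
	}
	fmt.Println("swap limit support detected")
}
```

Taking the cgroup directory as a parameter keeps the probe easy to exercise against a scratch directory instead of the real cgroup mount.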

ctalledo commented 3 years ago

After investigating further, we found that the problem was caused by a test error. The cgroup memory limit test causes the container to exceed its allocated cgroup memory limit, and on some machines (e.g., GCP Ubuntu) this causes any further "docker exec" into the container to hang: docker exec places the exec'd process in the container's cgroup, and since the cgroup's memory limit has already been exceeded, the kernel pauses the process. The container is not killed because the test purposefully launches it with the OOM killer disabled.
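
One way to confirm this state from the host is to read the container cgroup's memory.oom_control file, which on cgroup v1 reports oom_kill_disable and under_oom as "name value" lines. The sketch below is a hedged illustration rather than code from sysbox: the type oomControl and the function parseOOMControl are hypothetical names, and where a given container's cgroup directory lives is left to the caller.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// oomControl holds the two cgroup v1 memory.oom_control fields of interest.
type oomControl struct {
	KillDisabled bool // oom_kill_disable != 0
	UnderOOM     bool // under_oom != 0
}

// parseOOMControl (hypothetical helper) parses the "name value" lines of a
// memory.oom_control file. Unknown fields such as oom_kill are ignored.
func parseOOMControl(content string) (oomControl, error) {
	var oc oomControl
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) != 2 {
			continue
		}
		v, err := strconv.ParseUint(fields[1], 10, 64)
		if err != nil {
			return oc, fmt.Errorf("parsing %q: %w", sc.Text(), err)
		}
		switch fields[0] {
		case "oom_kill_disable":
			oc.KillDisabled = v != 0
		case "under_oom":
			oc.UnderOOM = v != 0
		}
	}
	return oc, sc.Err()
}

func main() {
	// With no argument, parse a sample; otherwise read the given file path.
	content := "oom_kill_disable 1\nunder_oom 1\n"
	if len(os.Args) > 1 {
		data, err := os.ReadFile(os.Args[1])
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			return
		}
		content = string(data)
	}
	oc, err := parseOOMControl(content)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	if oc.KillDisabled && oc.UnderOOM {
		fmt.Println("cgroup is under OOM with the OOM killer disabled; tasks added to it may block")
	} else {
		fmt.Printf("oom_kill_disable=%t under_oom=%t\n", oc.KillDisabled, oc.UnderOOM)
	}
}
```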

The fix is to modify the test so that, after the container exceeds its memory limit, the test no longer uses docker exec to enter it; instead, it uses nsenter to collect whatever data it needed from inside the container.

Fix is here: https://github.com/nestybox/sysbox/pull/235
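
As a rough sketch of the nsenter approach described above (not the code from the linked pull request, which lives in the bats tests): nsenter joins the namespaces of a target process without moving the caller into the container's cgroup, so the command is not subject to the exceeded memory limit. The helper names nsenterArgs and runInNamespaces, and the choice of the mount and PID namespaces, are assumptions for illustration. The target PID would typically come from something like `docker inspect -f '{{.State.Pid}}' <container>`.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strconv"
)

// nsenterArgs (hypothetical helper) builds nsenter(1) arguments that run
// command inside the mount and PID namespaces of the process with the given
// PID. Which namespaces to join is an assumption made for this sketch.
func nsenterArgs(pid int, command ...string) []string {
	args := []string{"--target", strconv.Itoa(pid), "--mount", "--pid", "--"}
	return append(args, command...)
}

// runInNamespaces (hypothetical helper) executes the command via nsenter and
// returns its standard output. It normally requires root privileges.
func runInNamespaces(pid int, command ...string) ([]byte, error) {
	return exec.Command("nsenter", nsenterArgs(pid, command...)...).Output()
}

func main() {
	// Usage: prog <pid> <command> [args...]; with too few arguments, just
	// print an example invocation instead of running anything.
	if len(os.Args) < 3 {
		fmt.Println("nsenter", nsenterArgs(12345, "cat", "/proc/meminfo"))
		return
	}
	pid, err := strconv.Atoi(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "invalid pid:", err)
		return
	}
	out, err := runInNamespaces(pid, os.Args[2:]...)
	if err != nil {
		fmt.Fprintln(os.Stderr, "nsenter failed:", err)
		return
	}
	os.Stdout.Write(out)
}
```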