opencontainers / runc

CLI tool for spawning and running containers according to the OCI specification
https://www.opencontainers.org/
Apache License 2.0
11.74k stars 2.09k forks source link

docker-runc hang on checking systemd.UseSystemd #1959

Open jzwlqx opened 5 years ago

jzwlqx commented 5 years ago

Yesterday one node of my kubernetes cluster became notready. ps -ef showed some docker-runc processes had been running many days

root     26579  1303  0  2018 ?        00:00:00 docker-runc --systemd-cgroup=true events --stats c29996ea9566f16616505e7118315635582714308564ba0d9a70f8fb8cf73f0a
root     27841  2913  0  2018 ?        00:00:00 docker-runc --systemd-cgroup=true kill --all 8561b78c9cb19c0d883e30eafc8ff41ddf3007043985271386ffdbafc24d4376 SIGKILL
root     28293  1303  0  2018 ?        00:00:00 docker-runc --systemd-cgroup=true delete 25660e4c1f66593ec33ae57823def641a4c4a9ae1a7c6840afd081961b66e66e

After some investigation, I found docker-runc hang when calling systemd.UseSystemd. Below is the stack.

In fact, any dbus method call send to org.freedesktop.systemd1 was not responsed, for example, the below command would wait forever:

dbus-send --system --dest=org.freedesktop.systemd1 --type=method_call --print-reply /org/freedesktop/systemd1 org.freedesktop.DBus.Introspectable.Introspect

Also there were many systemd errors in /var/log/messages: Jan 4 11:56:31 host-k8s-node001 systemd: Failed to propagate agent release message: Operation not supported

busctl tree reported Failed to introspect object / of service org.freedesktop.systemd1: Connection timed out

Resolved by restarting systemd: systemctl daemon-reexec

docker-runc stack:

PC=0x45a471 m=0 sigcode=128

goroutine 0 [idle]:
runtime.futex(0xb37950, 0x0, 0x0, 0x0, 0x0, 0xbdc4df90, 0x0, 0x0, 0x7ffdbdc4dfa0, 0x41178b, ...)
    /usr/local/go/src/runtime/sys_linux_amd64.s:425 +0x21
runtime.futexsleep(0xb37950, 0x0, 0xffffffffffffffff)
    /usr/local/go/src/runtime/os_linux.go:45 +0x62
runtime.notesleep(0xb37950)
    /usr/local/go/src/runtime/lock_futex.go:145 +0x6b
runtime.stopm()
    /usr/local/go/src/runtime/proc.go:1650 +0xad
runtime.findrunnable(0xc42001c000, 0x0)
    /usr/local/go/src/runtime/proc.go:2102 +0x2e4
runtime.schedule()
    /usr/local/go/src/runtime/proc.go:2222 +0x14c
runtime.park_m(0xc42006e340)
    /usr/local/go/src/runtime/proc.go:2285 +0xab
runtime.mcall(0x7ffdbdc4e130)
    /usr/local/go/src/runtime/asm_amd64.s:269 +0x5b

goroutine 1 [chan receive]:
github.com/opencontainers/runc/vendor/github.com/godbus/dbus.(*Object).Call(0xc42001bfb0, 0x7a026a, 0x33, 0xc4200fd000, 0xc420011340, 0x4, 0x4, 0xc4200fd0c0)
    /go/src/github.com/opencontainers/runc/vendor/github.com/godbus/dbus/object.go:27 +0xc7
github.com/opencontainers/runc/vendor/github.com/coreos/go-systemd/dbus.(*Conn).startJob(0xc420061570, 0x0, 0x7a026a, 0x33, 0xc420011340, 0x4, 0x4, 0x0, 0x0, 0x0)
    /go/src/github.com/opencontainers/runc/vendor/github.com/coreos/go-systemd/dbus/methods.go:47 +0xdf
github.com/opencontainers/runc/vendor/github.com/coreos/go-systemd/dbus.(*Conn).StartTransientUnit(0xc420061570, 0x7919cc, 0xa, 0x7904e9, 0x7, 0x0, 0x0, 0x0, 0x0, 0xc420062000, ...)
    /go/src/github.com/opencontainers/runc/vendor/github.com/coreos/go-systemd/dbus/methods.go:137 +0x24b
github.com/opencontainers/runc/libcontainer/cgroups/systemd.UseSystemd(0xc420012a00)
    /go/src/github.com/opencontainers/runc/libcontainer/cgroups/systemd/apply_systemd.go:106 +0x130
main.loadFactory(0xc420012a00, 0x0, 0x792d83, 0xe, 0x0)
    /go/src/github.com/opencontainers/runc/utils_linux.go:38 +0x1a0
main.getContainer(0xc420012a00, 0x1, 0x0, 0x0, 0x0)
    /go/src/github.com/opencontainers/runc/utils_linux.go:54 +0x61
main.glob..func3(0xc420012a00, 0x0, 0x0)
    /go/src/github.com/opencontainers/runc/delete.go:53 +0xa2
reflect.Value.call(0x724960, 0x7a5568, 0x13, 0x78f57e, 0x4, 0xc4200cb760, 0x1, 0x1, 0xc4200cb6f0, 0x784ca0, ...)
    /usr/local/go/src/reflect/value.go:434 +0x91f
reflect.Value.Call(0x724960, 0x7a5568, 0x13, 0xc4200cb760, 0x1, 0x1, 0x6a4bd6, 0x78f712, 0x4)
    /usr/local/go/src/reflect/value.go:302 +0xa4
github.com/opencontainers/runc/vendor/github.com/urfave/cli.HandleAction(0x724960, 0x7a5568, 0xc420012a00, 0x0, 0x0)
    /go/src/github.com/opencontainers/runc/vendor/github.com/urfave/cli/app.go:487 +0x18f
github.com/opencontainers/runc/vendor/github.com/urfave/cli.Command.Run(0x78feff, 0x6, 0x0, 0x0, 0x0, 0x0, 0x0, 0x7a2070, 0x4d, 0x0, ...)
    /go/src/github.com/opencontainers/runc/vendor/github.com/urfave/cli/command.go:191 +0xac8
github.com/opencontainers/runc/vendor/github.com/urfave/cli.(*App).Run(0xc4200da000, 0xc420010200, 0x4, 0x4, 0x0, 0x0)
    /go/src/github.com/opencontainers/runc/vendor/github.com/urfave/cli/app.go:240 +0x5d6
main.main()
    /go/src/github.com/opencontainers/runc/main.go:137 +0xbd2

goroutine 17 [syscall, locked to thread]:
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:2197 +0x1

goroutine 6 [syscall]:
os/signal.signal_recv(0x621f1b)
    /usr/local/go/src/runtime/sigqueue.go:116 +0x104
os/signal.loop()
    /usr/local/go/src/os/signal/signal_unix.go:22 +0x22
created by os/signal.init.1
    /usr/local/go/src/os/signal/signal_unix.go:28 +0x41

goroutine 7 [IO wait]:
net.runtime_pollWait(0x7f211b3d0000, 0x72, 0xc4200c7778)
    /usr/local/go/src/runtime/netpoll.go:164 +0x59
net.(*pollDesc).wait(0xc420060d18, 0x72, 0x1000, 0x0)
    /usr/local/go/src/net/fd_poll_runtime.go:75 +0x38
net.(*pollDesc).waitRead(0xc420060d18, 0xc4200f2a60, 0x10)
    /usr/local/go/src/net/fd_poll_runtime.go:80 +0x34
net.(*netFD).readMsg(0xc420060cb0, 0xc4200f2a60, 0x10, 0x10, 0xc4200f7320, 0x1000, 0x1000, 0xffffffffffffffff, 0x0, 0x0, ...)
    /usr/local/go/src/net/fd_unix.go:304 +0x21a
net.(*UnixConn).readMsg(0xc42000e0a0, 0xc4200f2a60, 0x10, 0x10, 0xc4200f7320, 0x1000, 0x1000, 0xc4200f2754, 0xc4200fc8a0, 0xc4200f271b, ...)
    /usr/local/go/src/net/unixsock_posix.go:115 +0x91
net.(*UnixConn).ReadMsgUnix(0xc42000e0a0, 0xc4200f2a60, 0x10, 0x10, 0xc4200f7320, 0x1000, 0x1000, 0xc4200613b0, 0x7, 0xc4200f2954, ...)
    /usr/local/go/src/net/unixsock.go:131 +0xaa
github.com/opencontainers/runc/vendor/github.com/godbus/dbus.(*oobReader).Read(0xc4200f7300, 0xc4200f2a60, 0x10, 0x10, 0x1020, 0xc4200f7300, 0x0)
    /go/src/github.com/opencontainers/runc/vendor/github.com/godbus/dbus/transport_unix.go:21 +0x8f
io.ReadAtLeast(0xb20f00, 0xc4200f7300, 0xc4200f2a60, 0x10, 0x10, 0x10, 0x7563a0, 0xc420062801, 0xc4200f7300)
    /usr/local/go/src/io/io.go:307 +0xa9
io.ReadFull(0xb20f00, 0xc4200f7300, 0xc4200f2a60, 0x10, 0x10, 0x21, 0xc42001bf80, 0x0)
    /usr/local/go/src/io/io.go:325 +0x58
github.com/opencontainers/runc/vendor/github.com/godbus/dbus.(*unixTransport).ReadMessage(0xc420015e40, 0xc4200fc420, 0x14, 0x78f13a)
    /go/src/github.com/opencontainers/runc/vendor/github.com/godbus/dbus/transport_unix.go:85 +0x138
github.com/opencontainers/runc/vendor/github.com/godbus/dbus.(*Conn).inWorker(0xc4200a6b40)
    /go/src/github.com/opencontainers/runc/vendor/github.com/godbus/dbus/conn.go:248 +0x4b
created by github.com/opencontainers/runc/vendor/github.com/godbus/dbus.(*Conn).Auth
    /go/src/github.com/opencontainers/runc/vendor/github.com/godbus/dbus/auth.go:118 +0x749

goroutine 8 [chan receive]:
github.com/opencontainers/runc/vendor/github.com/godbus/dbus.(*Conn).outWorker(0xc4200a6b40)
    /go/src/github.com/opencontainers/runc/vendor/github.com/godbus/dbus/conn.go:385 +0x5a
created by github.com/opencontainers/runc/vendor/github.com/godbus/dbus.(*Conn).Auth
    /go/src/github.com/opencontainers/runc/vendor/github.com/godbus/dbus/auth.go:119 +0x76e

goroutine 18 [IO wait]:
net.runtime_pollWait(0x7f211b3cff40, 0x72, 0xc420123778)
    /usr/local/go/src/runtime/netpoll.go:164 +0x59
net.(*pollDesc).wait(0xc420100068, 0x72, 0x1000, 0x0)
    /usr/local/go/src/net/fd_poll_runtime.go:75 +0x38
net.(*pollDesc).waitRead(0xc420100068, 0xc4200f2fc0, 0x10)
    /usr/local/go/src/net/fd_poll_runtime.go:80 +0x34
net.(*netFD).readMsg(0xc420100000, 0xc4200f2fc0, 0x10, 0x10, 0xc4200f9920, 0x1000, 0x1000, 0xffffffffffffffff, 0x0, 0x0, ...)
    /usr/local/go/src/net/fd_unix.go:304 +0x21a
net.(*UnixConn).readMsg(0xc4200be030, 0xc4200f2fc0, 0x10, 0x10, 0xc4200f9920, 0x1000, 0x1000, 0xc4200f2e5c, 0xb522e0, 0xc4200f2e23, ...)
    /usr/local/go/src/net/unixsock_posix.go:115 +0x91
net.(*UnixConn).ReadMsgUnix(0xc4200be030, 0xc4200f2fc0, 0x10, 0x10, 0xc4200f9920, 0x1000, 0x1000, 0xc420061880, 0x7, 0xc4200f2f32, ...)
    /usr/local/go/src/net/unixsock.go:131 +0xaa
github.com/opencontainers/runc/vendor/github.com/godbus/dbus.(*oobReader).Read(0xc4200f9900, 0xc4200f2fc0, 0x10, 0x10, 0x1020, 0xc4200f9900, 0x0)
    /go/src/github.com/opencontainers/runc/vendor/github.com/godbus/dbus/transport_unix.go:21 +0x8f
io.ReadAtLeast(0xb20f00, 0xc4200f9900, 0xc4200f2fc0, 0x10, 0x10, 0x10, 0x7563a0, 0x70ea01, 0xc4200f9900)
    /usr/local/go/src/io/io.go:307 +0xa9
io.ReadFull(0xb20f00, 0xc4200f9900, 0xc4200f2fc0, 0x10, 0x10, 0x0, 0x0, 0x0)
    /usr/local/go/src/io/io.go:325 +0x58
github.com/opencontainers/runc/vendor/github.com/godbus/dbus.(*unixTransport).ReadMessage(0xc4200bc160, 0xc4200ba420, 0xc420123e00, 0xc420100098)
    /go/src/github.com/opencontainers/runc/vendor/github.com/godbus/dbus/transport_unix.go:85 +0x138
github.com/opencontainers/runc/vendor/github.com/godbus/dbus.(*Conn).inWorker(0xc420102000)
    /go/src/github.com/opencontainers/runc/vendor/github.com/godbus/dbus/conn.go:248 +0x4b
created by github.com/opencontainers/runc/vendor/github.com/godbus/dbus.(*Conn).Auth
    /go/src/github.com/opencontainers/runc/vendor/github.com/godbus/dbus/auth.go:118 +0x749

goroutine 19 [chan receive]:
github.com/opencontainers/runc/vendor/github.com/godbus/dbus.(*Conn).outWorker(0xc420102000)
    /go/src/github.com/opencontainers/runc/vendor/github.com/godbus/dbus/conn.go:385 +0x5a
created by github.com/opencontainers/runc/vendor/github.com/godbus/dbus.(*Conn).Auth
    /go/src/github.com/opencontainers/runc/vendor/github.com/godbus/dbus/auth.go:119 +0x76e

goroutine 9 [chan receive]:
github.com/opencontainers/runc/vendor/github.com/coreos/go-systemd/dbus.(*Conn).dispatch.func1(0xc4200708a0, 0xc420061570)
    /go/src/github.com/opencontainers/runc/vendor/github.com/coreos/go-systemd/dbus/subscription.go:64 +0x5f
created by github.com/opencontainers/runc/vendor/github.com/coreos/go-systemd/dbus.(*Conn).dispatch
    /go/src/github.com/opencontainers/runc/vendor/github.com/coreos/go-systemd/dbus/subscription.go:96 +0x83

rax    0xca
rbx    0x0
rcx    0xffffffffffffffff
rdx    0x0
rdi    0xb37950
rsi    0x0
rbp    0x7ffdbdc4df70
rsp    0x7ffdbdc4df28
r8     0x0
r9     0x0
r10    0x0
r11    0x286
r12    0x0
r13    0x260
r14    0x0
r15    0xd3
rip    0x45a471
rflags 0x286
cs     0x33
fs     0x0
gs     0x0

Bellows are more details

OS Linux host-k8s-node001.ymt.io 3.10.0-693.2.2.el7.x86_64 #1 SMP Tue Sep 12 22:26:13 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

DBUS Daemon: 1.10.24

Systemd:

systemd 219
+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ -LZ4 -SECCOMP +BLKID +ELFUTILS +KMOD +IDN

Kubelet:

Kubernetes v1.11.2

Docker Info:


 Running: 35
 Paused: 0
 Stopped: 12
Images: 17
Server Version: 17.06.2-ce-3
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: systemd
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 6e23458c129b551d5c9871e5174f6b1b7f6d1170
runc version: 462c82662200a17ee39e74692f536067a3576a50
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-693.2.2.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 12
Total Memory: 23.39GiB
Name: host-k8s-node001.ymt.io
ID: V2XS:XAL5:3WSG:KYTJ:QAT7:45I7:GMO5:HZAV:UNYU:UDPQ:EE4A:CSQU
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 harbor.ymt.io
 127.0.0.0/8
Registry Mirrors:
 https://pqbap4ya.mirror.aliyuncs.com/
Live Restore Enabled: false```
BSWANG commented 5 years ago

Ping @hqhq @mrunalp

hqhq commented 5 years ago

I don't see any useful information on runc side, @mrunalp have you seen this kind of issue on RHEL?

sixijun commented 5 years ago

while large number of units are created/removed for a period of time like 5 months, sd_bus->cookie will be overflowed and dbus org.freedesktop.systemd1 will be no response at all because systemd cannot seal dbus1 type of messages.

this issue impact Kubernetes cluster nodes for a long while and it's not easy to reproduce. when issue occurs, the node cannot create new container. we can only reboot the system, or reexec systemd to solve the issue.

we should figure out some way to re-code the UseSystemd function or it's callers, so far the systemd will get like hundreds of message from runc even if we run a docker exec command. the test will overflow sd_bus->cookie sooner or later.

int bus_message_seal(sd_bus_message m, uint64_t cookie, usec_t timeout) { struct bus_body_part part; ... if (cookie > 0xffffffffULL && !BUS_MESSAGE_IS_GVARIANT(m)) return -ENOTSUP;

bus_message_seal (m=0x55cbc5a75790, cookie=4294967731, timeout=25000000) at src/libsystemd/sd-bus/bus-message.c:2924 2924 int bus_message_seal(sd_bus_message *m, uint64_t cookie, usec_t timeout) {

(gdb) info registers rax 0x55cbc5a75790 94333682800528 rbx 0x55cbc5b5eba0 94333683755936 rcx 0x1000001b2 4294967730 rdx 0x17d7840 25000000 _rsi 0x1000001b3 4294967731 cookie rdi 0x55cbc5a75790 94333682800528 sdbus rbp 0x0 0x0 rsp 0x7ffd80ae0268 0x7ffd80ae0268 r8 0x55cbc5a6ef50 94333682773840 r9 0x55cbc5abed00 94333683100928 r10 0x4 4 r11 0x55cbc5a75938 94333682800952 r12 0x0 0 r13 0x7ffd80ae0320 140726762341152 r14 0x7ffd80ae0300 140726762341120 r15 0x55cbc3ffcfc8 94333655044040 rip 0x55cbc3f54d40 0x55cbc3f54d40 eflags 0x202 [ IF ] cs 0x33 51 ss 0x2b 43 ds 0x0 0 es 0x0 0 fs 0x0 0 gs 0x0 0

sixijun commented 5 years ago

So far when I create or stop a test container in docker environemnt like busybox, there will be about 80 test unit creation/removal messages that are send to bus org.freedesktop.systemd1 in function systemd.UseSystemd, and the bus org.freedesktop.systemd1 is absolutely under unit new/remove message storm.

sixijun commented 5 years ago

https://github.com/systemd/systemd/pull/11818/files

BSWANG commented 5 years ago

Thanks @sixijun. @mrunalp @hqhq Does the runc should cache the UseSystemd result into /runc/ directory to avoid frequently systemd dbus message?

cuericlee commented 5 years ago

@mrunalp @hqhq any comments to cache UseSystemd result into /runc/ to reduce the number of systemd dbus message?