jzwlqx opened this issue 5 years ago
Ping @hqhq @mrunalp
I don't see any useful information on runc side, @mrunalp have you seen this kind of issue on RHEL?
When a large number of units are created and removed over a long period (on the order of five months), sd_bus->cookie overflows the 32-bit range, and the org.freedesktop.systemd1 dbus service stops responding entirely because systemd can no longer seal dbus1-type messages.

This issue has been hitting Kubernetes cluster nodes for a long while and is not easy to reproduce. When it occurs, the node cannot create new containers; the only fixes we have found are rebooting the system or re-executing systemd.

We should figure out some way to re-code the UseSystemd function or its callers: systemd currently receives hundreds of messages from runc even for a single docker exec command, so the test-unit traffic will overflow sd_bus->cookie sooner or later.
The failing check is in systemd's bus_message_seal() (src/libsystemd/sd-bus/bus-message.c):

```c
int bus_message_seal(sd_bus_message *m, uint64_t cookie, usec_t timeout) {
        struct bus_body_part *part;
        ...
        if (cookie > 0xffffffffULL && !BUS_MESSAGE_IS_GVARIANT(m))
                return -ENOTSUP;
```
```
bus_message_seal (m=0x55cbc5a75790, cookie=4294967731, timeout=25000000)
    at src/libsystemd/sd-bus/bus-message.c:2924
2924    int bus_message_seal(sd_bus_message *m, uint64_t cookie, usec_t timeout) {

(gdb) info registers
rax            0x55cbc5a75790      94333682800528
rbx            0x55cbc5b5eba0      94333683755936
rcx            0x1000001b2         4294967730
rdx            0x17d7840           25000000
rsi            0x1000001b3         4294967731      <- cookie
rdi            0x55cbc5a75790      94333682800528  <- m (sd_bus_message *)
rbp            0x0                 0x0
rsp            0x7ffd80ae0268      0x7ffd80ae0268
r8             0x55cbc5a6ef50      94333682773840
r9             0x55cbc5abed00      94333683100928
r10            0x4                 4
r11            0x55cbc5a75938      94333682800952
r12            0x0                 0
r13            0x7ffd80ae0320      140726762341152
r14            0x7ffd80ae0300      140726762341120
r15            0x55cbc3ffcfc8      94333655044040
rip            0x55cbc3f54d40      0x55cbc3f54d40
```
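The rsi value confirms the failure mode: the cookie has just crossed the 32-bit boundary that bus_message_seal() rejects for dbus1-marshaled messages. A minimal Go check of the arithmetic, using the values from the gdb output above (just an illustration, not runc code):

```go
package main

import "fmt"

func main() {
	// Values from the gdb frame above: rsi held the cookie argument.
	const cookie uint64 = 4294967731    // 0x1000001b3
	const dbus1Max uint64 = 0xffffffff  // largest serial a dbus1 message can carry

	// This is the condition bus_message_seal checks before returning
	// -ENOTSUP for non-GVariant (dbus1) messages.
	fmt.Println(cookie > dbus1Max) // true: sealing fails, systemd stops replying
}
```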
So far, when I create or stop a test container (e.g. busybox) in a docker environment, about 80 test-unit creation/removal messages are sent to the org.freedesktop.systemd1 bus by the systemd.UseSystemd function, so that bus is permanently under a unit new/remove message storm.
Thanks @sixijun.
@mrunalp @hqhq Should runc cache the UseSystemd result in the /runc/ directory to avoid such frequent systemd dbus messages?
@mrunalp @hqhq any comments on caching the UseSystemd result in /runc/ to reduce the number of systemd dbus messages? A sketch of the in-process half of that idea follows.
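For illustration only, here is a minimal sketch of what in-process caching could look like, assuming a hypothetical detectUseSystemd() standing in for the real dbus probing; sync.Once limits the probe to one run per process, although it would not persist across processes the way a file under /runc/ would:

```go
package systemd

import "sync"

var (
	useSystemdOnce sync.Once
	useSystemdOK   bool
)

// detectUseSystemd is a hypothetical placeholder for the real probe,
// which exercises systemd by creating/removing test units over dbus.
func detectUseSystemd() bool {
	return true
}

// UseSystemd reports whether the systemd cgroup driver is usable.
// The dbus probe runs at most once per process, so repeated callers
// (e.g. one per docker exec) no longer generate a fresh burst of
// unit new/remove messages each time.
func UseSystemd() bool {
	useSystemdOnce.Do(func() {
		useSystemdOK = detectUseSystemd()
	})
	return useSystemdOK
}
```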
Yesterday one node of my Kubernetes cluster became NotReady. `ps -ef` showed some docker-runc processes had been running for many days. After some investigation, I found docker-runc hanging in a call to systemd.UseSystemd. Below is the stack.
In fact, any dbus method call sent to org.freedesktop.systemd1 got no response; for example, the command below would wait forever:

```
dbus-send --system --dest=org.freedesktop.systemd1 --type=method_call --print-reply /org/freedesktop/systemd1 org.freedesktop.DBus.Introspectable.Introspect
```
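For health checking, the same introspection call can be made with a deadline instead of blocking forever; here is a sketch using the github.com/godbus/dbus/v5 client (the five-second timeout is an arbitrary choice of mine):

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/godbus/dbus/v5"
)

func main() {
	conn, err := dbus.SystemBus()
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// Equivalent of the dbus-send command above, but bounded by the
	// context: a timeout here is a strong hint that systemd's bus is wedged.
	obj := conn.Object("org.freedesktop.systemd1", "/org/freedesktop/systemd1")
	var xml string
	err = obj.CallWithContext(ctx, "org.freedesktop.DBus.Introspectable.Introspect", 0).Store(&xml)
	if err != nil {
		fmt.Println("systemd dbus not responding:", err)
		return
	}
	fmt.Println("systemd dbus OK")
}
```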
Also there were many systemd errors in /var/log/messages:

```
Jan 4 11:56:31 host-k8s-node001 systemd: Failed to propagate agent release message: Operation not supported
```
`busctl tree` reported:

```
Failed to introspect object / of service org.freedesktop.systemd1: Connection timed out
```
Resolved by restarting systemd (re-executing systemd re-creates its bus connections, which presumably resets the message cookie):

```
systemctl daemon-reexec
```
docker-runc stack:
Below are more details.

OS:

```
Linux host-k8s-node001.ymt.io 3.10.0-693.2.2.el7.x86_64 #1 SMP Tue Sep 12 22:26:13 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
```

DBUS Daemon: 1.10.24

Systemd:

Kubelet: Kubernetes v1.11.2

Docker Info: