Closed jcdevzero closed 5 years ago
I can reproduce it on Ubuntu 18.04 as well.
On a VM, I don't see such interactive messages ([ *** ] (1 of 6) A stop job is running for Matrix Coturn server (30s / 1:30)
) as my console gets killed as soon as I do systemctl reboot
.
To be able to get some logs, I had to:
add systemd.log_level=debug systemd.log_target=kmsg log_buf_len=1M printk.devkmsg=on enforcing=0
to the kernel line, as seen here: https://freedesktop.org/wiki/Software/systemd/Debugging/#shutdowncompleteseventually
make sure journald persistence is enabled (Storage=auto
in /etc/systemd/journald.conf
)
Logs are like this for me:
Jan 14 10:02:36 matrix systemd-logind[984]: System is rebooting.
Jan 14 10:02:36 matrix systemd[1]: Stopped Unattended Upgrades Shutdown.
Jan 14 10:02:36 matrix systemd[1]: Stopped target Cloud-init target.
Jan 14 10:02:36 matrix systemd[1]: Closed Load/Save RF Kill Switch Status /dev/rfkill Watch.
Jan 14 10:02:36 matrix systemd[1]: Stopping Authorization Manager...
Jan 14 10:02:36 matrix sshd[1804]: Exiting on signal 15
Jan 14 10:02:36 matrix systemd[1]: Stopping Session 1 of user root.
Jan 14 10:02:36 matrix sshd[1804]: pam_unix(sshd:session): session closed for user root
Jan 14 10:02:36 matrix systemd[1]: Stopped target Timers.
Jan 14 10:02:36 matrix sshd[18409]: Exiting on signal 15
Jan 14 10:02:36 matrix systemd[1]: Stopped Message of the Day.
Jan 14 10:02:36 matrix sshd[18409]: pam_unix(sshd:session): session closed for user root
Jan 14 10:02:36 matrix systemd[1]: Stopping Session 5 of user root.
Jan 14 10:02:36 matrix systemd[1]: Stopped Discard unused blocks once a week.
Jan 14 10:02:36 matrix systemd[1]: Stopped Daily Cleanup of Temporary Directories.
Jan 14 10:02:36 matrix systemd[1]: Stopped target Host and Network Name Lookups.
Jan 14 10:02:36 matrix systemd[1]: Stopped Daily apt upgrade and clean activities.
Jan 14 10:02:36 matrix systemd[1]: Stopped Daily apt download activities.
Jan 14 10:02:36 matrix systemd[1]: Stopped target System Time Synchronized.
Jan 14 10:02:36 matrix systemd[1]: Stopped Execute cloud user/final scripts.
Jan 14 10:02:36 matrix systemd[1]: Stopped Apply the settings specified in cloud-config.
Jan 14 10:02:36 matrix systemd[1]: Stopped target Cloud-config availability.
Jan 14 10:02:36 matrix systemd[1]: Stopped target Graphical Interface.
Jan 14 10:02:36 matrix systemd[1]: Stopped target Multi-User System.
Jan 14 10:02:36 matrix systemd[1]: Stopping LXD - container startup/shutdown...
Jan 14 10:02:36 matrix systemd[1]: Stopping Matrix mxisd identity server...
Jan 14 10:02:36 matrix systemd[1]: Stopping Matrix nginx proxy server...
Jan 14 10:02:36 matrix systemd[1]: Stopping Network Time Service...
Jan 14 10:02:36 matrix ntpd[18308]: ntpd exiting on signal 15 (Terminated)
Jan 14 10:02:36 matrix systemd[1]: Stopping Matrix mailer...
Jan 14 10:02:36 matrix ntpd[18308]: some irrelevant log entry here...
Jan 14 10:02:36 matrix sshd[1013]: Received signal 15; terminating.
Jan 14 10:02:36 matrix systemd[1]: Stopping OpenBSD Secure Shell server...
Jan 14 10:02:36 matrix systemd[1816]: pam_unix(systemd-user:session): session closed for user root
Jan 14 10:02:36 matrix snapd[1011]: 2019/01/14 10:02:36.596643 main.go:81: Exiting on terminated signal.
Jan 14 10:02:36 matrix systemd[1]: Stopping Matrix Coturn server...
Jan 14 10:02:36 matrix dockerd[17922]: time="2019-01-14T10:02:36.589810314+01:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = transport is closing" module=libcontainerd namespace=plugins.moby
Jan 14 10:02:36 matrix dockerd[17922]: time="2019-01-14T10:02:36.589917199+01:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = transport is closing" module=libcontainerd namespace=plugins.moby
Jan 14 10:02:36 matrix dockerd[17922]: time="2019-01-14T10:02:36.589950919+01:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = transport is closing" module=libcontainerd namespace=plugins.moby
Jan 14 10:02:36 matrix dockerd[17922]: time="2019-01-14T10:02:36.589983239+01:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = transport is closing" module=libcontainerd namespace=plugins.moby
Jan 14 10:02:36 matrix dockerd[17922]: time="2019-01-14T10:02:36.590010174+01:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = transport is closing" module=libcontainerd namespace=plugins.moby
Jan 14 10:02:36 matrix dockerd[17922]: time="2019-01-14T10:02:36.590035301+01:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = transport is closing" module=libcontainerd namespace=plugins.moby
Jan 14 10:02:36 matrix dockerd[17922]: time="2019-01-14T10:02:36.590059273+01:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = transport is closing" module=libcontainerd namespace=plugins.moby
Jan 14 10:02:36 matrix dockerd[17922]: time="2019-01-14T10:02:36.590082754+01:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = transport is closing" module=libcontainerd namespace=plugins.moby
Jan 14 10:02:36 matrix dockerd[17922]: time="2019-01-14T10:02:36.590105528+01:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = transport is closing" module=libcontainerd namespace=plugins.moby
Jan 14 10:02:36 matrix dockerd[17922]: time="2019-01-14T10:02:36.590130791+01:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = transport is closing" module=libcontainerd namespace=plugins.moby
Jan 14 10:02:36 matrix rsyslogd[988]: [origin software="rsyslogd" swVersion="8.32.0" x-pid="988" x-info="http://www.rsyslog.com"] exiting on signal 15.
Jan 14 10:02:36 matrix systemd[1]: Stopping D-Bus System Message Bus...
Jan 14 10:02:36 matrix dockerd[17922]: time="2019-01-14T10:02:36.590153027+01:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = transport is closing" module=libcontainerd namespace=plugins.moby
Jan 14 10:02:36 matrix dockerd[17922]: time="2019-01-14T10:02:36.590176422+01:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = transport is closing" module=libcontainerd namespace=plugins.moby
Jan 14 10:02:36 matrix dockerd[17922]: time="2019-01-14T10:02:36.590198491+01:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = transport is closing" module=libcontainerd namespace=plugins.moby
Jan 14 10:02:36 matrix dockerd[17922]: time="2019-01-14T10:02:36.590223647+01:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = transport is closing" module=libcontainerd namespace=plugins.moby
Jan 14 10:02:36 matrix dockerd[17922]: time="2019-01-14T10:02:36.590251090+01:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = transport is closing" module=libcontainerd namespace=plugins.moby
Jan 14 10:02:36 matrix dockerd[17922]: time="2019-01-14T10:02:36.590274538+01:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = transport is closing" module=libcontainerd namespace=plugins.moby
Jan 14 10:02:36 matrix dockerd[17922]: time="2019-01-14T10:02:36.590300159+01:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = transport is closing" module=libcontainerd namespace=plugins.moby
Jan 14 10:02:36 matrix dockerd[17922]: time="2019-01-14T10:02:36.590327081+01:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = transport is closing" module=libcontainerd namespace=plugins.moby
Jan 14 10:02:36 matrix dockerd[17922]: time="2019-01-14T10:02:36.590350539+01:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = transport is closing" module=libcontainerd namespace=plugins.moby
Jan 14 10:02:36 matrix dockerd[17922]: time="2019-01-14T10:02:36.590375116+01:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = transport is closing" module=libcontainerd namespace=plugins.moby
Jan 14 10:02:36 matrix systemd[1]: Stopping Matrix Synapse server...
... a lot more messages like this ...
I haven't really confirmed it, because there are so many log entries from my various reboots, that I'm getting lost.. But.. My guess is that it has something to do with containerd.service
stopping before docker.service
.. And then all Docker-related operations (which ultimately need to get forwarded to containerd) failing.
This can be due to one of these, or both at the same time:
Docker packaging problems (docker.service
not being properly set as depending on containerd.service
)
systemd (at least 237-3ubuntu10.11
, which is the latest on Ubuntu 18.04 now) not understanding docker.service
's dependency on containerd.service
(BindsTo=containerd.service
definition).
The original docker.service
supplied by Docker 18.09 CE is at: /lib/systemd/system/docker.service
and contains something like this:
[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
BindsTo=containerd.service
After=network-online.target firewalld.service
Wants=network-online.target
Requires=docker.socket
You can see that there's a BindsTo=containerd.service
dependency, but no After=containerd.service
definition.
An easy way to trigger the same shutdown trouble is to do something like this:
--tags=start
)systemctl stop containerd
systemctl status docker
still says the service is running. Well, it may be running, but it won't be able to do muchsystemctl stop matrix-synapse
or docker kill matrix-synapse
) - it will be slow and generate some errorsThis also happens on CentOS, by the way. What I think doesn't happen on CentOS is the shutdown delay though. I haven't investigated why.. Perhaps systemd (an older v219) is following a different order there?
I've looked at the systemd documentation about Requires and BindsTo, but I'm not sure who exactly is to blame here..
My guess is that we need a After=containerd.service
definition in docker.service
to prevent such problems.
The easiest way to add it yourself is with a systemd drop-in. You can create a /etc/systemd/system/docker.service.d
directory and a fix-containerd-relation.conf
file within it with the following content:
[Unit]
After=containerd.service
After reloading systemd (systemctl daemon-reload
) and restarting Docker (systemctl restart docker
), performing a shutdown should not suffer from this problem.
It's probably best if you open this as an issue in: https://github.com/docker/for-linux/
I should have tried with another container to isolate the issue to docker. Mistakenly ruled out docker after it went up/down correctly without the matrix containers.
I can confirm the after dependency resolves the issue. I'll file an issue with docker. Thanks!
Install works great and have matrix/riot running fine. On system shutdown however, there's a 2-3minute delay on shutdown that was not there before the install (uninstalling via the instructions in the doc dir eliminates the issue).
Console shows the delay is during shutdown of the various matrix services. Systemd messages show pauses of up to 1 minute 30 seconds on each of the 6 containers. Message looks like this:
This occurs when shutting down the system with /sbin/shutdown.
If I first stop the matrix services via 'sudo systemctl stop matrix*' then run '/sbin/shutdown -r 0' to restart the server, it goes down quickly. But if I don't stop the matrix services first, I see the delay during shutdown.
Note when stopping the services manually via 'sudo systemctl stop matrix*', the services go down quickly. Running a 'docker ps' confirms they all are down. Not sure why it's so slow during shutdown.
Spent some time tweaking settings in the matrix service files under /etc/systemd/system but nothing really helps outside of specifying a low TimeoutStopSec to reduce the delay but that's not ideal.
I do see a lot of these dockerd errors while these stop attempts are happening:
Any help or pointers would be appreciated.