spantaleev / matrix-docker-ansible-deploy

🐳 Matrix (An open network for secure, decentralized communication) server setup using Ansible and Docker
GNU Affero General Public License v3.0
4.88k stars 1.04k forks source link

Delay on Ubuntu shutdown #75

Closed jcdevzero closed 5 years ago

jcdevzero commented 5 years ago
Matrix-docker-ansible-deploy version: latest as of today
Docker version: 18.09.1
Ubuntu: 18.04

Install works great and have matrix/riot running fine. On system shutdown however, there's a 2-3minute delay on shutdown that was not there before the install (uninstalling via the instructions in the doc dir eliminates the issue).

Console shows the delay is during shutdown of the various matrix services. Systemd messages show pauses of up to 1 minute 30 seconds on each of the 6 containers. Message looks like this:

[ *** ] (1 of 6) A stop job is running for Matrix Coturn server (30s / 1:30)

This occurs when shutting down the system with /sbin/shutdown.

If I first stop the matrix services via 'sudo systemctl stop matrix*' then run '/sbin/shutdown -r 0' to restart the server, it goes down quickly. But if I don't stop the matrix services first, I see the delay during shutdown.

Note when stopping the services manually via 'sudo systemctl stop matrix*', the services go down quickly. Running a 'docker ps' confirms they all are down. Not sure why it's so slow during shutdown.

Spent some time tweaking settings in the matrix service files under /etc/systemd/system but nothing really helps outside of specifying a low TimeoutStopSec to reduce the delay but that's not ideal.

I do see a lot of these dockerd errors while these stop attempts are happening:

Jan 13 15:05:43 myserver dockerd[1889]: time="2019-01-13T15:05:43.593913457-08:00" level=error msg="failed to get event" error="rpc error: code
 = Unavailable desc = transport is closing" module=libcontainerd namespace=moby
Jan 13 15:05:43 myserver dockerd[1889]: time="2019-01-13T15:05:43.593929085-08:00" level=error msg="failed to get event" error="rpc error: code
 = Unavailable desc = transport is closing" module=libcontainerd namespace=moby
Jan 13 15:05:43 myserver systemd[1]: Stopping Matrix Riot web server...
Jan 13 15:05:43 myserver dockerd[1889]: time="2019-01-13T15:05:43.593945298-08:00" level=error msg="failed to get event" error="rpc error: code
 = Unavailable desc = transport is closing" module=libcontainerd namespace=moby
Jan 13 15:05:43 myserver dockerd[1889]: time="2019-01-13T15:05:43.593961409-08:00" level=error msg="failed to get event" error="rpc error: code
 = Unavailable desc = transport is closing" module=libcontainerd namespace=moby

Any help or pointers would be appreciated.

spantaleev commented 5 years ago

I can reproduce it on Ubuntu 18.04 as well.

On a VM, I don't see such interactive messages ([ *** ] (1 of 6) A stop job is running for Matrix Coturn server (30s / 1:30)) as my console gets killed as soon as I do systemctl reboot.

To be able to get some logs, I had to:


Logs are like this for me:

Jan 14 10:02:36 matrix systemd-logind[984]: System is rebooting.
Jan 14 10:02:36 matrix systemd[1]: Stopped Unattended Upgrades Shutdown.
Jan 14 10:02:36 matrix systemd[1]: Stopped target Cloud-init target.
Jan 14 10:02:36 matrix systemd[1]: Closed Load/Save RF Kill Switch Status /dev/rfkill Watch.
Jan 14 10:02:36 matrix systemd[1]: Stopping Authorization Manager...
Jan 14 10:02:36 matrix sshd[1804]: Exiting on signal 15
Jan 14 10:02:36 matrix systemd[1]: Stopping Session 1 of user root.
Jan 14 10:02:36 matrix sshd[1804]: pam_unix(sshd:session): session closed for user root
Jan 14 10:02:36 matrix systemd[1]: Stopped target Timers.
Jan 14 10:02:36 matrix sshd[18409]: Exiting on signal 15
Jan 14 10:02:36 matrix systemd[1]: Stopped Message of the Day.
Jan 14 10:02:36 matrix sshd[18409]: pam_unix(sshd:session): session closed for user root
Jan 14 10:02:36 matrix systemd[1]: Stopping Session 5 of user root.
Jan 14 10:02:36 matrix systemd[1]: Stopped Discard unused blocks once a week.
Jan 14 10:02:36 matrix systemd[1]: Stopped Daily Cleanup of Temporary Directories.
Jan 14 10:02:36 matrix systemd[1]: Stopped target Host and Network Name Lookups.
Jan 14 10:02:36 matrix systemd[1]: Stopped Daily apt upgrade and clean activities.
Jan 14 10:02:36 matrix systemd[1]: Stopped Daily apt download activities.
Jan 14 10:02:36 matrix systemd[1]: Stopped target System Time Synchronized.
Jan 14 10:02:36 matrix systemd[1]: Stopped Execute cloud user/final scripts.
Jan 14 10:02:36 matrix systemd[1]: Stopped Apply the settings specified in cloud-config.
Jan 14 10:02:36 matrix systemd[1]: Stopped target Cloud-config availability.
Jan 14 10:02:36 matrix systemd[1]: Stopped target Graphical Interface.
Jan 14 10:02:36 matrix systemd[1]: Stopped target Multi-User System.
Jan 14 10:02:36 matrix systemd[1]: Stopping LXD - container startup/shutdown...
Jan 14 10:02:36 matrix systemd[1]: Stopping Matrix mxisd identity server...
Jan 14 10:02:36 matrix systemd[1]: Stopping Matrix nginx proxy server...
Jan 14 10:02:36 matrix systemd[1]: Stopping Network Time Service...
Jan 14 10:02:36 matrix ntpd[18308]: ntpd exiting on signal 15 (Terminated)
Jan 14 10:02:36 matrix systemd[1]: Stopping Matrix mailer...
Jan 14 10:02:36 matrix ntpd[18308]: some irrelevant log entry here...
Jan 14 10:02:36 matrix sshd[1013]: Received signal 15; terminating.
Jan 14 10:02:36 matrix systemd[1]: Stopping OpenBSD Secure Shell server...
Jan 14 10:02:36 matrix systemd[1816]: pam_unix(systemd-user:session): session closed for user root
Jan 14 10:02:36 matrix snapd[1011]: 2019/01/14 10:02:36.596643 main.go:81: Exiting on terminated signal.
Jan 14 10:02:36 matrix systemd[1]: Stopping Matrix Coturn server...
Jan 14 10:02:36 matrix dockerd[17922]: time="2019-01-14T10:02:36.589810314+01:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = transport is closing" module=libcontainerd namespace=plugins.moby
Jan 14 10:02:36 matrix dockerd[17922]: time="2019-01-14T10:02:36.589917199+01:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = transport is closing" module=libcontainerd namespace=plugins.moby
Jan 14 10:02:36 matrix dockerd[17922]: time="2019-01-14T10:02:36.589950919+01:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = transport is closing" module=libcontainerd namespace=plugins.moby
Jan 14 10:02:36 matrix dockerd[17922]: time="2019-01-14T10:02:36.589983239+01:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = transport is closing" module=libcontainerd namespace=plugins.moby
Jan 14 10:02:36 matrix dockerd[17922]: time="2019-01-14T10:02:36.590010174+01:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = transport is closing" module=libcontainerd namespace=plugins.moby
Jan 14 10:02:36 matrix dockerd[17922]: time="2019-01-14T10:02:36.590035301+01:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = transport is closing" module=libcontainerd namespace=plugins.moby
Jan 14 10:02:36 matrix dockerd[17922]: time="2019-01-14T10:02:36.590059273+01:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = transport is closing" module=libcontainerd namespace=plugins.moby
Jan 14 10:02:36 matrix dockerd[17922]: time="2019-01-14T10:02:36.590082754+01:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = transport is closing" module=libcontainerd namespace=plugins.moby
Jan 14 10:02:36 matrix dockerd[17922]: time="2019-01-14T10:02:36.590105528+01:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = transport is closing" module=libcontainerd namespace=plugins.moby
Jan 14 10:02:36 matrix dockerd[17922]: time="2019-01-14T10:02:36.590130791+01:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = transport is closing" module=libcontainerd namespace=plugins.moby
Jan 14 10:02:36 matrix rsyslogd[988]:  [origin software="rsyslogd" swVersion="8.32.0" x-pid="988" x-info="http://www.rsyslog.com"] exiting on signal 15.
Jan 14 10:02:36 matrix systemd[1]: Stopping D-Bus System Message Bus...
Jan 14 10:02:36 matrix dockerd[17922]: time="2019-01-14T10:02:36.590153027+01:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = transport is closing" module=libcontainerd namespace=plugins.moby
Jan 14 10:02:36 matrix dockerd[17922]: time="2019-01-14T10:02:36.590176422+01:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = transport is closing" module=libcontainerd namespace=plugins.moby
Jan 14 10:02:36 matrix dockerd[17922]: time="2019-01-14T10:02:36.590198491+01:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = transport is closing" module=libcontainerd namespace=plugins.moby
Jan 14 10:02:36 matrix dockerd[17922]: time="2019-01-14T10:02:36.590223647+01:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = transport is closing" module=libcontainerd namespace=plugins.moby
Jan 14 10:02:36 matrix dockerd[17922]: time="2019-01-14T10:02:36.590251090+01:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = transport is closing" module=libcontainerd namespace=plugins.moby
Jan 14 10:02:36 matrix dockerd[17922]: time="2019-01-14T10:02:36.590274538+01:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = transport is closing" module=libcontainerd namespace=plugins.moby
Jan 14 10:02:36 matrix dockerd[17922]: time="2019-01-14T10:02:36.590300159+01:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = transport is closing" module=libcontainerd namespace=plugins.moby
Jan 14 10:02:36 matrix dockerd[17922]: time="2019-01-14T10:02:36.590327081+01:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = transport is closing" module=libcontainerd namespace=plugins.moby
Jan 14 10:02:36 matrix dockerd[17922]: time="2019-01-14T10:02:36.590350539+01:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = transport is closing" module=libcontainerd namespace=plugins.moby
Jan 14 10:02:36 matrix dockerd[17922]: time="2019-01-14T10:02:36.590375116+01:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = transport is closing" module=libcontainerd namespace=plugins.moby
Jan 14 10:02:36 matrix systemd[1]: Stopping Matrix Synapse server...

... a lot more messages like this ...

I haven't really confirmed it, because there are so many log entries from my various reboots, that I'm getting lost.. But.. My guess is that it has something to do with containerd.service stopping before docker.service.. And then all Docker-related operations (which ultimately need to get forwarded to containerd) failing.

This can be due to one of these, or both at the same time:

The original docker.service supplied by Docker 18.09 CE is at: /lib/systemd/system/docker.service and contains something like this:

[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
BindsTo=containerd.service
After=network-online.target firewalld.service
Wants=network-online.target
Requires=docker.socket

You can see that there's a BindsTo=containerd.service dependency, but no After=containerd.service definition.

An easy way to trigger the same shutdown trouble is to do something like this:

This also happens on CentOS, by the way. What I think doesn't happen on CentOS is the shutdown delay though. I haven't investigated why.. Perhaps systemd (an older v219) is following a different order there?


I've looked at the systemd documentation about Requires and BindsTo, but I'm not sure who exactly is to blame here..

My guess is that we need a After=containerd.service definition in docker.service to prevent such problems.

The easiest way to add it yourself is with a systemd drop-in. You can create a /etc/systemd/system/docker.service.d directory and a fix-containerd-relation.conf file within it with the following content:

[Unit]
After=containerd.service

After reloading systemd (systemctl daemon-reload) and restarting Docker (systemctl restart docker), performing a shutdown should not suffer from this problem.


It's probably best if you open this as an issue in: https://github.com/docker/for-linux/

jcdevzero commented 5 years ago

I should have tried with another container to isolate the issue to docker. Mistakenly ruled out docker after it went up/down correctly without the matrix containers.

I can confirm the after dependency resolves the issue. I'll file an issue with docker. Thanks!