Killing docker-containerd breaks interaction with containers

thaJeztah commented 6 years ago

When killing docker-containerd, interacting with containers (docker exec, docker stop, docker kill) fails:

docker kill testing
Error response from daemon: Cannot kill container: testing: Cannot kill container 9bfdba3fc8eee79d6ca5773f7caff5dc5a8379037e98b6ded5c8b68df5750359: connection error: desc = "transport: dial unix /var/run/docker/containerd/docker-containerd.sock: connect: connection refused": unknown

docker rm -f lucid_yalow
Error response from daemon: Could not kill running container 9bfdba3fc8eee79d6ca5773f7caff5dc5a8379037e98b6ded5c8b68df5750359, cannot remove - Cannot kill container 9bfdba3fc8eee79d6ca5773f7caff5dc5a8379037e98b6ded5c8b68df5750359: connection error: desc = "transport: dial unix /var/run/docker/containerd/docker-containerd.sock: connect: connection refused": unknown

However, killing dockerd (killall -9 dockerd) or sending it a SIGHUP (killall -HUP dockerd) restores functionality.

This problem could explain some reports about "unkillable" containers, where everything appears to be running, but interaction is not possible (possibly after containerd was OOM killed, but could have different causes).

Steps to reproduce / information

With docker running, start a container and check the output of ps auxf: docker-containerd is a child process of dockerd, and docker-containerd-shim is a child process of docker-containerd:

root     11468  1.1  3.4 468232 71036 ?        Ssl  11:56   0:01 /usr/bin/dockerd -H fd://
root     11473  0.4  1.3 236512 27856 ?        Ssl  11:56   0:00  \_ docker-containerd --config /var/run/docker/containerd/containerd.toml
root     11918  0.0  0.1   7516  3788 ?        Sl   11:57   0:00      \_ docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/9bfdba3fc8eee79d6ca5773f7caff5dc5a8379037e98b6ded5c8b68df5750359 -address /var/run/docker/containerd/docker-containerd.sock -containerd-binary /usr/bin/docker-containerd -runtime-root /var/run/docker/runtime-runc
root     11933  0.1  0.0   1236     4 pts/0    Ss+  11:57   0:00          \_ sh

Now, kill docker-containerd (killall -9 docker-containerd).

docker-containerd is restarted (by dockerd); observe that docker-containerd-shim and the container process(es) are reparented (I haven't checked what the new parent process is, or whether this is relevant). The docker-containerd-shim processes are no longer child processes of docker-containerd:

root     11468  160  3.6 470984 74664 ?        Ssl  11:56  19:55 /usr/bin/dockerd -H fd://
root     11979  0.1  1.2 300992 25980 ?        Ssl  11:58   0:01  \_ docker-containerd --config /var/run/docker/containerd/containerd.toml
root     11918  0.0  0.2   7516  4688 ?        Sl   11:57   0:00 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/9bfdba3fc8eee79d6ca5773f7caff5dc5a8379037e98b6ded5c8b68df5750359 -address /var/run/docker/containerd/docker-containerd.sock -containerd-binary /usr/bin/docker-containerd -runtime-root /var/run/docker/runtime-runc
root     11933  0.0  0.0   1236     4 pts/0    Ss+  11:57   0:00  \_ sh
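To check which process a reparented shim now belongs to, the parent PID can be read from /proc. This is a minimal sketch, assuming a Linux host with /proc mounted; the helper names `parse_ppid` and `parent_of` are made up for illustration and are not part of docker or containerd.

```python
import os


def parse_ppid(stat_line: str) -> int:
    """Return the parent PID from the contents of /proc/<pid>/stat.

    The second field (comm) is wrapped in parentheses and may itself contain
    spaces or parentheses, so split after the LAST ')' rather than on spaces.
    """
    rest = stat_line[stat_line.rindex(")") + 1:].split()
    # rest[0] is the process state, rest[1] is the parent PID.
    return int(rest[1])


def parent_of(pid: int, proc_root: str = "/proc") -> int:
    """Read the parent PID of a live process (Linux only)."""
    with open(os.path.join(proc_root, str(pid), "stat")) as f:
        return parse_ppid(f.read())
```

For the shim in the listing above this would be called as `parent_of(11918)`; the same information is available from `ps -o ppid= -p 11918`.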

At this point, interacting with containers is broken.

Containers still show up as running:

docker ps

CONTAINER ID        IMAGE               COMMAND             CREATED              STATUS              PORTS               NAMES
9bfdba3fc8ee        busybox             "sh"                About a minute ago   Up About a minute                       testing

Inspecting the container still works, and shows the pid of the container;

docker inspect --format '{{json .State}}' testing | jq .

{
  "Status": "running",
  "Running": true,
  "Paused": false,
  "Restarting": false,
  "OOMKilled": false,
  "Dead": false,
  "Pid": 11933,
  "ExitCode": 0,
  "Error": "",
  "StartedAt": "2018-01-12T11:57:47.687627373Z",
  "FinishedAt": "0001-01-01T00:00:00Z"
}
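The state reported by dockerd can be cross-checked against the kernel by testing whether the reported Pid still exists. A minimal sketch, assuming it runs in the same PID namespace as the container process (for example directly on the host); `pid_from_state` and `pid_is_alive` are hypothetical helper names.

```python
import json
import os


def pid_from_state(state_json: str) -> int:
    """Extract "Pid" from `docker inspect --format '{{json .State}}'` output."""
    return int(json.loads(state_json)["Pid"])


def pid_is_alive(pid: int) -> bool:
    """Return True if a process with this PID exists.

    Signal 0 performs the existence and permission checks without
    delivering a signal. A stopped container reports Pid 0, which must not
    be passed to os.kill (PID 0 addresses the caller's process group).
    """
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True
```

Feeding the JSON above into `pid_from_state` yields 11933, and `pid_is_alive(11933)` then shows whether the container's `sh` process is really still there.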

But any interaction with the containers is broken;

docker kill testing
Error response from daemon: Cannot kill container: testing: Cannot kill container 9bfdba3fc8eee79d6ca5773f7caff5dc5a8379037e98b6ded5c8b68df5750359: connection error: desc = "transport: dial unix /var/run/docker/containerd/docker-containerd.sock: connect: connection refused": unknown

docker rm -f lucid_yalow
Error response from daemon: Could not kill running container 9bfdba3fc8eee79d6ca5773f7caff5dc5a8379037e98b6ded5c8b68df5750359, cannot remove - Cannot kill container 9bfdba3fc8eee79d6ca5773f7caff5dc5a8379037e98b6ded5c8b68df5750359: connection error: desc = "transport: dial unix /var/run/docker/containerd/docker-containerd.sock: connect: connection refused": unknown
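The "connection refused" part of these errors can be reproduced without going through dockerd by connecting to the socket directly. A rough sketch using only the Python standard library; `probe_unix_socket` is a made-up helper, and the socket path is the one from the error messages above.

```python
import errno
import socket


def probe_unix_socket(path: str, timeout: float = 1.0) -> str:
    """Try to connect to a Unix stream socket and classify the outcome."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        s.connect(path)
        return "accepting"
    except FileNotFoundError:
        # No socket file at this path.
        return "missing"
    except ConnectionRefusedError:
        # The socket file exists, but nothing is listening on it.
        return "refused"
    except OSError as e:
        return "error: " + errno.errorcode.get(e.errno, str(e))
    finally:
        s.close()
```

Calling `probe_unix_socket("/var/run/docker/containerd/docker-containerd.sock")` distinguishes a stale socket file ("refused") from a missing one ("missing") and from a socket that the restarted docker-containerd is serving ("accepting").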

When directly connecting to containerd, containers still show:

docker-containerd-ctr --namespace=moby --address /var/run/docker/containerd/docker-containerd.sock containers ls

CONTAINER                                                           IMAGE    RUNTIME                           
9bfdba3fc8eee79d6ca5773f7caff5dc5a8379037e98b6ded5c8b68df5750359    -        io.containerd.runtime.v1.linux

And can be inspected;

docker-containerd-ctr --namespace=moby --address /var/run/docker/containerd/docker-containerd.sock containers info 9bfdba3fc8eee79d6ca5773f7caff5dc5a8379037e98b6ded5c8b68df5750359

......

Shims are still up:

netstat -x | grep shim
unix  2      [ ]         STREAM     CONNECTED     64641    @/containerd-shim/moby/9bfdba3fc8eee79d6ca5773f7caff5dc5a8379037e98b6ded5c8b68df5750359/shim.sock
unix  3      [ ]         STREAM     CONNECTED     64019    @/containerd-shim/moby/9bfdba3fc8eee79d6ca5773f7caff5dc5a8379037e98b6ded5c8b68df5750359/shim.sock
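The same information is available from /proc/net/unix when netstat is not installed. A small sketch, assuming the usual eight-column layout of that file in which abstract socket names are printed with a leading '@'; `abstract_sockets` is a hypothetical helper name.

```python
def abstract_sockets(proc_net_unix: str, needle: str = "containerd-shim"):
    """Return abstract Unix socket names (leading '@') that contain needle.

    proc_net_unix is the text of /proc/net/unix. The path is the optional
    eighth column; unnamed sockets have only seven columns and are skipped.
    """
    names = set()
    for line in proc_net_unix.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 7)
        if len(fields) == 8 and fields[7].startswith("@") and needle in fields[7]:
            names.add(fields[7].strip())
    return sorted(names)
```

Usage would be `abstract_sockets(open("/proc/net/unix").read())`; each shim socket appears once in the result even if several connections to it are open.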

docker-runc --root /var/run/docker/runtime-runc/moby/ state 9bfdba3fc8eee79d6ca5773f7caff5dc5a8379037e98b6ded5c8b68df5750359
{
  "ociVersion": "1.0.0",
  "id": "9bfdba3fc8eee79d6ca5773f7caff5dc5a8379037e98b6ded5c8b68df5750359",
  "pid": 11933,
  "status": "running",
  "bundle": "/run/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/9bfdba3fc8eee79d6ca5773f7caff5dc5a8379037e98b6ded5c8b68df5750359",
  "rootfs": "/var/lib/docker/overlay2/9c0e355304db9fb85f7c1281b11008eea23bd4dbb142f11f551066c9fdb2e70e/merged",
  "created": "2018-01-12T11:57:47.631870877Z",
  "owner": ""
}

And the container is still functional, when using docker-runc;

docker-runc --root /var/run/docker/runtime-runc/moby/ exec 9bfdba3fc8eee79d6ca5773f7caff5dc5a8379037e98b6ded5c8b68df5750359 ls -la

total 44
drwxr-xr-x    1 root     root          4096 Jan 12 11:57 .
drwxr-xr-x    1 root     root          4096 Jan 12 11:57 ..
-rwxr-xr-x    1 root     root             0 Jan 12 11:57 .dockerenv
drwxr-xr-x    2 root     root         12288 Jan  8 21:14 bin
drwxr-xr-x    5 root     root           360 Jan 12 11:57 dev
drwxr-xr-x    1 root     root          4096 Jan 12 11:57 etc
drwxr-xr-x    2 nobody   nogroup       4096 Jan  8 21:14 home
dr-xr-xr-x  125 root     root             0 Jan 12 11:57 proc
drwxr-xr-x    2 root     root          4096 Jan  8 21:14 root
dr-xr-xr-x   13 root     root             0 Jan 12 11:57 sys
drwxrwxrwt    2 root     root          4096 Jan  8 21:14 tmp
drwxr-xr-x    3 root     root          4096 Jan  8 21:14 usr
drwxr-xr-x    4 root     root          4096 Jan  8 21:14 var

Restore functionality

Kill dockerd (killall -9 dockerd) or send it a SIGHUP (killall -HUP dockerd).
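Where killall is not available, the SIGHUP can be sent from a short script. This is only a sketch of roughly what `killall -HUP dockerd` does, assuming a Linux /proc layout; `find_pids` and `signal_all` are made-up names, and the `kill` parameter exists so the function can be dry-run without signalling anything.

```python
import os
import signal

COMM_MAX = 15  # the kernel truncates /proc/<pid>/comm to 15 characters


def find_pids(name: str, proc_root: str = "/proc"):
    """Return PIDs whose /proc/<pid>/comm equals name (truncated like the kernel does)."""
    wanted = name[:COMM_MAX]
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                if f.read().strip() == wanted:
                    pids.append(int(entry))
        except OSError:
            continue  # the process exited while we were scanning
    return sorted(pids)


def signal_all(name: str, sig=signal.SIGHUP, proc_root: str = "/proc", kill=os.kill):
    """Send sig to every process named name and return the PIDs signalled."""
    pids = find_pids(name, proc_root)
    for pid in pids:
        kill(pid, sig)
    return pids
```

`signal_all("dockerd")` sends the SIGHUP; `signal_all("dockerd", kill=lambda pid, sig: None)` only lists the PIDs that would be signalled. Because of the 15-character limit, docker-containerd shows up in comm as "docker-containe", which `find_pids` accounts for.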

Observe that shims are not re-parented (which is probably expected);

root     11918  0.0  0.2   7516  4688 ?        Sl   11:57   0:00 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/9bfdba3fc8eee79d6ca5773f7caff5dc5a8379037e98b6ded5c8b68df5750359 -address /var/run/docker/containerd/docker-containerd.sock -containerd-binary /usr/bin/docker-contai
root     11933  0.0  0.0   1236     4 pts/0    Ss+  11:57   0:00  \_ sh
root     12287  1.1  2.8 446232 57824 ?        Ssl  12:55   0:00 /usr/bin/dockerd -H fd://
root     12293  0.7  1.1 300928 22616 ?        Ssl  12:55   0:00  \_ docker-containerd --config /var/run/docker/containerd/containerd.toml

But now it's possible again to interact with them:

docker exec testing ls -la

total 44
drwxr-xr-x    1 root     root          4096 Jan 12 11:57 .
drwxr-xr-x    1 root     root          4096 Jan 12 11:57 ..
-rwxr-xr-x    1 root     root             0 Jan 12 11:57 .dockerenv
drwxr-xr-x    2 root     root         12288 Jan  8 21:14 bin
drwxr-xr-x    5 root     root           360 Jan 12 11:57 dev
drwxr-xr-x    1 root     root          4096 Jan 12 11:57 etc
drwxr-xr-x    2 nobody   nogroup       4096 Jan  8 21:14 home
dr-xr-xr-x  126 root     root             0 Jan 12 11:57 proc
drwxr-xr-x    1 root     root          4096 Jan 12 12:58 root
dr-xr-xr-x   13 root     root             0 Jan 12 11:57 sys
drwxrwxrwt    2 root     root          4096 Jan  8 21:14 tmp
drwxr-xr-x    3 root     root          4096 Jan  8 21:14 usr
drwxr-xr-x    4 root     root          4096 Jan  8 21:14 var

Version of docker and containerd

Tested on Ubuntu 16.04 on DigitalOcean;

docker-containerd --version
containerd github.com/containerd/containerd v1.0.0 89623f28b87a6004d4b785663257362d1658a729

Client:
 Version:   18.01.0-ce
 API version:   1.35
 Go version:    go1.9.2
 Git commit:    03596f5
 Built: Wed Jan 10 20:11:05 2018
 OS/Arch:   linux/amd64
 Experimental:  false
 Orchestrator:  swarm

Server:
 Engine:
  Version:  18.01.0-ce
  API version:  1.35 (minimum version 1.12)
  Go version:   go1.9.2
  Git commit:   03596f5
  Built:    Wed Jan 10 20:09:37 2018
  OS/Arch:  linux/amd64
  Experimental: false

Containers: 1
 Running: 1
 Paused: 0
 Stopped: 0
Images: 2
Server Version: 18.01.0-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 89623f28b87a6004d4b785663257362d1658a729
runc version: b2567b37d7b75eb4cf325b77297b140ea686ce8f
init version: 949e6fa
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.4.0-108-generic
Operating System: Ubuntu 16.04.3 LTS
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 1.953GiB
Name: ubuntu-2gb-ams3-01
ID: KIY5:X5P2:5FI5:GEPC:Q2OO:XF4P:KFB2:S22T:A76T:DVFV:UIFB:ZATY
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support

thaJeztah commented 6 years ago

Briefly spoke with @crosbymichael on Slack, and he suspects that it's probably something in the dockerd code that is not restoring things correctly, and that the SIGHUP is fixing that.

/cc @stevvooe @mlaventure

zmlpjuran commented 6 years ago

We are facing a similar issue; the difference is in the steps to reproduce. When we run out of memory on our builders, containerd is killed by the oom-killer and restarted. The result is the same.
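Whether containerd was an OOM-killer victim can usually be confirmed from the kernel log. A rough sketch, assuming the kernel logs a "Killed process <pid> (<comm>)" line for each victim; the exact wording varies between kernel versions, so treat the pattern as an assumption. `oom_victims` is a hypothetical helper.

```python
import re

# Assumed kernel wording; matches "Killed process 123 (name)" with or
# without an "Out of memory:" prefix.
KILLED = re.compile(r"Killed process (\d+) \(([^)]*)\)")


def oom_victims(kernel_log: str, name_prefix: str = "docker-containe"):
    """Return (pid, comm) pairs for OOM-killed processes whose comm starts with name_prefix.

    The default prefix is "docker-containerd" truncated to the 15 characters
    the kernel keeps for a process name.
    """
    victims = []
    for line in kernel_log.splitlines():
        m = KILLED.search(line)
        if m and m.group(2).startswith(name_prefix):
            victims.append((int(m.group(1)), m.group(2)))
    return victims
```

The input would typically be the output of `dmesg` or `journalctl -k`; an empty result does not rule out an OOM kill if the log has already rotated.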

thaJeztah commented 6 years ago

@zmlpjuran thanks for adding that; yes I anticipated that if containerd was OOM-killed, the same would happen (see my top description)

CpuID commented 6 years ago

I think @caomania and I may have experienced this in Docker for Mac today (17.12 mac49). Is it plausible that this also exists in a hyperkit/linuxkit based VM?

thaJeztah commented 6 years ago

I did some more testing, and it looks like it's not always possible to recover by sending a SIGHUP to dockerd.

Steps to reproduce;

docker run -it --rm --privileged -v /var/lib/docker docker:18.01 dockerd --debug --iptables=false

Then, open a docker exec session in the container, and kill docker-containerd;

docker exec -it $(docker ps -q -n1) sh

/ # killall -9 docker-containerd
/ # docker run --rm hello-world
docker: Error response from daemon: connection error: desc = "transport: dial unix /var/run/docker/containerd/docker-containerd.sock: connect: connection refused".
ERRO[0002] error waiting for container: context canceled

ERRO[2018-01-30T00:30:53.660577429Z] a5b5dade85229266867c72d6411f3b3222b74715abb62822e6b39462b95cc7c2 cleanup: failed to delete container from containerd: no such container 
ERRO[2018-01-30T00:30:53.680504829Z] Handler for POST /v1.35/containers/a5b5dade85229266867c72d6411f3b3222b74715abb62822e6b39462b95cc7c2/start returned error: connection error: desc = "transport: dial unix /var/run/docker/containerd/docker-containerd.sock: connect: connection refused"

But even after doing a SIGHUP of dockerd, connection with containerd is lost (and something is consuming resources);

/ # killall -HUP dockerd
/ # docker run --rm hello-world
docker: Error response from daemon: connection error: desc = "transport: dial unix /var/run/docker/containerd/docker-containerd.sock: connect: connection refused".
ERRO[0000] error waiting for container: context canceled

INFO[2018-01-27T05:33:45.611135136Z] Got signal to reload configuration, reloading from: /etc/docker/daemon.json 
DEBU[2018-01-27T05:33:45.611409299Z] Reset Max Concurrent Downloads: 3            
DEBU[2018-01-27T05:33:45.611551128Z] Reset Max Concurrent Uploads: 5              
WARN[2018-01-27T05:33:45.642215587Z] failed to retrieve containerd version: rpc error: code = Internal desc = connection error: desc = "transport: dial unix /var/run/docker/containerd/docker-containerd.sock: connect: connection refused"

Stack dump (containerd);

```
DEBU[0141] received signal module=containerd signal=user defined signal 1
INFO[0141] === BEGIN goroutine stack dump ===
goroutine 18 [running]:
main.dumpStacks()
    /tmp/tmp.X5mtKvYY4Q/src/github.com/containerd/containerd/cmd/containerd/main_unix.go:69 +0x8c
main.handleSignals.func1(0xc420062480, 0xc420058d80, 0x147d120, 0xc420194270, 0xc4200624e0)
    /tmp/tmp.X5mtKvYY4Q/src/github.com/containerd/containerd/cmd/containerd/main_unix.go:44 +0x2cb
created by main.handleSignals
    /tmp/tmp.X5mtKvYY4Q/src/github.com/containerd/containerd/cmd/containerd/main_unix.go:30 +0x8b

goroutine 1 [chan receive]:
main.main.func1(0xc42009d080, 0xc42009d080, 0xc420049b4f)
    /tmp/tmp.X5mtKvYY4Q/src/github.com/containerd/containerd/cmd/containerd/main.go:134 +0x878
github.com/containerd/containerd/vendor/github.com/urfave/cli.HandleAction(0xfac1c0, 0x10c1c68, 0xc42009d080, 0xc420058d20, 0x0)
    /tmp/tmp.X5mtKvYY4Q/src/github.com/containerd/containerd/vendor/github.com/urfave/cli/app.go:502 +0xd4
github.com/containerd/containerd/vendor/github.com/urfave/cli.(*App).Run(0xc42016ca80, 0xc420010090, 0x3, 0x3, 0x0, 0x0)
    /tmp/tmp.X5mtKvYY4Q/src/github.com/containerd/containerd/vendor/github.com/urfave/cli/app.go:268 +0x655
main.main()
    /tmp/tmp.X5mtKvYY4Q/src/github.com/containerd/containerd/cmd/containerd/main.go:137 +0x53d

goroutine 16 [syscall]:
os/signal.signal_recv(0x1477120)
    /usr/local/go/src/runtime/sigqueue.go:131 +0xa8
os/signal.loop()
    /usr/local/go/src/os/signal/signal_unix.go:22 +0x24
created by os/signal.init.0
    /usr/local/go/src/os/signal/signal_unix.go:28 +0x43

goroutine 19 [select, locked to thread]:
runtime.gopark(0x10c24b0, 0x0, 0xbaf7f6, 0x6, 0x18, 0x1)
    /usr/local/go/src/runtime/proc.go:287 +0x132
runtime.selectgo(0xc420033f50, 0xc4200625a0)
    /usr/local/go/src/runtime/select.go:395 +0x114f
runtime.ensureSigM.func1()
    /usr/local/go/src/runtime/signal_unix.go:511 +0x226
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:2337 +0x1

goroutine 20 [select]:
github.com/containerd/containerd/vendor/github.com/docker/go-events.(*Broadcaster).run(0xc42006b9f0)
    /tmp/tmp.X5mtKvYY4Q/src/github.com/containerd/containerd/vendor/github.com/docker/go-events/broadcast.go:117 +0x414
created by github.com/containerd/containerd/vendor/github.com/docker/go-events.NewBroadcaster
    /tmp/tmp.X5mtKvYY4Q/src/github.com/containerd/containerd/vendor/github.com/docker/go-events/broadcast.go:39 +0x1b1

goroutine 21 [select]:
github.com/containerd/containerd/gc/scheduler.(*gcScheduler).run(0xc4200591a0, 0x147d120, 0xc4201ce510)
    /tmp/tmp.X5mtKvYY4Q/src/github.com/containerd/containerd/gc/scheduler/scheduler.go:243 +0x21d
created by github.com/containerd/containerd/gc/scheduler.init.0.func1
    /tmp/tmp.X5mtKvYY4Q/src/github.com/containerd/containerd/gc/scheduler/scheduler.go:107 +0x4bf

goroutine 22 [syscall]:
syscall.Syscall6(0xe8, 0x5, 0xc4200359b8, 0x80, 0xffffffffffffffff, 0x0, 0x0, 0x0, 0x0, 0x0)
    /usr/local/go/src/syscall/asm_linux_amd64.s:44 +0x5
github.com/containerd/containerd/vendor/golang.org/x/sys/unix.EpollWait(0x5, 0xc4200359b8, 0x80, 0x80, 0xffffffffffffffff, 0x0, 0x0, 0x0)
    /tmp/tmp.X5mtKvYY4Q/src/github.com/containerd/containerd/vendor/golang.org/x/sys/unix/zsyscall_linux_amd64.go:1518 +0x79
github.com/containerd/containerd/metrics/cgroups.(*oomCollector).start(0xc4201978e0)
    /tmp/tmp.X5mtKvYY4Q/src/github.com/containerd/containerd/metrics/cgroups/oom.go:98 +0x7d
created by github.com/containerd/containerd/metrics/cgroups.newOOMCollector
    /tmp/tmp.X5mtKvYY4Q/src/github.com/containerd/containerd/metrics/cgroups/oom.go:34 +0x125

goroutine 24 [IO wait]:
internal/poll.runtime_pollWait(0x7fc4967e1f70, 0x72, 0xffffffffffffffff)
    /usr/local/go/src/runtime/netpoll.go:173 +0x59
internal/poll.(*pollDesc).wait(0xc4200a4818, 0x72, 0xc4202ceb00, 0x0, 0x0)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:85 +0xb0
internal/poll.(*pollDesc).waitRead(0xc4200a4818, 0xffffffffffffff00, 0x0, 0x0)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:90 +0x3f
internal/poll.(*FD).Accept(0xc4200a4800, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
    /usr/local/go/src/internal/poll/fd_unix.go:335 +0x1e4
net.(*netFD).accept(0xc4200a4800, 0xc400000020, 0xc4202ced70, 0x411f9a)
    /usr/local/go/src/net/fd_unix.go:238 +0x44
net.(*UnixListener).accept(0xc4201fb230, 0x5261cc, 0x1060d00, 0xc4202c00f0)
    /usr/local/go/src/net/unixsock_posix.go:162 +0x34
net.(*UnixListener).Accept(0xc4201fb230, 0xc420016050, 0xfc9280, 0x14580d0, 0x10abc00)
    /usr/local/go/src/net/unixsock.go:241 +0x4b
net/http.(*Server).Serve(0xc4202c8000, 0x147c060, 0xc4201fb230, 0x0, 0x0)
    /usr/local/go/src/net/http/server.go:2695 +0x1b4
net/http.Serve(0x147c060, 0xc4201fb230, 0x14713a0, 0xc4202c0000, 0x10c2048, 0xc420026f20)
    /usr/local/go/src/net/http/server.go:2323 +0x75
github.com/containerd/containerd/server.(*Server).ServeDebug(0xc4201912c0, 0x147c060, 0xc4201fb230, 0xc420026f38, 0x0)
    /tmp/tmp.X5mtKvYY4Q/src/github.com/containerd/containerd/server/server.go:159 +0x1c8
github.com/containerd/containerd/server.(*Server).ServeDebug-fm(0x147c060, 0xc4201fb230, 0xc4201fb230, 0xc420062480)
    /tmp/tmp.X5mtKvYY4Q/src/github.com/containerd/containerd/cmd/containerd/main.go:117 +0x40
main.serve.func1(0x147c060, 0xc4201fb230, 0xc42020e630, 0x147d120, 0xc4201fb2f0, 0xc420095d80, 0x37)
    /tmp/tmp.X5mtKvYY4Q/src/github.com/containerd/containerd/cmd/containerd/main.go:148 +0x77
created by main.serve
    /tmp/tmp.X5mtKvYY4Q/src/github.com/containerd/containerd/cmd/containerd/main.go:146 +0x1c8

goroutine 25 [IO wait]:
internal/poll.runtime_pollWait(0x7fc4967e1eb0, 0x72, 0xffffffffffffffff)
    /usr/local/go/src/runtime/netpoll.go:173 +0x59
internal/poll.(*pollDesc).wait(0xc4200a4998, 0x72, 0xc420038b00, 0x0, 0x0)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:85 +0xb0
internal/poll.(*pollDesc).waitRead(0xc4200a4998, 0xffffffffffffff00, 0x0, 0x0)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:90 +0x3f
internal/poll.(*FD).Accept(0xc4200a4980, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
    /usr/local/go/src/internal/poll/fd_unix.go:335 +0x1e4
net.(*netFD).accept(0xc4200a4980, 0xc42000e220, 0x0, 0x0)
    /usr/local/go/src/net/fd_unix.go:238 +0x44
net.(*UnixListener).accept(0xc4201fb380, 0x89742b, 0x45a970, 0xc420038da0)
    /usr/local/go/src/net/unixsock_posix.go:162 +0x34
net.(*UnixListener).Accept(0xc4201fb380, 0x10c18f0, 0xc420083e00, 0x1480740, 0xc42000e220)
    /usr/local/go/src/net/unixsock.go:241 +0x4b
github.com/containerd/containerd/vendor/google.golang.org/grpc.(*Server).Serve(0xc420083e00, 0x147c060, 0xc4201fb380, 0x0, 0x0)
    /tmp/tmp.X5mtKvYY4Q/src/github.com/containerd/containerd/vendor/google.golang.org/grpc/server.go:463 +0x198
github.com/containerd/containerd/server.(*Server).ServeGRPC(0xc4201912c0, 0x147c060, 0xc4201fb380, 0xc420021738, 0x1461e60)
    /tmp/tmp.X5mtKvYY4Q/src/github.com/containerd/containerd/server/server.go:138 +0x55
github.com/containerd/containerd/server.(*Server).ServeGRPC-fm(0x147c060, 0xc4201fb380, 0xc4201fb380, 0xc420192800)
    /tmp/tmp.X5mtKvYY4Q/src/github.com/containerd/containerd/cmd/containerd/main.go:131 +0x40
main.serve.func1(0x147c060, 0xc4201fb380, 0xc42020e720, 0x147d120, 0xc4201fb440, 0xc420095e80, 0x31)
    /tmp/tmp.X5mtKvYY4Q/src/github.com/containerd/containerd/cmd/containerd/main.go:148 +0x77
created by main.serve
    /tmp/tmp.X5mtKvYY4Q/src/github.com/containerd/containerd/cmd/containerd/main.go:146 +0x1c8

goroutine 26 [IO wait]:
internal/poll.runtime_pollWait(0x7fc4967e1df0, 0x72, 0x0)
    /usr/local/go/src/runtime/netpoll.go:173 +0x59
internal/poll.(*pollDesc).wait(0xc4200a4d18, 0x72, 0xffffffffffffff00, 0x14739e0, 0x146de18)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:85 +0xb0
internal/poll.(*pollDesc).waitRead(0xc4200a4d18, 0xc420314000, 0x8000, 0x8000)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:90 +0x3f
internal/poll.(*FD).Read(0xc4200a4d00, 0xc420314000, 0x8000, 0x8000, 0x0, 0x0, 0x0)
    /usr/local/go/src/internal/poll/fd_unix.go:126 +0x18c
net.(*netFD).Read(0xc4200a4d00, 0xc420314000, 0x8000, 0x8000, 0x11, 0x0, 0x0)
    /usr/local/go/src/net/fd_unix.go:202 +0x54
net.(*conn).Read(0xc42000e220, 0xc420314000, 0x8000, 0x8000, 0x0, 0x0, 0x0)
    /usr/local/go/src/net/net.go:176 +0x6f
bufio.(*Reader).Read(0xc4200599e0, 0xc42009a738, 0x9, 0x9, 0x9, 0x0, 0x0)
    /usr/local/go/src/bufio/bufio.go:213 +0x30d
io.ReadAtLeast(0x146f7a0, 0xc4200599e0, 0xc42009a738, 0x9, 0x9, 0x9, 0x74ae78c20039bb8, 0x5a6c0f2e, 0xc420039bc0)
    /usr/local/go/src/io/io.go:309 +0x88
io.ReadFull(0x146f7a0, 0xc4200599e0, 0xc42009a738, 0x9, 0x9, 0x20d517f12c, 0x14baac0, 0xbe9321ab86aa2bc6)
    /usr/local/go/src/io/io.go:327 +0x5a
github.com/containerd/containerd/vendor/golang.org/x/net/http2.readFrameHeader(0xc42009a738, 0x9, 0x9, 0x146f7a0, 0xc4200599e0, 0x0, 0x7070e0900000000, 0xc4201cb418, 0xc420039ce8)
    /tmp/tmp.X5mtKvYY4Q/src/github.com/containerd/containerd/vendor/golang.org/x/net/http2/frame.go:237 +0x7d
github.com/containerd/containerd/vendor/golang.org/x/net/http2.(*Framer).ReadFrame(0xc42009a700, 0xc420015e40, 0xc420015e40, 0x0, 0x0)
    /tmp/tmp.X5mtKvYY4Q/src/github.com/containerd/containerd/vendor/golang.org/x/net/http2/frame.go:492 +0xa6
github.com/containerd/containerd/vendor/google.golang.org/grpc/transport.(*http2Server).HandleStreams(0xc42009d4a0, 0xc42030b2f0, 0x10c1928)
    /tmp/tmp.X5mtKvYY4Q/src/github.com/containerd/containerd/vendor/google.golang.org/grpc/transport/http2_server.go:393 +0x317
github.com/containerd/containerd/vendor/google.golang.org/grpc.(*Server).serveStreams(0xc420083e00, 0x1480260, 0xc42009d4a0)
    /tmp/tmp.X5mtKvYY4Q/src/github.com/containerd/containerd/vendor/google.golang.org/grpc/server.go:568 +0x142
github.com/containerd/containerd/vendor/google.golang.org/grpc.(*Server).serveHTTP2Transport(0xc420083e00, 0x1480740, 0xc42000e220, 0x0, 0x0)
    /tmp/tmp.X5mtKvYY4Q/src/github.com/containerd/containerd/vendor/google.golang.org/grpc/server.go:561 +0x473
github.com/containerd/containerd/vendor/google.golang.org/grpc.(*Server).handleRawConn(0xc420083e00, 0x1480740, 0xc42000e220)
    /tmp/tmp.X5mtKvYY4Q/src/github.com/containerd/containerd/vendor/google.golang.org/grpc/server.go:526 +0x499
created by github.com/containerd/containerd/vendor/google.golang.org/grpc.(*Server).Serve
    /tmp/tmp.X5mtKvYY4Q/src/github.com/containerd/containerd/vendor/google.golang.org/grpc/server.go:495 +0x5bb

goroutine 27 [select]:
github.com/containerd/containerd/vendor/google.golang.org/grpc/transport.loopyWriter(0x7fc4967e63a0, 0xc4203074c0, 0xc42030b290, 0xc4202cbfb8)
    /tmp/tmp.X5mtKvYY4Q/src/github.com/containerd/containerd/vendor/google.golang.org/grpc/transport/transport.go:750 +0x2e6
github.com/containerd/containerd/vendor/google.golang.org/grpc/transport.newHTTP2Server.func1(0xc42009d4a0)
    /tmp/tmp.X5mtKvYY4Q/src/github.com/containerd/containerd/vendor/google.golang.org/grpc/transport/http2_server.go:227 +0x60
created by github.com/containerd/containerd/vendor/google.golang.org/grpc/transport.newHTTP2Server
    /tmp/tmp.X5mtKvYY4Q/src/github.com/containerd/containerd/vendor/google.golang.org/grpc/transport/http2_server.go:226 +0x8fb

goroutine 28 [select]:
github.com/containerd/containerd/vendor/google.golang.org/grpc/transport.(*http2Server).keepalive(0xc42009d4a0)
    /tmp/tmp.X5mtKvYY4Q/src/github.com/containerd/containerd/vendor/google.golang.org/grpc/transport/http2_server.go:935 +0x266
created by github.com/containerd/containerd/vendor/google.golang.org/grpc/transport.newHTTP2Server
    /tmp/tmp.X5mtKvYY4Q/src/github.com/containerd/containerd/vendor/google.golang.org/grpc/transport/http2_server.go:230 +0x920

=== END goroutine stack dump ===
```

Stack dump (dockerd);

``` goroutine 52 [running]: github.com/docker/docker/pkg/signal.DumpStacks(0x1d57c1c, 0xf, 0x0, 0x0, 0x0, 0x0) /go/src/github.com/docker/docker/pkg/signal/trap.go:83 +0xc5 github.com/docker/docker/daemon.(*Daemon).setupDumpStackTrap.func1(0xc42026e2a0, 0x1d57c1c, 0xf) /go/src/github.com/docker/docker/daemon/debugtrap_unix.go:19 +0x8b created by github.com/docker/docker/daemon.(*Daemon).setupDumpStackTrap /go/src/github.com/docker/docker/daemon/debugtrap_unix.go:17 +0xc9 goroutine 1 [chan receive, 2 minutes]: main.(*DaemonCli).start(0xc420115e00, 0xc42015c690, 0x0, 0x0) /go/src/github.com/docker/docker/cmd/dockerd/daemon.go:314 +0x1b0a main.runDaemon(0xc42015c690, 0xc4203347e0, 0x0) /go/src/github.com/docker/docker/cmd/dockerd/docker.go:78 +0x76 main.newDaemonCommand.func1(0xc420084480, 0xc420116260, 0x0, 0x2, 0x0, 0x0) /go/src/github.com/docker/docker/cmd/dockerd/docker.go:29 +0x5b github.com/docker/docker/vendor/github.com/spf13/cobra.(*Command).execute(0xc420084480, 0xc420010190, 0x2, 0x2, 0xc420084480, 0xc420010190) /go/src/github.com/docker/docker/vendor/github.com/spf13/cobra/command.go:646 +0x44d github.com/docker/docker/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0xc420084480, 0x1ace0c0, 0x1de0701, 0xc420118840) /go/src/github.com/docker/docker/vendor/github.com/spf13/cobra/command.go:742 +0x30e github.com/docker/docker/vendor/github.com/spf13/cobra.(*Command).Execute(0xc420084480, 0xc420118840, 0x1d4ac00) /go/src/github.com/docker/docker/vendor/github.com/spf13/cobra/command.go:695 +0x2b main.main() /go/src/github.com/docker/docker/cmd/dockerd/docker.go:105 +0xe1 goroutine 18 [syscall]: os/signal.signal_recv(0x2bad720) /usr/local/go/src/runtime/sigqueue.go:131 +0xa6 os/signal.loop() /usr/local/go/src/os/signal/signal_unix.go:22 +0x22 created by os/signal.init.0 /usr/local/go/src/os/signal/signal_unix.go:28 +0x41 goroutine 14846 [select]: github.com/docker/docker/vendor/google.golang.org/grpc/transport.(*http2Client).controller(0xc42033dc80) 
/go/src/github.com/docker/docker/vendor/google.golang.org/grpc/transport/http2_client.go:1130 +0x142 created by github.com/docker/docker/vendor/google.golang.org/grpc/transport.newHTTP2Client /go/src/github.com/docker/docker/vendor/google.golang.org/grpc/transport/http2_client.go:280 +0xf4c goroutine 51 [chan receive, 1 minutes]: github.com/docker/docker/pkg/signal.Trap.func1(0xc42010f320, 0x2b9dfe0, 0xc420074190, 0xc420116920) /go/src/github.com/docker/docker/pkg/signal/trap.go:38 +0x65 created by github.com/docker/docker/pkg/signal.Trap /go/src/github.com/docker/docker/pkg/signal/trap.go:36 +0x122 goroutine 14847 [select, 2 minutes]: github.com/docker/docker/vendor/google.golang.org/grpc.(*addrConn).transportMonitor(0xc42006eea0) /go/src/github.com/docker/docker/vendor/google.golang.org/grpc/clientconn.go:891 +0x1de created by github.com/docker/docker/vendor/google.golang.org/grpc.(*ClientConn).resetAddrConn /go/src/github.com/docker/docker/vendor/google.golang.org/grpc/clientconn.go:608 +0x6ef goroutine 25 [chan receive]: github.com/docker/docker/libcontainerd.(*remote).monitorConnection(0xc42006f860, 0xc420abbbc0) /go/src/github.com/docker/docker/libcontainerd/remote_daemon.go:270 +0x9c created by github.com/docker/docker/libcontainerd.New /go/src/github.com/docker/docker/libcontainerd/remote_daemon.go:116 +0x601 goroutine 26 [select, 2 minutes, locked to thread]: runtime.gopark(0x1de2bd0, 0x0, 0x1d4a6be, 0x6, 0x18, 0x1) /usr/local/go/src/runtime/proc.go:287 +0x12c runtime.selectgo(0xc420185f50, 0xc420062b40) /usr/local/go/src/runtime/select.go:395 +0x1149 runtime.ensureSigM.func1() /usr/local/go/src/runtime/signal_unix.go:511 +0x220 runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:2337 +0x1 goroutine 57 [IO wait, 2 minutes]: internal/poll.runtime_pollWait(0x7f46b9c6cdf0, 0x72, 0xffffffffffffffff) /usr/local/go/src/runtime/netpoll.go:173 +0x57 internal/poll.(*pollDesc).wait(0xc42033a398, 0x72, 0xc420032c00, 0x0, 0x0) 
/usr/local/go/src/internal/poll/fd_poll_runtime.go:85 +0xae internal/poll.(*pollDesc).waitRead(0xc42033a398, 0xffffffffffffff00, 0x0, 0x0) /usr/local/go/src/internal/poll/fd_poll_runtime.go:90 +0x3d internal/poll.(*FD).Accept(0xc42033a380, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0) /usr/local/go/src/internal/poll/fd_unix.go:335 +0x1e2 net.(*netFD).accept(0xc42033a380, 0xc420032e28, 0xc420032e48, 0x411ec8) /usr/local/go/src/net/fd_unix.go:238 +0x42 net.(*UnixListener).accept(0xc4203d8ae0, 0x51550a, 0x1c0d4c0, 0xc420212b70) /usr/local/go/src/net/unixsock_posix.go:162 +0x32 net.(*UnixListener).Accept(0xc4203d8ae0, 0xc420016038, 0x1a659a0, 0x2b73db0, 0x1d14380) /usr/local/go/src/net/unixsock.go:241 +0x49 net/http.(*Server).Serve(0xc4203dadd0, 0x2bba420, 0xc4203d8ae0, 0x0, 0x0) /usr/local/go/src/net/http/server.go:2695 +0x1b2 net/http.Serve(0x2bba420, 0xc4203d8ae0, 0x2b9f3a0, 0xc4203d8db0, 0x434768, 0x1de2a50) /usr/local/go/src/net/http/server.go:2323 +0x73 github.com/docker/docker/daemon.(*Daemon).listenMetricsSock.func1(0x2bba420, 0xc4203d8ae0, 0xc4203d8db0) /go/src/github.com/docker/docker/daemon/metrics_unix.go:31 +0x4b created by github.com/docker/docker/daemon.(*Daemon).listenMetricsSock /go/src/github.com/docker/docker/daemon/metrics_unix.go:30 +0x193 goroutine 14585 [syscall, 2 minutes]: syscall.Syscall6(0xf7, 0x1, 0x3a, 0xc42020b5a0, 0x1000004, 0x0, 0x0, 0x1, 0x1, 0x448087) /usr/local/go/src/syscall/asm_linux_amd64.s:44 +0x5 os.(*Process).blockUntilWaitable(0xc4204d47b0, 0x100000000, 0x10, 0x0) /usr/local/go/src/os/wait_waitid.go:31 +0xa5 os.(*Process).wait(0xc4204d47b0, 0xc420abe9e0, 0xc420c4e7c0, 0x0) /usr/local/go/src/os/exec_unix.go:22 +0x42 os.(*Process).Wait(0xc4204d47b0, 0x30003, 0xc42020b7be, 0xc42020b7b8) /usr/local/go/src/os/exec.go:115 +0x2b os/exec.(*Cmd).Wait(0xc4208f0160, 0x0, 0x0) /usr/local/go/src/os/exec/exec.go:446 +0x62 github.com/docker/docker/libcontainerd.(*remote).startContainerd.func1(0xc4208f0160, 0xc42006f860) 
/go/src/github.com/docker/docker/libcontainerd/remote_daemon.go:243 +0x2f created by github.com/docker/docker/libcontainerd.(*remote).startContainerd /go/src/github.com/docker/docker/libcontainerd/remote_daemon.go:241 +0x432 goroutine 14845 [IO wait]: internal/poll.runtime_pollWait(0x7f46b9c6cc70, 0x72, 0x0) /usr/local/go/src/runtime/netpoll.go:173 +0x57 internal/poll.(*pollDesc).wait(0xc42012a518, 0x72, 0xffffffffffffff00, 0x2ba7060, 0x2b97c70) /usr/local/go/src/internal/poll/fd_poll_runtime.go:85 +0xae internal/poll.(*pollDesc).waitRead(0xc42012a518, 0xc4207a2000, 0x8000, 0x8000) /usr/local/go/src/internal/poll/fd_poll_runtime.go:90 +0x3d internal/poll.(*FD).Read(0xc42012a500, 0xc4207a2000, 0x8000, 0x8000, 0x0, 0x0, 0x0) /usr/local/go/src/internal/poll/fd_unix.go:126 +0x18a net.(*netFD).Read(0xc42012a500, 0xc4207a2000, 0x8000, 0x8000, 0x10, 0x10, 0x1acda40) /usr/local/go/src/net/fd_unix.go:202 +0x52 net.(*conn).Read(0xc420c4e7f8, 0xc4207a2000, 0x8000, 0x8000, 0x0, 0x0, 0x0) /usr/local/go/src/net/net.go:176 +0x6d bufio.(*Reader).Read(0xc42026e000, 0xc420158498, 0x9, 0x9, 0xaa47df, 0xc4202e29c8, 0x0) /usr/local/go/src/bufio/bufio.go:213 +0x30b io.ReadAtLeast(0x2b99b20, 0xc42026e000, 0xc420158498, 0x9, 0x9, 0x9, 0xc420181dc8, 0x20, 0xc42033d800) /usr/local/go/src/io/io.go:309 +0x86 io.ReadFull(0x2b99b20, 0xc42026e000, 0xc420158498, 0x9, 0x9, 0xc420181e09, 0xc42096b860, 0xc420c19440) /usr/local/go/src/io/io.go:327 +0x58 github.com/docker/docker/vendor/golang.org/x/net/http2.readFrameHeader(0xc420158498, 0x9, 0x9, 0x2b99b20, 0xc42026e000, 0x0, 0x0, 0xc42096b830, 0x0) /go/src/github.com/docker/docker/vendor/golang.org/x/net/http2/frame.go:237 +0x7b github.com/docker/docker/vendor/golang.org/x/net/http2.(*Framer).ReadFrame(0xc420158460, 0xc, 0x0, 0x0, 0x0) /go/src/github.com/docker/docker/vendor/golang.org/x/net/http2/frame.go:492 +0xa4 github.com/docker/docker/vendor/google.golang.org/grpc/transport.(*framer).readFrame(0xc4208203f0, 0xc42096b860, 0xc42096b860, 0x0, 
0x0) /go/src/github.com/docker/docker/vendor/google.golang.org/grpc/transport/http_util.go:544 +0x2f github.com/docker/docker/vendor/google.golang.org/grpc/transport.(*http2Client).reader(0xc42033dc80) /go/src/github.com/docker/docker/vendor/google.golang.org/grpc/transport/http2_client.go:1057 +0xc0 created by github.com/docker/docker/vendor/google.golang.org/grpc/transport.newHTTP2Client /go/src/github.com/docker/docker/vendor/google.golang.org/grpc/transport/http2_client.go:250 +0xb8d goroutine 64 [chan receive]: github.com/docker/docker/daemon/stats.(*Collector).Run(0xc4202e1180) /go/src/github.com/docker/docker/daemon/stats/collector.go:60 +0x1f9 created by github.com/docker/docker/daemon.(*Daemon).newStatsCollector /go/src/github.com/docker/docker/daemon/stats_collector.go:24 +0x7f goroutine 65 [chan receive, 2 minutes]: github.com/docker/docker/daemon.(*Daemon).execCommandGC(0xc4203e6200) /go/src/github.com/docker/docker/daemon/exec.go:281 +0x158 created by github.com/docker/docker/daemon.NewDaemon /go/src/github.com/docker/docker/daemon/daemon.go:896 +0x285a goroutine 72 [select, 1 minutes]: github.com/docker/docker/vendor/github.com/docker/libnetwork.(*controller).watchLoop(0xc4200e7100) /go/src/github.com/docker/docker/vendor/github.com/docker/libnetwork/store.go:452 +0xf9 created by github.com/docker/docker/vendor/github.com/docker/libnetwork.(*controller).startWatch /go/src/github.com/docker/docker/vendor/github.com/docker/libnetwork/store.go:469 +0x10e goroutine 73 [select, 2 minutes]: github.com/docker/docker/vendor/github.com/docker/libnetwork/drivers/overlay.(*driver).peerOpRoutine(0xc4200e7200, 0x2bbc320, 0xc4202e16c0, 0xc4204c1020) /go/src/github.com/docker/docker/vendor/github.com/docker/libnetwork/drivers/overlay/peerdb.go:278 +0x137 created by github.com/docker/docker/vendor/github.com/docker/libnetwork/drivers/overlay.Init /go/src/github.com/docker/docker/vendor/github.com/docker/libnetwork/drivers/overlay/overlay.go:78 +0x235 goroutine 74 [IO 
wait, 2 minutes]: internal/poll.runtime_pollWait(0x7f46b9c6cbb0, 0x72, 0xffffffffffffffff) /usr/local/go/src/runtime/netpoll.go:173 +0x57 internal/poll.(*pollDesc).wait(0xc42012a018, 0x72, 0xc420027500, 0x0, 0x0) /usr/local/go/src/internal/poll/fd_poll_runtime.go:85 +0xae internal/poll.(*pollDesc).waitRead(0xc42012a018, 0xffffffffffffff00, 0x0, 0x0) /usr/local/go/src/internal/poll/fd_poll_runtime.go:90 +0x3d internal/poll.(*FD).Accept(0xc42012a000, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0) /usr/local/go/src/internal/poll/fd_unix.go:335 +0x1e2 net.(*netFD).accept(0xc42012a000, 0xc420062b38, 0xc420027708, 0x42b289) /usr/local/go/src/net/fd_unix.go:238 +0x42 net.(*UnixListener).accept(0xc42081d890, 0x45a77e, 0x1, 0xc4200277d0) /usr/local/go/src/net/unixsock_posix.go:162 +0x32 net.(*UnixListener).Accept(0xc42081d890, 0xfffffffe7bfe2b21, 0x20002, 0x20002, 0xc4200277cc) /usr/local/go/src/net/unixsock.go:241 +0x49 github.com/docker/docker/vendor/github.com/docker/libnetwork.(*controller).acceptClientConnections(0xc4200e7100, 0xc420172000, 0x5c, 0x2bba420, 0xc42081d890) /go/src/github.com/docker/docker/vendor/github.com/docker/libnetwork/sandbox_externalkey_unix.go:127 +0x3b created by github.com/docker/docker/vendor/github.com/docker/libnetwork.(*controller).startExternalKeyListener /go/src/github.com/docker/docker/vendor/github.com/docker/libnetwork/sandbox_externalkey_unix.go:121 +0x1e8 goroutine 79 [select, 2 minutes]: github.com/docker/docker/daemon.(*Daemon).ProcessClusterNotifications(0xc4203e6200, 0x2bbc320, 0xc420a0f280, 0xc42026fda0) /go/src/github.com/docker/docker/daemon/events.go:150 +0x115 created by main.(*DaemonCli).start /go/src/github.com/docker/docker/cmd/dockerd/daemon.go:299 +0x1a6a goroutine 80 [chan receive, 2 minutes]: main.(*DaemonCli).setupConfigReloadTrap.func1(0xc420a90900, 0xc420115e00) /go/src/github.com/docker/docker/cmd/dockerd/daemon_unix.go:66 +0x69 created by main.(*DaemonCli).setupConfigReloadTrap 
/go/src/github.com/docker/docker/cmd/dockerd/daemon_unix.go:65 +0xbf goroutine 81 [chan receive, 2 minutes]: github.com/docker/docker/api/server.(*Server).serveAPI(0xc42010e840, 0xc42020e798, 0xc42020e7b8) /go/src/github.com/docker/docker/api/server/server.go:94 +0x14d github.com/docker/docker/api/server.(*Server).Wait(0xc42010e840, 0xc420a35500) /go/src/github.com/docker/docker/api/server/server.go:199 +0x2f created by main.(*DaemonCli).start /go/src/github.com/docker/docker/cmd/dockerd/daemon.go:307 +0x1acf goroutine 83 [IO wait, 1 minutes]: internal/poll.runtime_pollWait(0x7f46b9c6cf70, 0x72, 0xffffffffffffffff) /usr/local/go/src/runtime/netpoll.go:173 +0x57 internal/poll.(*pollDesc).wait(0xc42012a718, 0x72, 0xc4204eec00, 0x0, 0x0) /usr/local/go/src/internal/poll/fd_poll_runtime.go:85 +0xae internal/poll.(*pollDesc).waitRead(0xc42012a718, 0xffffffffffffff00, 0x0, 0x0) /usr/local/go/src/internal/poll/fd_poll_runtime.go:90 +0x3d internal/poll.(*FD).Accept(0xc42012a700, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0) /usr/local/go/src/internal/poll/fd_unix.go:335 +0x1e2 net.(*netFD).accept(0xc42012a700, 0x42bc26, 0x1de2d18, 0xc4204eee00) /usr/local/go/src/net/fd_unix.go:238 +0x42 net.(*UnixListener).accept(0xc420212090, 0xc4204eee08, 0x401e11, 0xc420a52300) /usr/local/go/src/net/unixsock_posix.go:162 +0x32 net.(*UnixListener).Accept(0xc420212090, 0xc4204eee40, 0x6dfd78, 0x45a310, 0xc4204eee80) /usr/local/go/src/net/unixsock.go:241 +0x49 github.com/docker/docker/cmd/dockerd/hack.(*MalformedHostHeaderOverride).Accept(0xc420118a80, 0x1de2508, 0xc420a52280, 0x2bbc3e0, 0xc420a75b00) /go/src/github.com/docker/docker/cmd/dockerd/hack/malformed_host_override.go:116 +0x37 net/http.(*Server).Serve(0xc4203da0d0, 0x2bae820, 0xc420118a80, 0x0, 0x0) /usr/local/go/src/net/http/server.go:2695 +0x1b2 github.com/docker/docker/api/server.(*HTTPServer).Serve(0xc4201163c0, 0x10, 0xc420209fb0) /go/src/github.com/docker/docker/api/server/server.go:112 +0x40 
github.com/docker/docker/api/server.(*Server).serveAPI.func1(0xc420a90960, 0xc4201163c0) /go/src/github.com/docker/docker/api/server/server.go:86 +0xaa created by github.com/docker/docker/api/server.(*Server).serveAPI /go/src/github.com/docker/docker/api/server/server.go:83 +0x81 goroutine 4829122 [runnable]: sync.runtime_SemacquireMutex(0xc42015c9ec, 0xc4202c1800) /usr/local/go/src/runtime/sema.go:71 +0x3d sync.(*Mutex).Lock(0xc42015c9e8) /usr/local/go/src/sync/mutex.go:134 +0xee sync.(*RWMutex).Lock(0xc42015c9e8) /usr/local/go/src/sync/rwmutex.go:93 +0x2d github.com/docker/docker/vendor/golang.org/x/net/trace.(*trace).Finish(0xc420aaeb60) /go/src/github.com/docker/docker/vendor/golang.org/x/net/trace/trace.go:376 +0x237 github.com/docker/docker/vendor/google.golang.org/grpc.newClientStream.func1(0xc4202c1c18, 0xc4211e6e40) /go/src/github.com/docker/docker/vendor/google.golang.org/grpc/stream.go:150 +0xee github.com/docker/docker/vendor/google.golang.org/grpc.newClientStream(0x7f46b9c34798, 0xc4211e6ea0, 0x2b78580, 0xc420210820, 0x1d9f2e0, 0x2f, 0xc421878080, 0x1, 0x1, 0x0, ...) /go/src/github.com/docker/docker/vendor/google.golang.org/grpc/stream.go:192 +0x5ff github.com/docker/docker/vendor/github.com/containerd/containerd.namespaceInterceptor.stream(0x1d4761e, 0x4, 0x7f46b9c3a470, 0xc420111740, 0x2b78580, 0xc420210820, 0x1d9f2e0, 0x2f, 0x1de2070, 0xc421878080, ...) /go/src/github.com/docker/docker/vendor/github.com/containerd/containerd/grpc.go:27 +0xcd github.com/docker/docker/vendor/github.com/containerd/containerd.(namespaceInterceptor).(github.com/docker/docker/vendor/github.com/containerd/containerd.stream)-fm(0x7f46b9c3a470, 0xc420111740, 0x2b78580, 0xc420210820, 0x1d9f2e0, 0x2f, 0x1de2070, 0xc421878080, 0x1, 0x1, ...) 
/go/src/github.com/docker/docker/vendor/github.com/containerd/containerd/grpc.go:34 +0xcd github.com/docker/docker/vendor/google.golang.org/grpc.NewClientStream(0x7f46b9c3a470, 0xc420111740, 0x2b78580, 0xc420210820, 0x1d9f2e0, 0x2f, 0xc421878080, 0x1, 0x1, 0x448087, ...) /go/src/github.com/docker/docker/vendor/google.golang.org/grpc/stream.go:103 +0xb3 github.com/docker/docker/vendor/github.com/containerd/containerd/api/services/events/v1.(*eventsClient).Subscribe(0xc4213cc170, 0x7f46b9c3a470, 0xc420111740, 0xc420e45b20, 0xc421878080, 0x1, 0x1, 0xc420ef3530, 0x21, 0xc42020ce88, ...) /go/src/github.com/docker/docker/vendor/github.com/containerd/containerd/api/services/events/v1/events.pb.go:188 +0xb5 github.com/docker/docker/libcontainerd.(*client).processEventStream(0xc4201687e0, 0x2bbc320, 0xc420111740) /go/src/github.com/docker/docker/libcontainerd/client_daemon.go:716 +0x2eb created by github.com/docker/docker/libcontainerd.(*client).processEventStream.func1 /go/src/github.com/docker/docker/libcontainerd/client_daemon.go:711 +0x117 goroutine 4829113 [runnable]: fmt.(*fmt).truncate(0xc42155a280, 0xc420a4da40, 0x63, 0x411b80, 0x7f46bb4cb6c8) /usr/local/go/src/fmt/format.go:312 +0xe1 fmt.(*fmt).fmt_q(0xc42155a280, 0xc420a4da40, 0x63) /usr/local/go/src/fmt/format.go:411 +0x49 fmt.(*pp).fmtString(0xc42155a240, 0xc420a4da40, 0x63, 0x71) /usr/local/go/src/fmt/print.go:439 +0x7c fmt.(*pp).printArg(0xc42155a240, 0x19d8e00, 0xc42153adb0, 0x7f4600000071) /usr/local/go/src/fmt/print.go:664 +0x7b5 fmt.(*pp).doPrintf(0xc42155a240, 0x1d6fda3, 0x1b, 0xc4202bd650, 0x1, 0x1) /usr/local/go/src/fmt/print.go:996 +0x15a fmt.Sprintf(0x1d6fda3, 0x1b, 0xc4204f0650, 0x1, 0x1, 0x2033dc01, 0x7e2) /usr/local/go/src/fmt/print.go:196 +0x66 github.com/docker/docker/vendor/google.golang.org/grpc/transport.ConnectionError.Error(0xc420a4da40, 0x63, 0x0, 0x2b9f0e0, 0xc420ada370, 0xc4204f06c0, 0x2ba6160) /go/src/github.com/docker/docker/vendor/google.golang.org/grpc/transport/transport.go:578 +0x96 
github.com/docker/docker/vendor/google.golang.org/grpc/transport.(*ConnectionError).Error(0xc420ade840, 0x1ddd4b0, 0xc42155a180) :1 +0x58 fmt.(*pp).handleMethods(0xc42155a180, 0x76, 0x1) /usr/local/go/src/fmt/print.go:590 +0x186 fmt.(*pp).printArg(0xc42155a180, 0x1c1c800, 0xc420ade840, 0x76) /usr/local/go/src/fmt/print.go:679 +0x171 fmt.(*pp).doPrintf(0xc42155a180, 0x1d4611c, 0x2, 0xc4202bdb40, 0x1, 0x1) /usr/local/go/src/fmt/print.go:996 +0x15a fmt.Sprintf(0x1d4611c, 0x2, 0xc4204f0b40, 0x1, 0x1, 0xc42001b300, 0x7f46b9c34798) /usr/local/go/src/fmt/print.go:196 +0x66 github.com/docker/docker/vendor/google.golang.org/grpc/status.Errorf(0xc40000000d, 0x1d4611c, 0x2, 0xc4204f0b40, 0x1, 0x1, 0x0, 0xffffffffffffffff) /go/src/github.com/docker/docker/vendor/google.golang.org/grpc/status/status.go:122 +0x60 github.com/docker/docker/vendor/google.golang.org/grpc.Errorf(0xc40000000d, 0x1d4611c, 0x2, 0xc4204f0b40, 0x1, 0x1, 0x0, 0x2ba6160) /go/src/github.com/docker/docker/vendor/google.golang.org/grpc/rpc_util.go:398 +0x5b github.com/docker/docker/vendor/google.golang.org/grpc.newClientStream(0x7f46b9c34798, 0xc42165a600, 0x2b78580, 0xc4202101a0, 0x1d9f2e0, 0x2f, 0xc42153ad70, 0x1, 0x1, 0x0, ...) /go/src/github.com/docker/docker/vendor/google.golang.org/grpc/stream.go:192 +0x5cc github.com/docker/docker/vendor/github.com/containerd/containerd.namespaceInterceptor.stream(0x1d541dd, 0xc, 0x7f46b9c3a470, 0xc420111740, 0x2b78580, 0xc4202101a0, 0x1d9f2e0, 0x2f, 0x1de2070, 0xc42153ad70, ...) /go/src/github.com/docker/docker/vendor/github.com/containerd/containerd/grpc.go:27 +0xcd github.com/docker/docker/vendor/github.com/containerd/containerd.(namespaceInterceptor).(github.com/docker/docker/vendor/github.com/containerd/containerd.stream)-fm(0x7f46b9c3a470, 0xc420111740, 0x2b78580, 0xc4202101a0, 0x1d9f2e0, 0x2f, 0x1de2070, 0xc42153ad70, 0x1, 0x1, ...) 
/go/src/github.com/docker/docker/vendor/github.com/containerd/containerd/grpc.go:34 +0xcd github.com/docker/docker/vendor/google.golang.org/grpc.NewClientStream(0x7f46b9c3a470, 0xc420111740, 0x2b78580, 0xc4202101a0, 0x1d9f2e0, 0x2f, 0xc42153ad70, 0x1, 0x1, 0x448087, ...) /go/src/github.com/docker/docker/vendor/google.golang.org/grpc/stream.go:103 +0xb3 github.com/docker/docker/vendor/github.com/containerd/containerd/api/services/events/v1.(*eventsClient).Subscribe(0xc420efc240, 0x7f46b9c3a470, 0xc420111740, 0xc4210e6b20, 0xc42153ad70, 0x1, 0x1, 0xc421656240, 0x29, 0x2, ...) /go/src/github.com/docker/docker/vendor/github.com/containerd/containerd/api/services/events/v1/events.pb.go:188 +0xb5 github.com/docker/docker/libcontainerd.(*client).processEventStream(0xc420168380, 0x2bbc320, 0xc420111740) /go/src/github.com/docker/docker/libcontainerd/client_daemon.go:716 +0x2eb created by github.com/docker/docker/libcontainerd.(*client).processEventStream.func1 /go/src/github.com/docker/docker/libcontainerd/client_daemon.go:711 +0x117 ```

thaJeztah commented 6 years ago

CPU usage also goes up after killing docker-containerd:

 PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
   12     1 root     S     269m  13%   1  88% dockerd --debug --iptables=false
  106    12 root     S     278m  14%   1   0% docker-containerd --config /var/ru
   56     0 root     S     1584   0%   0   0% sh
    1     0 root     S     1576   0%   1   0% sh -c dockerd --debug --iptables=f
  129    56 root     R     1520   0%   0   0% top

phsiao commented 6 years ago

We are also seeing the same or a similar problem. Here is one example that I happen to still have logs for.

A process pushed the system into out-of-memory territory, and before the OOM killer got to kill the process that should have been killed, docker-containerd got restarted.

Jan 29 17:45:19 3rtzbx1 dockerd: time="2018-01-29T17:45:09.352788352-05:00" level=info msg="killing and restarting containerd" module=libcontainerd pid=27069

The kernel oom-killer message comes after that line. I suspect containerd ran out of memory and killed itself.

Regarding recovery, I can't recover the system by sending SIGHUP to dockerd; it never worked for me in the few instances I saw. Restarting docker sometimes worked, but in at least one case a reboot was the only way to fix it.

olafbuitelaar commented 6 years ago

In addition, docker appears to be in a loop, rebuilding the various networks. This could be a cause of the high CPU utilization. For us only kill -9 or a reboot resolves the issue. See for example the attached log, which keeps cycling: messages.log

cpuguy83 commented 6 years ago

ping @mlaventure I can track the CPU usage down to https://github.com/moby/moby/blob/master/libcontainerd/client_daemon.go#L706

I'm working on tracing down where the client is used and where it should be refreshed... but some help may save a lot of time.

cpuguy83 commented 6 years ago

Basically, containerd exits and docker restarts it, but the existing clients get connection refused when trying to connect to containerd, as if they are using an old handle or something.

mlaventure commented 6 years ago

@cpuguy83 I think you're right. containerd deletes the old socket upon start; I think that would make the old inode unreachable.
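
A minimal Go sketch of the situation discussed here, not a reproduction of the bug itself. It stops a Unix socket server and starts a new one on the same path, roughly what a restarted containerd would do. The socket name `demo.sock` and the helpers `restartDemo` and `runDemo` are hypothetical. It only illustrates that a connection made before the restart is dead afterwards, while a fresh dial on the same path reaches the new listener.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"os"
	"path/filepath"
)

// restartDemo listens on a Unix socket in dir, connects a client, then
// simulates a server restart on the same path. oldErr is what the old
// connection sees afterwards, newErr is the result of a fresh dial.
func restartDemo(dir string) (oldErr, newErr, setupErr error) {
	sock := filepath.Join(dir, "demo.sock") // hypothetical socket name

	ln1, err := net.Listen("unix", sock)
	if err != nil {
		return nil, nil, err
	}
	oldConn, err := net.Dial("unix", sock)
	if err != nil {
		ln1.Close()
		return nil, nil, err
	}
	defer oldConn.Close()
	srvConn, err := ln1.Accept()
	if err != nil {
		ln1.Close()
		return nil, nil, err
	}

	// "Kill" the server: drop its side of the connection and the listener.
	// Closing a listener created by net.Listen also removes the socket file.
	srvConn.Close()
	ln1.Close()

	// "Restart" the server on the same path.
	ln2, err := net.Listen("unix", sock)
	if err != nil {
		return nil, nil, err
	}
	defer ln2.Close()

	// The old connection is still tied to the server that went away.
	_, oldErr = oldConn.Read(make([]byte, 1))

	// A fresh dial looks the path up again and reaches the new listener.
	var newConn net.Conn
	newConn, newErr = net.Dial("unix", sock)
	if newErr == nil {
		newConn.Close()
	}
	return oldErr, newErr, nil
}

// runDemo runs restartDemo in a temporary directory and reports whether the
// old connection saw EOF and whether the fresh dial succeeded.
func runDemo() (oldSawEOF, freshDialOK bool, err error) {
	dir, err := os.MkdirTemp("", "sockdemo")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dir)
	oldErr, newErr, err := restartDemo(dir)
	if err != nil {
		return false, false, err
	}
	return oldErr == io.EOF, newErr == nil, nil
}

func main() {
	oldSawEOF, freshDialOK, err := runDemo()
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	fmt.Println("old connection saw EOF:", oldSawEOF)
	fmt.Println("fresh dial succeeded:", freshDialOK)
}
```

Whether this is what actually happens inside dockerd depends on how its gRPC client redials, which this sketch does not model.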

ATM, the remote client is only created once here: https://github.com/moby/moby/blob/master/libcontainerd/remote_daemon.go#L130

This either needs to be put behind a lock so we can update all clients after containerd gets restarted, or we can just create a new ephemeral one every time it is needed.

The second case may be easier since it'd also take care of cases where containerd is not started by the daemon. Not sure what the impact on performance would be, though.
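
The two options above could look roughly like this in Go. This is a sketch under assumptions, not the libcontainerd code: `conn`, `dialFunc`, `sharedClient`, `withEphemeral` and `fakeDaemon` are all hypothetical names, and the fake daemon only tracks a generation counter so a stale connection is easy to spot.

```go
package main

import (
	"fmt"
	"sync"
)

// conn stands in for a containerd client connection (hypothetical type).
type conn struct {
	generation int // which daemon start this connection belongs to
}

// dialFunc creates a new connection to the currently running daemon.
type dialFunc func() (*conn, error)

// sharedClient is option 1: one long-lived connection behind a lock,
// swapped for a new one after the daemon has been restarted.
type sharedClient struct {
	mu   sync.RWMutex
	cur  *conn
	dial dialFunc
}

func newSharedClient(dial dialFunc) (*sharedClient, error) {
	c, err := dial()
	if err != nil {
		return nil, err
	}
	return &sharedClient{cur: c, dial: dial}, nil
}

// get returns the connection all callers currently share.
func (s *sharedClient) get() *conn {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.cur
}

// refresh redials and swaps the shared connection; it would be called once
// after a restart so that every caller picks up the new connection.
func (s *sharedClient) refresh() error {
	c, err := s.dial()
	if err != nil {
		return err
	}
	s.mu.Lock()
	s.cur = c
	s.mu.Unlock()
	return nil
}

// withEphemeral is option 2: dial a fresh connection for every operation.
func withEphemeral(dial dialFunc, op func(*conn) error) error {
	c, err := dial()
	if err != nil {
		return err
	}
	return op(c)
}

// fakeDaemon is a hypothetical stand-in for containerd; each restart bumps
// the generation handed out to new connections.
type fakeDaemon struct {
	mu         sync.Mutex
	generation int
}

func (d *fakeDaemon) restart() {
	d.mu.Lock()
	d.generation++
	d.mu.Unlock()
}

func (d *fakeDaemon) dial() (*conn, error) {
	d.mu.Lock()
	defer d.mu.Unlock()
	return &conn{generation: d.generation}, nil
}

func main() {
	d := &fakeDaemon{}
	shared, err := newSharedClient(d.dial)
	if err != nil {
		fmt.Println("dial failed:", err)
		return
	}
	d.restart()
	fmt.Println("shared, before refresh:", shared.get().generation)
	if err := shared.refresh(); err != nil {
		fmt.Println("refresh failed:", err)
		return
	}
	fmt.Println("shared, after refresh:", shared.get().generation)
	_ = withEphemeral(d.dial, func(c *conn) error {
		fmt.Println("ephemeral:", c.generation)
		return nil
	})
}
```

The shared variant needs something to call `refresh` after every restart; the ephemeral variant needs no such hook but pays a dial per operation, which is the performance question raised above.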

raxkin commented 6 years ago

We are also having this problem with version 17.12.0-ce

Version of docker and containerd

Tested on Debian stretch;

docker-containerd --version
containerd github.com/containerd/containerd v1.0.0 89623f28b87a6004d4b785663257362d1658a729

Client:
 Version:   17.12.0-ce
 API version:   1.35
 Go version:    go1.9.2
 Git commit:    c97c6d6
 Built: Wed Dec 27 20:11:19 2017
 OS/Arch:   linux/amd64

Server:
 Engine:
  Version:  17.12.0-ce
  API version:  1.35 (minimum version 1.12)
  Go version:   go1.9.2
  Git commit:   c97c6d6
  Built:    Wed Dec 27 20:09:54 2017
  OS/Arch:  linux/amd64
  Experimental: false

Containers: 15
 Running: 15
 Paused: 0
 Stopped: 0
Images: 53
Server Version: 17.12.0-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: N/A (expected: 89623f28b87a6004d4b785663257362d1658a729)
runc version: b2567b37d7b75eb4cf325b77297b140ea686ce8f
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 4.9.0-1-amd64
Operating System: Debian GNU/Linux 9 (stretch)
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 23.54GiB
Name: cameleon29
ID: SGKP:EZJJ:5R3A:4KHO:I3KS:JOG2:ZWCW:ZWHV:OJEK:CJSN:6IJC:W37E
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

jankeromnes commented 6 years ago

FYI we had a very similar problem on Ubuntu 16.04.3 LTS, but pkill -HUP dockerd (or killing -9 every process in ps aux | grep docker) didn't help.

Docker version:

Client:
 Version:       17.12.0-ce
 API version:   1.35
 Go version:    go1.9.2
 Git commit:    c97c6d6
 Built: Wed Dec 27 20:11:19 2017
 OS/Arch:       linux/amd64

Server:
 Engine:
  Version:      17.12.0-ce
  API version:  1.35 (minimum version 1.12)
  Go version:   go1.9.2
  Git commit:   c97c6d6
  Built:        Wed Dec 27 20:09:53 2017
  OS/Arch:      linux/amd64
  Experimental: false

We had a Datadog container that showed unhealthy, so we tried to sudo docker kill it, but this hung for 10+ hours.

We then tried to restart the Docker daemon, but it wouldn't come back up again, because:

Failed to connect to containerd: failed to dial "/var/run/docker/containerd/docker-containerd.sock": dial unix:///var/run/docker/containerd/docker-containerd.s[...]

We tried many different things that didn't help, but in the end, this "solved" our problem:

sudo apt-get update -q && sudo apt-get upgrade -qy

Hope this can help someone else in distress who lands on these pages.

MitRandi commented 6 years ago

I have a very similar problem too. Maybe it will be helpful for you.

My environment details (I updated my testing env to the latest edge build):

Containers: 16
 Running: 16
 Paused: 0
 Stopped: 0
Images: 12
Server Version: 18.02.0-ce
Storage Driver: overlay2
 Backing Filesystem: xfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: journald
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9b55aab90508bd389d7654c4baf173a981477d55
runc version: 9f9c96235cc97674e935002fc3d78361b696a69e
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-693.17.1.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 3
Name: rbcentos.from.sh
ID: HQGR:BEDG:5NWR:H2SB:DE5Y:I2QP:XPFQ:374R:LAHE:HZCG:BRLF:MO6M
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: true
Total Memory: 3.734GiB

Starting from a clean docker start, I create a container:

root     22420  2.0  1.5 984720 58992 ?        Ssl  17:28   0:12 /usr/bin/dockerd
root     22426  0.7  0.6 840912 25728 ?        Ssl  17:28   0:04  \_ docker-containerd --config /var/run/docker/containerd/containerd.toml
root     31926  0.0  0.0   7508  3188 ?        Sl   17:37   0:00      \_ docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.l
root     31951  0.3  0.1 121232  4144 ?        Ss   17:37   0:00      |   \_ /usr/sbin/httpd -f /etc/httpd/apache-platform/httpd24-shared.conf

And after systemctl restart docker:

root     31926  0.0  0.0   9972  3452 ?        Sl   17:37   0:00 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/mob
root     31951  0.1  0.1 121232  4144 ?        Ss   17:37   0:00  \_ /usr/sbin/httpd -f /etc/httpd/apache-platform/httpd24-shared.conf
root      5728  5.6  0.8 487924 33608 ?        Ssl  17:40   0:00 /usr/bin/dockerd
root      5734  2.0  0.5 378924 23104 ?        Ssl  17:40   0:00  \_ docker-containerd --config /var/run/docker/containerd/containerd.toml

grep -i ppid /proc/31926/status
PPid:   1
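
The same parent check can be done programmatically. Below is a small Go sketch that reads the `PPid:` field from `/proc/<pid>/status`; the helper name `parsePPid` is hypothetical, and the program assumes a Linux `/proc` filesystem.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parsePPid extracts the parent PID from the text of /proc/<pid>/status.
func parsePPid(status string) (int, error) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, "PPid:") {
			return strconv.Atoi(strings.TrimSpace(strings.TrimPrefix(line, "PPid:")))
		}
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	return 0, fmt.Errorf("no PPid line found")
}

func main() {
	// Without arguments the program inspects itself; pass a PID
	// (for example that of a shim process) to inspect another process.
	pid := "self"
	if len(os.Args) > 1 {
		pid = os.Args[1]
	}
	data, err := os.ReadFile("/proc/" + pid + "/status")
	if err != nil {
		fmt.Println("cannot read status:", err)
		return
	}
	ppid, err := parsePPid(string(data))
	if err != nil {
		fmt.Println("cannot parse status:", err)
		return
	}
	fmt.Println("PPid:", ppid)
	if ppid == 1 {
		fmt.Println("the parent of this process is PID 1")
	}
}
```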

But I have no problems with docker interaction (kill/rm/stop/start etc.) with such containers.

killall -9 dockerd didn't help. Now I can only fix it by recreating the containers.

cpuguy83 commented 6 years ago

@MitRandi Thanks for the report. This is fixed in containerd 1.0.2 (currently in release candidate phase). Once this is released we can include it in a dockerd patch release.... this would be a problem for all versions of docker from 17.11 and up... but note the containerd patch would only be included in 17.12 and 18.03 (assuming the containerd patch is released soon).

sdubeyflexera commented 6 years ago

ps aux | grep docker
sudo kill pid_no

I killed every process one at a time. The above two steps worked for me.

cberner commented 6 years ago

@cpuguy83 I see that containerd 1.0.2 was merged in master. Will it be released in 17.12.1?

cpuguy83 commented 6 years ago

@cberner Hopefully. Working on it anyway.

stoicskyline commented 6 years ago

Fixed my issue with a renegade container by restarting docker on the Preferences Reset page.

lox commented 6 years ago

@cpuguy83 I couldn't see mention of this issue in the 17.12.1 release notes, did it make it in?

thaJeztah commented 6 years ago

@lox looks like it’s described as;

Fix dockerd not being able to reconnect to containerd when it is restarted moby/moby#36173

lox commented 6 years ago

Thanks @thaJeztah!

cberner commented 6 years ago

@thaJeztah was it fixed in 17.12.1? This PR seems like it wasn't merged: https://github.com/docker/docker-ce/pull/434

thaJeztah commented 6 years ago

@cberner IIRC, containerd 1.0.2 adds some additional improvements, but https://github.com/moby/moby/pull/36173 was included in 17.12.1 (through https://github.com/docker/docker-ce/pull/417)

cberner commented 6 years ago

Ah got it, thanks!

jankeromnes commented 6 years ago

FYI I still have an unkillable container with 17.12.1-ce:

$ sudo docker version
Client:
 Version:       17.12.1-ce
 API version:   1.35
 Go version:    go1.9.4
 Git commit:    7390fc6
 Built: Tue Feb 27 22:17:40 2018
 OS/Arch:       linux/amd64

Server:
 Engine:
  Version:      17.12.1-ce
  API version:  1.35 (minimum version 1.12)
  Go version:   go1.9.4
  Git commit:   7390fc6
  Built:        Tue Feb 27 22:16:13 2018
  OS/Arch:      linux/amd64
  Experimental: false

$ sudo docker ps -a
CONTAINER ID        IMAGE                                       COMMAND                  CREATED             STATUS                    PORTS                                                                                                                               NAMES
cd7d9365d53a        datadog/docker-dd-agent:latest              "/entrypoint.sh supe…"   3 weeks ago         Up 7 days (unhealthy)     8125/udp, 8126/tcp                                                                                                                  dd-agent

$ sudo docker kill cd7d9365d53a
# ... nothing happens for 8+ hours ...

Note: This issue happens with Datadog containers specifically, and was originally filed as https://github.com/DataDog/docker-dd-agent/issues/284

EDIT: Maybe it's a different bug, e.g. #35933.

cpuguy83 commented 6 years ago

@tgropper The problem is with Docker in this case, not containerd.

pmanthen commented 6 years ago

This happened after recently upgrading docker to 17.12.0-ce. I restarted docker and it started working fine. OS: MacOS 10.13.3

mkokho commented 6 years ago

Hello guys, does containerd have logs? I'm trying to figure out why it was not responding to calls from docker. Docker tried several times over 20 minutes but then killed it.

Some log lines are below. Docker server version is 17.12.0-ce.

time="2018-03-26T20:20:36.605022254+13:00" level=debug msg="daemon is not responding" binary=docker-containerd error="rpc error: code = DeadlineExceeded desc = context deadline exceeded" module=libcontainerd
time="2018-03-26T20:22:15.993570366+13:00" level=info msg="killing and restarting containerd" module=libcontainerd pid=2942
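
To measure how long dockerd waited between two such entries, the `time=` and `msg=` fields can be pulled out of each line. The Go sketch below does that for the two lines quoted above; `parseLine` and `gap` are hypothetical helpers, and the regular expressions assume the field values contain no escaped quotes.

```go
package main

import (
	"fmt"
	"regexp"
	"time"
)

// The two log lines quoted in this comment.
const (
	notRespondingLine = `time="2018-03-26T20:20:36.605022254+13:00" level=debug msg="daemon is not responding" binary=docker-containerd error="rpc error: code = DeadlineExceeded desc = context deadline exceeded" module=libcontainerd`
	restartingLine    = `time="2018-03-26T20:22:15.993570366+13:00" level=info msg="killing and restarting containerd" module=libcontainerd pid=2942`
)

var (
	timeField = regexp.MustCompile(`time="([^"]+)"`)
	msgField  = regexp.MustCompile(`msg="([^"]*)"`)
)

// entry holds the two fields this sketch cares about.
type entry struct {
	when time.Time
	msg  string
}

// parseLine extracts the timestamp and message from one dockerd log line.
func parseLine(line string) (entry, error) {
	t := timeField.FindStringSubmatch(line)
	m := msgField.FindStringSubmatch(line)
	if t == nil || m == nil {
		return entry{}, fmt.Errorf("missing time or msg field")
	}
	when, err := time.Parse(time.RFC3339Nano, t[1])
	if err != nil {
		return entry{}, err
	}
	return entry{when: when, msg: m[1]}, nil
}

// gap returns the time elapsed between two log lines.
func gap(first, second string) (time.Duration, error) {
	a, err := parseLine(first)
	if err != nil {
		return 0, err
	}
	b, err := parseLine(second)
	if err != nil {
		return 0, err
	}
	return b.when.Sub(a.when), nil
}

func main() {
	d, err := gap(notRespondingLine, restartingLine)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("time between the two entries:", d)
}
```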

cpuguy83 commented 6 years ago

@mkokho Upgrade to 17.12.1

Containerd logs are piped into the dockerd logs.

vardhmanandroid2015 commented 6 years ago

Upgrade your docker version to the latest with the commands below, then you should be fine:

apt-get update
apt-get remove docker docker-engine docker.io
apt-get install docker-ce
