moby / moby

The Moby Project - a collaborative project for the container ecosystem to assemble container-based systems
https://mobyproject.org/
Apache License 2.0

Docker stop does not seem to fall back to sigkill #22802

Open joedborg opened 8 years ago

joedborg commented 8 years ago

Output of docker version:

Client:
 Version:      1.11.1
 API version:  1.23
 Go version:   go1.5.4
 Git commit:   5604cbe
 Built:        Wed Apr 27 00:57:43 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.11.1
 API version:  1.23
 Go version:   go1.5.4
 Git commit:   5604cbe
 Built:        Wed Apr 27 00:57:43 2016
 OS/Arch:      linux/amd64

Output of docker info:

Containers: 14
 Running: 14
 Paused: 0
 Stopped: 0
Images: 88
Server Version: 1.11.1
Storage Driver: overlay
 Backing Filesystem: extfs
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: null host bridge
Kernel Version: 4.2.5-300.fc23.x86_64
Operating System: Fedora 23 (Twenty Three)
OSType: linux
Architecture: x86_64
CPUs: 32
Total Memory: 141.6 GiB
Name: [OMITTED]
ID: KLBG:YWJX:R5LM:RL6D:2SPA:O77S:TTHF:K4PI:GHOX:6P4Q:I62T:YG7B
Docker Root Dir: /ssd/docker
Debug mode (client): false
Debug mode (server): false
Http Proxy: [OMITTED]
Https Proxy: [OMITTED]
Registry: https://index.docker.io/v1/

Physical node.

Steps to reproduce the issue:

  1. Process inside the container falls into state D (uninterruptible sleep).
  2. docker stop the container; hangs indefinitely, even if -t 1 is specified.
  3. Using ps to find the container shim process, you can see it's in Sl state. kill -9ing this process works and releases the docker stop command. The container seems to shut down cleanly, as it can be started again without errors.
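
For anyone trying to confirm step 1 and step 3 without eyeballing ps output, here is a minimal sketch that scans /proc for processes in a given state. It assumes a Linux host with /proc mounted; the function names `parse_stat` and `processes_in_state` are made up for this illustration and are not part of Docker.

```python
import os


def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the LAST closing parenthesis.
    head, _, tail = stat_text.rpartition(")")
    pid_text, _, comm = head.partition("(")
    return int(pid_text), comm, tail.split()[0]


def processes_in_state(wanted="D", proc_root="/proc"):
    """List (pid, comm) for every process whose state letter equals `wanted`."""
    if not os.path.isdir(proc_root):
        return []
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_stat(handle.read())
        except (OSError, ValueError, IndexError):
            # The process may have exited between listdir() and open().
            continue
        if state == wanted:
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in processes_in_state("D"):
        print(pid, comm)
```

Running it as a script prints one line per process currently in uninterruptible sleep; passing "S" instead lists sleeping processes such as the shim mentioned in step 3.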

Describe the results you received: docker stop hangs and doesn't appear to use SIGKILL, or at least not on all necessary processes.

Describe the results you expected: docker stop to SIGKILL all related processes after grace period.
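
To make the expected behaviour concrete, here is a rough sketch of the escalation I have in mind, written against a plain child process rather than a container. It is not how daemon/stop.go is implemented; `stop_with_fallback` and its return values are invented for this example. The point is only that every wait has a timeout, so the caller gets an answer even when SIGKILL has no visible effect.

```python
import subprocess


def stop_with_fallback(proc, grace=10.0, kill_wait=10.0):
    """Stop a subprocess.Popen child: SIGTERM first, then SIGKILL.

    Returns "terminated", "killed", or "stuck" and never blocks forever.
    """
    proc.terminate()  # SIGTERM on POSIX
    try:
        proc.wait(timeout=grace)
        return "terminated"
    except subprocess.TimeoutExpired:
        pass

    proc.kill()  # SIGKILL on POSIX
    try:
        proc.wait(timeout=kill_wait)
        return "killed"
    except subprocess.TimeoutExpired:
        # SIGKILL was sent but the process has not exited, for example
        # because it is in uninterruptible sleep. Report that instead of
        # waiting indefinitely.
        return "stuck"
```

With a process in state D this would come back as "stuck" after grace + kill_wait seconds, which is the kind of indication the current hang does not give.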

Additional information you deem important (e.g. issue happens only occasionally): Obviously, having processes within the container fall into state D is not nice and has nothing to do with Docker; it merely highlights the stop problem.

Interestingly, I never seem to get to line 47 in daemon/stop.go: "Failed to send signal %d to the process, force killing"

thaJeztah commented 8 years ago

Is there a way to reproduce this? I.e., how can I get a container to get into that state?

joedborg commented 8 years ago

Mine was happening due to NAS problems (the container was moving files about). I guess you could try to run something like this in a container: http://unix.stackexchange.com/questions/134888/simulate-an-unkillable-process-in-d-state

thaJeztah commented 8 years ago

You mention it never reached "Failed to send signal %d to the process, force killing"; do the daemon logs show anything useful? Wondering what path it took, that didn't return an error (making it skip that step)

joedborg commented 8 years ago

@thaJeztah, I never see them in the CLI, maybe I'm not meant to?

Here are the logs:

level=info msg="Container 8c79844156c37826289bee236fa2de5fc797acef17ec9a7a19d8ac90c0fd2b36 failed to exit within 10 seconds of signal 15 - using the force"

level=info msg="Container 8c79844156c3 failed to exit within 10 seconds of kill - trying direct SIGKILL"

level=error msg="containerd: get exit status" error="containerd: process has not exited" id=8c79844156c37826289bee236fa2de5fc797acef17ec9a7a19d8ac90c0fd2b36 pid=init systemPid=57603

thaJeztah commented 8 years ago

thanks! ping @mlaventure @tonistiigi any ideas?

phemmer commented 8 years ago

Processes in uninterruptible sleep can't be killed. The log output shows the SIGKILL is being sent, but the process isn't responding to it as expected. Killing the parent might cause docker to think the container has exited, but I'm willing to bet that D process is still there. And as such, it's going to be holding a lot of the container's resources open (the namespaces), so killing the parent is not a good idea, as you'll leave all these orphaned resources about.

joedborg commented 8 years ago

@phemmer Yep, that's correct, they are still there (and will be until reboot). I'm wondering if killing the container and leaving the processes (with a warning) is preferable to just hanging without any indication?

thaJeztah commented 8 years ago

Well, we shouldn't hang, but killing the container and leaving the processes definitely sounds worse, because then there's no indication at all that there are still container processes running.

phemmer commented 8 years ago

If this is NFS, you can use mount options such as soft and intr to allow killing your process. In my personal opinion, I would think that should be the recommended solution here. If the process won't exit, I don't think docker should abandon it.
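
If it helps to check how the share is currently mounted, here is a small sketch that reads /proc/mounts and lists NFS mounts that do not carry the soft option. It assumes a Linux host and that the NAS is in fact mounted over NFS, which the thread has not confirmed; the helper names are hypothetical.

```python
def nfs_mount_options(mounts_text):
    """Map each NFS mount point to its set of mount options.

    mounts_text is the content of /proc/mounts, whose whitespace-separated
    fields are: device, mount point, filesystem type, options, dump, pass.
    """
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        mount_point, fs_type, options = fields[1], fields[2], fields[3]
        if fs_type in ("nfs", "nfs4"):
            result[mount_point] = set(options.split(","))
    return result


def hard_nfs_mounts(mounts_text):
    """Return the NFS mount points that lack the 'soft' option, sorted."""
    options_by_mount = nfs_mount_options(mounts_text)
    return sorted(mp for mp, opts in options_by_mount.items() if "soft" not in opts)


if __name__ == "__main__":
    try:
        with open("/proc/mounts") as handle:
            text = handle.read()
    except OSError:
        text = ""  # not a Linux host, or /proc is unavailable
    for mount_point in hard_nfs_mounts(text):
        print(mount_point)
```

Any mount point it prints is one where a process blocked on an unreachable server may keep waiting rather than getting an I/O error back.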