moby / moby

The Moby Project - a collaborative project for the container ecosystem to assemble container-based systems
https://mobyproject.org/
Apache License 2.0

docker ps hangs #18013

Closed justin8 closed 7 years ago

justin8 commented 8 years ago

We recently upgraded a large number of servers from Docker 1.4 to 1.8.3, and now several servers every week begin to hang when running docker ps. Resolving the hang requires rebooting the entire server; restarting the daemon alone does not help.

docker version: Docker version 1.8.3, build f4bf5c7

docker info:
ubuntu@docker-gce-ae-230-prod:~$ docker info
Containers: 1663
Images: 1284
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 4610
 Dirperm1 Supported: true
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 4.2.0-c9
Operating System: Debian GNU/Linux 8 (jessie)
CPUs: 8
Total Memory: 51.11 GiB
Name: docker-gce-ae-230-prod
ID: V6UJ:OGDC:WFTX:C6JC:RV6T:7DQB:JEC5:GFXG:G7AS:AXQ5:6JXI:PNR5
Username: cloud9deploy
Registry: https://index.docker.io/v1/

uname -a: Linux XXXXXXX 4.2.0-c9 #1 SMP Wed Sep 30 16:14:37 UTC 2015 x86_64 GNU/Linux

Running on GCE.

GordonTheTurtle commented 8 years ago

Hi!

Please read this important information about creating issues.

If you are reporting a new issue, make sure that we do not have any duplicates already open. You can ensure this by searching the issue list for this repository. If there is a duplicate, please close your issue and add a comment to the existing issue instead.

If you suspect your issue is a bug, please edit your issue description to include the BUG REPORT INFORMATION shown below. If you fail to provide this information within 7 days, we cannot debug your issue and will close it. We will, however, reopen it if you later provide the information.

This is an automated, informational response.

Thank you.

For more information about reporting issues, see https://github.com/docker/docker/blob/master/CONTRIBUTING.md#reporting-other-issues


BUG REPORT INFORMATION

Use the commands below to provide key information from your environment:

docker version:
docker info:
uname -a:

Provide additional environment details (AWS, VirtualBox, physical, etc.):

List the steps to reproduce the issue:
1.
2.
3.

Describe the results you received:

Describe the results you expected:

Provide additional info you think is important:

----------END REPORT ---------

ENEEDMOREINFO

jwthomp commented 8 years ago

I am seeing this issue, but only with a fairly particular setup.

docker version: 1.9.0

docker info:
Containers: 4
Images: 71
Server Version: 1.9.0
Storage Driver: overlay
 Backing Filesystem: extfs
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 3.19.0-33-generic
Operating System: Ubuntu 14.04.3 LTS
CPUs: 1
Total Memory: 993.2 MiB
Name: node1
ID: WW6W:Z4PO:BJLC:ZTYW:53ZC:ORQ5:NYVC:NOB7:ON7Z:LXJA:EYQY:T4EQ

uname -a: Linux node1 3.19.0-33-generic #38~14.04.1-Ubuntu SMP Fri Nov 6 18:17:28 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Running on virtualbox.

Steps to reproduce the issue:

  1. Have a private repository set up.
  2. Have consul set up, which provides DNS for its registered services.
  3. Have your private registry's DNS come from consul.
  4. Have dnsmasq set up to route the consul subdomain to the consul-provided DNS.
  5. Start up a container from an image on the private repository.
  6. Stop but do not remove the container.
  7. Stop consul so it is no longer handling DNS for its records.
  8. Run docker ps -a or curl 127.0.0.1:2375/v1.21/containers/json?all=1 (127.0.0.1:2375 is the URL for the local docker daemon); either will hang.
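The check in step 8 can be scripted so that a stalled daemon is reported instead of blocking the shell. This is only a minimal sketch: the function name `docker_api_responds` and the 5-second default timeout are assumptions for illustration, while the socket path and API path are the ones that appear in this report.

```python
import socket


def docker_api_responds(sock_path="/var/run/docker.sock", timeout=5.0):
    """Return True if the daemon starts answering the container list
    request within `timeout` seconds, False if the request stalls.

    `docker_api_responds` is a hypothetical helper, not part of Docker.
    """
    request = (
        b"GET /v1.21/containers/json?all=1 HTTP/1.0\r\n"
        b"Host: docker\r\n\r\n"
    )
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)  # applies to connect, send and recv
    try:
        sock.connect(sock_path)
        sock.sendall(request)
        first_bytes = sock.recv(4096)
        return first_bytes.startswith(b"HTTP/")
    except socket.timeout:
        # No response within the deadline: the symptom described above.
        return False
    finally:
        sock.close()
```

Connection errors other than a timeout (for example a missing socket file) are left to propagate, since they indicate a different problem from a hang.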

Here is an strace on the docker command:

connect(3, {sa_family=AF_LOCAL, sun_path="/var/run/docker.sock"}, 23) = 0 <0.000276>
futex(0x1cc78c0, FUTEX_WAKE, 1) = 1 <0.000140>
futex(0x1cc7840, FUTEX_WAKE, 1) = 1 <0.000018>
epoll_create1(EPOLL_CLOEXEC) = 4 <0.000008>
epoll_ctl(4, EPOLL_CTL_ADD, 3, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=673395864, u64=139707369600152}}) = 0 <0.000005>
getsockname(3, {sa_family=AF_LOCAL, NULL}, [2]) = 0 <0.000004>
getpeername(3, {sa_family=AF_LOCAL, sun_path="var/run/docker.sock"}, [22]) = 0 <0.000004>
read(3, 0xc20851a000, 4096) = -1 EAGAIN (Resource temporarily unavailable) <0.000005>
write(3, "GET /v1.21/containers/json?all=1"..., 114) = 114 <0.001607>
epoll_wait(4, {{EPOLLOUT, {u32=673395864, u64=139707369600152}}}, 128, 0) = 1 <0.000005>
epoll_wait(4,

[INSERT PAUSE HERE]

{{EPOLLIN|EPOLLOUT, {u32=673395864, u64=139707369600152}}}, 128, -1) = 1 <10.012139>
futex(0x1cc78c0, FUTEX_WAKE, 1) = 1 <0.000015>
read(3, "HTTP/1.1 200 OK\r\nContent-Type: a"..., 4096) = 2044 <0.000011>
read(3, 0xc20851a000, 4096) = -1 EAGAIN (Resource temporarily unavailable) <0.000005>
write(1, "CONTAINER ID", 12CONTAINER ID) = 12 <0.000006>
write(1, " ", 8 ) = 8 <0.000005>
write(1, "IMAGE", 5IMAGE) = 5 <0.000005>

... etc
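The trailing angle-bracket values in the trace are per-call durations in seconds (the kind of annotation strace -T produces), so the stall can be picked out mechanically. The following is a rough sketch, assuming one syscall per line; `slow_syscalls` and the 1-second threshold are illustrative names and values, not anything from Docker or strace.

```python
import re

# Matches lines such as: epoll_wait(4, ..., 128, -1) = 1 <10.012139>
LINE_RE = re.compile(r"^(?P<name>\w+)\(.*<(?P<secs>\d+\.\d+)>\s*$")


def slow_syscalls(strace_text, threshold=1.0):
    """Return (syscall_name, seconds) for every call at or above `threshold`."""
    slow = []
    for line in strace_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # continuation lines or lines without a duration
        seconds = float(match.group("secs"))
        if seconds >= threshold:
            slow.append((match.group("name"), seconds))
    return slow
```

Run against the trace above, with the split epoll_wait call joined back onto one line, this would flag only the epoll_wait that took about 10 seconds. In other words, the client was simply waiting on the daemon's reply.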

I suspect the Docker daemon is timing out on dnsmasq trying to connect to the consul backend.

One additional data point: if I get consul populated with the genepool.service.consul address and then stop genepool (but keep the service registered in consul), docker ps -a works just fine. This seems to further confirm that the issue is a DNS lookup that dnsmasq and consul fail to resolve between them.

I doubt there is much of a fix here for what I am seeing if the docker daemon needs to resolve the private repository's address for some reason. However, I wanted to point out this failure case on the chance that it is what someone else is running into.
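One way to test the DNS theory independently of Docker is to time the lookup itself from the host. This is a minimal sketch under the assumption that the daemon uses the same system resolver as the host; `time_lookup` is a hypothetical helper, and the timeout value is arbitrary.

```python
import socket
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def time_lookup(hostname, timeout=5.0):
    """Resolve `hostname` in a worker thread and report how it went.

    Returns (status, elapsed_seconds), where status is one of
    "resolved", "failed" (resolver error) or "timeout".
    """
    pool = ThreadPoolExecutor(max_workers=1)
    start = time.monotonic()
    future = pool.submit(socket.getaddrinfo, hostname, None)
    try:
        future.result(timeout=timeout)
        status = "resolved"
    except FutureTimeout:
        status = "timeout"
    except socket.gaierror:
        status = "failed"
    finally:
        # Do not block here; a stuck lookup keeps running in the background.
        pool.shutdown(wait=False)
    return status, time.monotonic() - start
```

Calling `time_lookup("genepool.service.consul")` with consul stopped should show whether the resolver stalls for roughly as long as docker ps -a does. Note that a lookup that has timed out still occupies its worker thread, so the interpreter may wait for it on exit.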

Please let me know if I can provide any additional information or try anything.

Cheers,

Jeff

kunalkushwaha commented 8 years ago

I'm facing a similar issue with the v1.9.0 release and Consul. In my case, I'm trying to set up a swarm cluster with Consul as the discovery backend.

Once a node is added, the docker ps or docker ls commands hang, and the timeout takes almost 15~20 min.

thaJeztah commented 8 years ago

Has this been improved in docker 1.9.1? I know some changes were made in this area recently (e.g. https://github.com/docker/docker/issues/17720).

yeasy commented 8 years ago

I still see this problem with Docker 1.11.2.

thaJeztah commented 8 years ago

@yeasy do you have more information about your case? What were the steps after which it hung, how many containers/images do you have, and what platform/distro and graph driver are you using?

yeasy commented 8 years ago

@thaJeztah Yes.

dist: ubuntu 14.04.1
kernel: 3.13.0-32-generic
storage: aufs
Docker: 1.11.2
containers: 10-20

scenario: running docker exec many times.

The hang occurs randomly, with low CPU/memory/disk usage.

Thanks!

Nomon commented 8 years ago

Happening for me also on 1.11.2, trying 1.12-rc2 next. Here are the system details where it happened:

ubuntu@ip-172-18-36-100:~$ uname -a
Linux ip-172-18-36-100 3.13.0-88-generic #135-Ubuntu SMP Wed Jun 8 21:10:42 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
ubuntu@ip-172-18-36-100:~$ sudo docker version
Client:
 Version:      1.11.2
 API version:  1.23
 Go version:   go1.5.4
 Git commit:   b9f10c9
 Built:        Wed Jun  1 21:47:50 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.11.2
 API version:  1.23
 Go version:   go1.5.4
 Git commit:   b9f10c9
 Built:        Wed Jun  1 21:47:50 2016
 OS/Arch:      linux/amd64
ubuntu@ip-172-18-36-100:~$ sudo docker info
Containers: 17
 Running: 15
 Paused: 0
 Stopped: 2
Images: 29
Server Version: 1.11.2
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 179
 Dirperm1 Supported: false
Logging Driver: syslog
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local rexray
 Network: bridge null host
Kernel Version: 3.13.0-88-generic
Operating System: Ubuntu 14.04.4 LTS
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 60 GiB
Name: ip-172-18-36-100
ID: SZHJ:IV7S:DHVZ:77KV:KKY3:QDQM:6AHT:J4Y5:67PH:NXBH:CO3F:U2KN
Docker Root Dir: /var/lib/docker
Debug mode (client): false
Debug mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support

Full strace of docker ps: https://gist.github.com/Nomon/8d515046db5ed33eb08f304d78a6a194

yeasy commented 8 years ago

I found that under /run, Docker creates lots of fds and does not release them after docker exec. Maybe this is the root cause.
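A leak like this can be confirmed by counting the daemon's open file descriptors before and after a batch of docker exec calls. The sketch below assumes a Linux host with /proc mounted and enough privileges to read the daemon's entry; `count_open_fds` is an illustrative helper, not a Docker API.

```python
import os


def count_open_fds(pid):
    """Count the file descriptors currently open in process `pid`.

    Relies on the Linux-specific /proc/<pid>/fd directory, which holds
    one entry per open descriptor.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))
```

If the count for the daemon's PID keeps growing across repeated docker exec runs and never drops back, that would support the leak theory.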

cpuguy83 commented 8 years ago

@yeasy There was a bug in 1.11.0 that did this; it should not be a problem now.

cpuguy83 commented 8 years ago

@Nomon Looks like you have an strace of the docker client, which is unfortunately not very helpful in this situation :(. If you could provide the output (from the daemon-side logs) after sending SIGUSR1 to the docker daemon, that would give us what we need to help debug.

Thanks!
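For anyone collecting the requested dump, sending the signal can be scripted along these lines. This is only a sketch: it assumes the daemon's PID is recorded in /var/run/docker.pid (adjust if your daemon uses another pidfile), it must run with enough privileges to signal the daemon, and `request_daemon_stack_dump` is a made-up helper name.

```python
import os
import signal


def request_daemon_stack_dump(pid_file="/var/run/docker.pid"):
    """Send SIGUSR1 to the process whose PID is stored in `pid_file`.

    Returns the PID that was signalled. The resulting output has to be
    collected separately from the daemon-side logs.
    """
    with open(pid_file) as handle:
        pid = int(handle.read().strip())
    os.kill(pid, signal.SIGUSR1)
    return pid
```

The function only delivers the signal; where the daemon writes its output depends on how its logging is set up on the host.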

Nomon commented 8 years ago

@cpuguy83 I terminated that instance already; once it happens again I will do it. Thanks.

thaJeztah commented 7 years ago

Closing because there's no activity, but happy to reopen if you're still having this issue and have more information.

andyxning commented 7 years ago

We also encounter this docker ps hang in Docker 1.12.0. The OS is Debian 8 with kernel 3.16.7-ckt20-1+deb8u4.

thaJeztah commented 7 years ago

@andyxning do you have the same issue on docker 1.12.3? Various bug fixes went into 1.12.1, 1.12.2 and 1.12.3. If you still run into this issue in the current version (1.12.3 at time of writing), please open a new issue and provide more information. Commenting on closed issues may result in your comment being overlooked.

andyxning commented 7 years ago

OK, I will try 1.12.3 later. However, did we find out the actual reason Docker behaves like this?

Ning Xie
