Closed: tfoote closed this issue 4 years ago.
Seeing the same problem on OS X 10.10.4 with version 1.8.2 when running
docker pull tutum/mongodb
The download and extract threads seem to hang:
e118faab2e16: Extracting [============> ] 16.15 MB/65.77 MB
7e2c5c55ef2c: Layer already being pulled by another client. Waiting.
e04c66a223c4: Layer already being pulled by another client. Waiting.
fa81ed084842: Layer already being pulled by another client. Waiting.
2452a4a1d9d9: Layer already being pulled by another client. Waiting.
6f084c061e5c: Layer already being pulled by another client. Waiting.
181a99a4400e: Layer already being pulled by another client. Waiting.
0f1319cd5eb7: Layer already being pulled by another client. Waiting.
e01c90021d82: Layer already being pulled by another client. Waiting.
dd80a1aedb84: Layer already being pulled by another client. Waiting.
af93b9e16bae: Layer already being pulled by another client. Waiting.
9ca13b1c4bcf: Layer already being pulled by another client. Waiting.
9ca13b1c4bcf: Layer already being pulled by another client. Waiting.
Same problem here on Mac OS X Yosemite with Docker 1.8.2.
Same problem here, but even restarting the Mac doesn't solve it.
@Jam71 you'd have to restart the VM. Restarting the Mac would likely checkpoint the VM and then restore it, bringing it back in the same state as before.
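For anyone unsure how to restart just the VM, a minimal sketch, assuming a stock boot2docker or docker-machine setup (the machine name `default` is an assumption):

```
# boot2docker (pre docker-machine):
boot2docker restart

# docker-machine (substitute your machine name for "default"):
docker-machine restart default
eval "$(docker-machine env default)"
```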
Same issue. But here is the strange thing:
c93054eacfab: Download complete
c93054eacfab: Layer already being pulled by another client. Waiting.
So parallel downloading hangs everything in some way.
docker info
Containers: 1
Images: 18
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 21
Dirperm1 Supported: true
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 3.18.20-aufs
Operating System: Gentoo/Linux
CPUs: 4
Total Memory: 7.694 GiB
docker version
Client:
Version: 1.8.1
API version: 1.20
Go version: go1.5.1
Git commit: d12ea79
Built:
OS/Arch: linux/amd64
Server:
Version: 1.8.1
API version: 1.20
Go version: go1.5.1
Git commit: d12ea79
Built:
OS/Arch: linux/amd64
Has there been any progress on this? I updated from boot2docker to docker-machine and Docker 1.8.2 and I am completely dead in the water now. My containers didn't survive the migration and I am unable to build any new ones.
@sjfloat Unfortunately 1.8.2 didn't include the fixes I originally thought it did. Again, most of these issues are probably fixed on master; however, it will be difficult to tell until lots of people are using it.
@cpuguy83 So I should pull the latest docker and build it?
@sjfloat https://master.dockerproject.org
OK, I found the executable. But now the API version doesn't match the server. How do I coordinate that?
@sjfloat The server is the thing you'd really need to update more than the client.
I'm using the latest docker-machine_darwin-amd64 from https://github.com/docker/machine/releases/ and docker-1.9.0-dev from https://master.dockerproject.org/. This combination gives me:
Error response from daemon: client is newer than server (client API version: 1.21, server API version: 1.20)
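One way around such a client/server mismatch, assuming a VirtualBox-backed docker-machine (the machine name `dev` and the ISO URL below are placeholders, not official links), is to upgrade or recreate the machine so the server matches the client:

```
# Upgrade the machine's boot2docker ISO to the latest release...
docker-machine upgrade dev

# ...or recreate it from a specific ISO that matches the dev client
# (URL is a placeholder).
docker-machine rm dev
docker-machine create -d virtualbox \
  --virtualbox-boot2docker-url https://example.com/boot2docker-dev.iso dev
```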
In case anyone else is stuck, the workaround I resorted to was:
After this, things seem to be working again.
Summary: fixed for me on Ubuntu 14.04 after upgrading to 1.8.2 plus a reboot.
Had this "docker pull blocking" issue with 1.8.1 on Ubuntu 14.04. Upgraded to 1.8.2 and it initially looked better, but it still blocks:
docker pull nginx:1.9
1.9: Pulling from library/nginx
8c00acfb0175: Pull complete
426ac73b867e: Pull complete
d6c6bbd63f57: Pull complete
4ac684e3f295: Pull complete
91391bd3c4d3: Pull complete
b4587525ed53: Pull complete
0240288f5187: Pull complete
28c109ec1572: Pull complete
063d51552dac: Pull complete
d8a70839d961: Verifying Checksum
ceab60537ad2: Download complete
ceab60537ad2: Layer already being pulled by another client. Waiting.
Pulling repository docker.io/library/nginx
The operational status of the Hub seems fine: https://status.docker.com
I'm trying to pull two images, httpd:2.4 and nginx:1.9. After upgrading from 1.8.1 to 1.8.2 a few more layers were downloaded for both, but both block with 'Layer already being pulled by another client. Waiting.'
Then rebooted, and magically pulls now work fine :-)
I can confirm this with 1.8.2; I can't even use it. Rebooting or restarting the daemon didn't help at all. Everything goes well until the last message appears (Pulling repository docker.io...), then all progress just suddenly stops.
@tpiha it's not fixed yet in 1.8.2, see https://github.com/docker/docker/issues/12823#issuecomment-144731719
@cpuguy83 is this the fix you were referring to? https://github.com/docker/docker/pull/15728
This is fixed in 1.9.0-dev for me.
+1 worked with 1.9.0-dev, and it says:
docker.io/<image>: this image was pulled from a legacy registry. Important: This registry version will not be supported in future versions of docker.
Tested with 1.9-dev on boot2docker. The behavior is different now, but it still does not work.
If the network connection fails, the pull gets stuck.
If you restart the client pull (after Ctrl-C'ing it), you get the exact same frozen state you had before.
Trying to remove one of the (sub)images already pulled gives me:
$ docker rmi 6bf4b72b9674
Error response from daemon: conflict: unable to delete 6bf4b72b9674 (cannot be forced) - image is held by an ongoing pull or build
Error: failed to remove images: [6bf4b72b9674]
I have no idea how to restart that pull/trigger the download again without restarting the docker daemon and breaking all the other containers.
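Short of restarting the daemon, it is at least possible to see what the stuck pull is blocked on. A debugging sketch, assuming shell access to the daemon host and a reasonably recent daemon, which dumps its goroutine stack traces to the daemon log (or to a goroutine-stacks-*.log file on newer releases) when it receives SIGUSR1:

```
# Ask the daemon to dump its goroutine stacks, then read them from the
# daemon log; a stuck pull shows up as a goroutine blocked in the
# download/pull code path.
sudo kill -USR1 "$(pidof dockerd || pidof docker)"
```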
This behaviour can also occur when the disk is full.
No more information is given than "Layer already being pulled by another client. Waiting."
A more verbose message for this case would be useful.
Another issue for this problem: #15603 was closed 3 days ago
I'm also experiencing this issue while using Tutum to deploy my Docker containers. Redeploying doesn't work anymore from one node because it hangs on the pull. Is there any way to kill whatever request/lock is blocking further pulls?
ping @aaronlehmann can you have a look at this one?
I expected this would be fixed in 1.9.0, and the feedback above seems to confirm that so far. Is anyone still experiencing hangs with 1.9?
> Is anyone still experiencing hangs with 1.9?
I'm experiencing the hang with 1.9.1. This was while following the whalesay tutorial. It hangs on:
ded5e192a685: Download complete
ping @aaronlehmann ^^
@cjerdonek is there anything useful in the daemon logs that could provide more information?
@thaJeztah Below is all I could find. You can see I tried it a number of times. It may have had to do with the fact that I was using wifi in a cafe. I did have internet access at the time it was happening. It worked once I restarted my computer as suggested above.
time="2015-11-23T00:39:58.613879148Z" level=error msg="Handler for POST /v1.21/containers/create returned error: No such image: docker/whalesay:latest"
time="2015-11-23T00:39:58.613898169Z" level=error msg="HTTP Error" err="No such image: docker/whalesay:latest" statusCode=404
time="2015-11-23T00:39:58.614426558Z" level=debug msg="Calling POST /v1.21/images/create"
time="2015-11-23T00:39:58.614447357Z" level=info msg="POST /v1.21/images/create?fromImage=docker%2Fwhalesay&tag=latest"
time="2015-11-23T00:39:58.614492542Z" level=debug msg="Trying to pull docker/whalesay from https://registry-1.docker.io v2"
time="2015-11-23T00:47:13.165634242Z" level=debug msg="Calling POST /v1.21/containers/create"
time="2015-11-23T00:47:13.165682012Z" level=info msg="POST /v1.21/containers/create"
time="2015-11-23T00:47:13.166109395Z" level=error msg="Handler for POST /v1.21/containers/create returned error: No such image: docker/whalesay:latest"
time="2015-11-23T00:47:13.166128106Z" level=error msg="HTTP Error" err="No such image: docker/whalesay:latest" statusCode=404
time="2015-11-23T00:47:13.166699068Z" level=debug msg="Calling POST /v1.21/images/create"
time="2015-11-23T00:47:13.166719220Z" level=info msg="POST /v1.21/images/create?fromImage=docker%2Fwhalesay&tag=latest"
time="2015-11-23T00:47:13.166765252Z" level=debug msg="Trying to pull docker/whalesay from https://registry-1.docker.io v2"
time="2015-11-23T00:48:48.514103170Z" level=debug msg="Calling GET /v1.21/version"
time="2015-11-23T00:48:48.514146155Z" level=info msg="GET /v1.21/version"
time="2015-11-23T00:49:34.563300021Z" level=debug msg="Calling POST /v1.21/containers/create"
time="2015-11-23T00:49:34.563340093Z" level=info msg="POST /v1.21/containers/create"
time="2015-11-23T00:49:34.563859819Z" level=error msg="Handler for POST /v1.21/containers/create returned error: No such image: docker/whalesay:latest"
time="2015-11-23T00:49:34.563878643Z" level=error msg="HTTP Error" err="No such image: docker/whalesay:latest" statusCode=404
time="2015-11-23T00:49:34.564373300Z" level=debug msg="Calling POST /v1.21/images/create"
time="2015-11-23T00:49:34.564394115Z" level=info msg="POST /v1.21/images/create?fromImage=docker%2Fwhalesay&tag=latest"
time="2015-11-23T00:49:34.564438361Z" level=debug msg="Trying to pull docker/whalesay from https://registry-1.docker.io v2"
When you run docker pull and a matching pull is already running, it attaches to that pull instead of starting a new one. If the network is slow or unreliable, a download could get stuck, though I would still expect it to time out eventually. That may be what happened above.
The issue that most posts are discussing was a bug in 1.8.x where concurrent download handling would cause frequent hangs. I believe this specific bug has been fixed.
We're working on making downloading and uploading more robust for 1.10, with better support for cancelling transfers and automatically retrying failed transfers.
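To see the attach behaviour from the client side, a small illustration (the image tag is just an example):

```
# Terminal 1: start pulling a large image.
docker pull ubuntu:14.04

# Terminal 2: request the same image while the first pull is running.
# The second client does not download the layers again; it attaches to
# the pull already in progress and waits for it (on 1.8.x this is where
# "Layer already being pulled by another client. Waiting." appears).
docker pull ubuntu:14.04
```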
The fix has been reverted in #19971.
I'm not sure if this is still really a problem as such though, since you can cancel the pull and start again.
This is a problem for automation, orchestration, and CI where detecting hangs and re-executing the last command is not a reasonable requirement.
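Until the daemon cancels and retries on its own, a CI-side stopgap, shown only as a sketch (the timeout value and image name are placeholders), is to bound each pull and retry it:

```
# Give each pull attempt at most 10 minutes and retry a few times
# before failing the job.
for attempt in 1 2 3; do
  if timeout 600 docker pull myorg/myimage:latest; then
    break
  fi
  echo "pull attempt ${attempt} timed out or failed, retrying" >&2
done
```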
@Kindrat you stated that:
> We've fixed this issue by pulling images sequentially instead of in parallel. So it's worth checking the thread management logic.
Can you tell me how you forced Docker to pull the images in sequential order? I found no configuration option to do so.
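For what it's worth, later releases did add a knob for this. A sketch assuming a Docker 1.12+ daemon (the option does not exist in the versions discussed above):

```
# Serialize layer downloads by capping the daemon's concurrency at 1,
# either on the command line...
dockerd --max-concurrent-downloads=1

# ...or persistently in /etc/docker/daemon.json:
#   { "max-concurrent-downloads": 1 }
```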
Saw that the https://github.com/docker/docker/commit/84b2162c1a15256ac09396ad0d75686ea468f40c commit was reverted. Is this issue actively being worked on? Anything I can do to help?
Same problem here. It is triggered by a network reconfiguration of the host. "Fixed" via docker-machine restart.
docker pull just hangs. Sometimes.
docker version
Client:
Version: 1.10.3
API version: 1.22
Go version: go1.5.3
Git commit: 20f81dd
Built: Thu Mar 10 21:49:11 2016
OS/Arch: darwin/amd64
Server:
Version: 1.12.0
API version: 1.24
Go version: go1.6.3
Git commit: 8eab29e
Built: Thu Jul 28 23:54:00 2016
OS/Arch: linux/amd64
$ docker info
Containers: 1
Running: 1
Paused: 0
Stopped: 0
Images: 17
Server Version: 1.12.0
Storage Driver: aufs
Root Dir: /mnt/sda1/var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 78
Dirperm1 Supported: true
Logging Driver: json-file
Plugins:
Volume: local
Network: bridge null host overlay
Kernel Version: 4.4.16-boot2docker
Operating System: Boot2Docker 1.12.0 (TCL 7.2); HEAD : e030bab - Fri Jul 29 00:29:14 UTC 2016
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 3.858 GiB
Name: dev
ID: 7W5K:LCIY:7RIQ:NNKL:PJOX:BTW4:BKBC:JH62:GRV5:GRV4:2MRM:E6N2
Debug mode (server): true
File Descriptors: 23
Goroutines: 54
System Time: 2016-08-12T10:31:27.62959682Z
EventsListeners: 0
Init SHA1:
Init Path:
Docker Root Dir: /mnt/sda1/var/lib/docker
Labels:
provider=virtualbox
I've had the same issue just today:
$ docker info
Containers: 3
Running: 0
Paused: 0
Stopped: 3
Images: 112
Server Version: 17.03.0-ce
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 977c511eda0925a723debdc94d09459af49d082a
runc version: a01dafd48bc1c7cc12bdb01206f9fea7dd6feb70
init version: 949e6fa
Security Options:
seccomp
Profile: default
Kernel Version: 4.9.12-moby
Operating System: Alpine Linux v3.5
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 1.952 GiB
Name: moby
ID: N5U6:HSP6:YNBL:CH7F:ZVHZ:JJVU:7H7W:WRX6:VGHV:A2E5:B5GW:EC4M
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
File Descriptors: 24
Goroutines: 58
System Time: 2017-03-14T17:25:04.829649043Z
EventsListeners: 1
Registry: https://index.docker.io/v1/
Experimental: true
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Confirmed that stopping and restarting the Docker daemon fixed it. When I reran the docker run, I noticed that there were MORE images in the pull list this time, the same as the first time I tried to run it, when my corporate firewall blocked the pull. Every pull for that container since then had been hanging forever until the restart.
$ docker run -v `pwd`:/workshop -p 0.0.0.0:6006:6006 -p 0.0.0.0:8888:8888 \
> -it tensorflow/tensorflow bash
Unable to find image 'tensorflow/tensorflow:latest' locally
latest: Pulling from tensorflow/tensorflow
30d541b48fc0: Pulling fs layer
8ecd7f80d390: Pulling fs layer
46ec9927bb81: Pulling fs layer
2e67a4d67b44: Waiting
7d9dd9155488: Waiting
a27df5e99dc2: Waiting
88fd9b7642d8: Waiting
d13154bfa8c5: Waiting
af7499d4d2e2: Waiting
e905ca2659f3: Waiting
b018128f6a21: Waiting
74afe00108f1: Waiting
docker: error pulling image configuration: Get https://dseasb33srnrn.cloudfront.net/registry-v2/docker/registry/v2/blobs/sha256/34/348946c5276183058a26e7e6c4136ecd847dff11c8173d7db8946eca2077b604/data?Expires=1489512717&Signature=Updq-Bb~CjDn~pKG2CGIkj~mQ1DMZX4PIyXqL5QpVN-Fr1OTuzcep0bkWSqaXrieX0p~644RRiy07ioHx3fwl0YEHHcPouA1w4ku6X766Mf-pAZXbk15LSWT-Y-KMMOroyXSs6qZHFPtq03IBXAyGX3yacVdwW7Ezr4lHArjRB8_&Key-Pair-Id=APKAJECH5M7VWIS5YZ6Q: dial tcp: lookup dseasb33srnrn.cloudfront.net on 192.168.65.1:53: read udp 192.168.65.2:52404->192.168.65.1:53: i/o timeout.
See 'docker run --help'.
I disconnected the corporate VPN and went direct to internet here:
$ docker run -v `pwd`:/workshop -p 0.0.0.0:6006:6006 -p 0.0.0.0:8888:8888 -it tensorflow/tensorflow bash
Unable to find image 'tensorflow/tensorflow:latest' locally
latest: Pulling from tensorflow/tensorflow
30d541b48fc0: Pull complete
8ecd7f80d390: Pull complete
46ec9927bb81: Pull complete
2e67a4d67b44: Pull complete
7d9dd9155488: Pull complete
a27df5e99dc2: Pull complete
88fd9b7642d8: Pull complete
d13154bfa8c5: Waiting
^C
I then restarted docker, and tried again:
$ docker run -v `pwd`:/workshop -p 0.0.0.0:6006:6006 -p 0.0.0.0:8888:8888 -it tensorflow/tensorflow bash
Unable to find image 'tensorflow/tensorflow:latest' locally
latest: Pulling from tensorflow/tensorflow
30d541b48fc0: Already exists
8ecd7f80d390: Already exists
46ec9927bb81: Already exists
2e67a4d67b44: Already exists
7d9dd9155488: Already exists
a27df5e99dc2: Already exists
88fd9b7642d8: Already exists
d13154bfa8c5: Downloading [=> ] 4.325 MB/113.9 MB
af7499d4d2e2: Downloading [=======> ] 7.11 MB/49.25 MB
e905ca2659f3: Download complete
b018128f6a21: Download complete
74afe00108f1: Download complete
As you can see, after the restart the list of images required went back to the original. It seems like the daemon is waiting for a download it thinks is still running, but forgets to restart it after a broken or failed download.
HTH
K
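For the DNS timeout shown above (lookup dseasb33srnrn.cloudfront.net ... i/o timeout), one hedged workaround while on the VPN is to give the daemon explicit DNS servers that are reachable from it. The addresses below are placeholders; on a Linux host this goes in /etc/docker/daemon.json, while Docker for Mac accepts the same JSON in its daemon preferences:

```
# Point the daemon at DNS servers reachable from the VPN (placeholder
# addresses), then restart the daemon so it takes effect.
sudo tee /etc/docker/daemon.json <<'EOF'
{
  "dns": ["10.0.0.2", "8.8.8.8"]
}
EOF
```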
I'm seeing this happen on AWS/ECS: we do a docker pull and for some reason the network connection drops. Then our deploy is stuck, since the pull hangs indefinitely.
I am facing the same issue; restarting Docker as well as the system doesn't help.
x86_64-1.0.0: Pulling from hyperledger/fabric-orderer
fe6b5e13de: Downloading [===========> ] 10.46MB/46.79MB
0a2b43a72660: Download complete
18bdd1e546d2: Download complete
8198342c3e05: Download complete
f56970a44fd4: Download complete
e32b597e7839: Download complete
a6e362fc71c4: Downloading [===========> ] 3.964MB/17.48MB
f107ea6d90f4: Download complete
593ba12c6c43: Downloading [===============================> ] 3.997MB/6.394MB
12b8c0ba3585: Waiting
OS: Ubuntu 16.04
docker info
Containers: 1
Running: 0
Paused: 0
Stopped: 1
Images: 1
Server Version: 17.06.0-ce
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 3
Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: cfb82a876ecc11b5ca0977d1733adbe58599088a
runc version: 2d41c047c83e09a6d61d464906feb2a2f3c52aa4
init version: 949e6fa
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.4.0-81-generic
Operating System: Ubuntu 16.04.2 LTS
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 7.797GiB
Name: euca-10-254-230-147
ID: 4WYS:DDPU:AQZQ:MVDK:WBJ7:62OI:LRZH:KCWS:W2OA:PFTK:7JDH:ZAR3
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Http Proxy: 10.158.100.6:8080
Https Proxy: 10.158.100.6:8080
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
WARNING: No swap limit support
@Katiyman anything in the daemon logs?
@thaJeztah Nope, nothing of value. But I changed the proxy and it got through. I still haven't found the root cause; the earlier proxy was also a working proxy, and I'm not sure why it didn't work.
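If the daemon has to go through a proxy, the proxy generally needs to be configured on the daemon itself rather than in the client's shell. A sketch for a systemd-based host such as the Ubuntu 16.04 box above, reusing the proxy address from the docker info output (swap in whichever proxy actually works):

```
# Give the daemon its proxy settings via a systemd drop-in, then
# reload and restart it.
sudo mkdir -p /etc/systemd/system/docker.service.d
sudo tee /etc/systemd/system/docker.service.d/http-proxy.conf <<'EOF'
[Service]
Environment="HTTP_PROXY=http://10.158.100.6:8080"
Environment="HTTPS_PROXY=http://10.158.100.6:8080"
EOF
sudo systemctl daemon-reload
sudo systemctl restart docker
```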
The code path has significantly changed since v1.5. If somebody is still hitting this, please open a new issue.
Running docker pull will simply hang, waiting for a non-existent process to download the repository.
This is the same behavior as #3115; however, there is no other docker process running.
The list of running docker containers:
See here for a full process tree: https://gist.github.com/tfoote/c8a30e569c911f1977e2
When this happens, my process monitor fails the job after 120 minutes; this happens regularly.
An strace of the docker instance can be found here: https://gist.github.com/tfoote/1dc3905eb9c235cb5c53
It is stuck on an epoll_wait call.
Here's all the standard info.
It's running on AWS.
I'm running an instance of the ROS buildfarm, which can reproduce this bad state once every couple of days when fully loaded running Debian package builds at ~100% CPU load. This happens when we are preparing a major release.
I have not been able to isolate the cause in a smaller example, it has happened on multiple different repositories. Sometimes it's the official Ubuntu repository, sometimes it's our own custom repositories. We've tracked a few instances of this error recently here. When one repository is failing to pull, others work fine. All the repositories are hosted on the public docker hub.
Here's an example of one hanging while another passes.
As determined in #3115 this can be fixed by restarting docker. However from that issue it is expected that this issue should not happen anymore. I think there has been a regression or we've found another edge case.
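For completeness, the daemon restart that clears the stuck state, as a sketch (the exact service command depends on the host's init system):

```
# Restart the daemon; on these older versions this also takes down the
# running containers (live-restore did not exist yet).
sudo service docker restart
```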
I will keep the machine online for a few days if anyone has suggestions on what I can run to debug the issue. Otherwise I'll have to wait for it to reoccur to be able to test any debugging.