Closed: tfoote closed this issue 4 years ago.
Seeing the same problem on OS X 10.10.4 with version 1.8.2 when running
docker pull tutum/mongodb
The download and extract threads seem to hang:
e118faab2e16: Extracting [============> ] 16.15 MB/65.77 MB
7e2c5c55ef2c: Layer already being pulled by another client. Waiting.
e04c66a223c4: Layer already being pulled by another client. Waiting.
fa81ed084842: Layer already being pulled by another client. Waiting.
2452a4a1d9d9: Layer already being pulled by another client. Waiting.
6f084c061e5c: Layer already being pulled by another client. Waiting.
181a99a4400e: Layer already being pulled by another client. Waiting.
0f1319cd5eb7: Layer already being pulled by another client. Waiting.
e01c90021d82: Layer already being pulled by another client. Waiting.
dd80a1aedb84: Layer already being pulled by another client. Waiting.
af93b9e16bae: Layer already being pulled by another client. Waiting.
9ca13b1c4bcf: Layer already being pulled by another client. Waiting.
9ca13b1c4bcf: Layer already being pulled by another client. Waiting.
Same problem here on Mac OS X Yosemite with Docker 1.8.2.
Same problem here, but even restarting the Mac doesn't solve it.
@Jam71 you'd have to restart the VM. Restarting the Mac would likely checkpoint the VM and then restore it, bringing it back in the same state as before.
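For anyone unsure how to restart just the VM, a minimal sketch, assuming a stock boot2docker or docker-machine setup (the machine name `default` is an assumption):

```
# boot2docker (pre docker-machine):
boot2docker restart

# docker-machine (substitute your machine name for "default"):
docker-machine restart default
eval "$(docker-machine env default)"
```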
Same issue. But here is the strange thing:
c93054eacfab: Download complete
c93054eacfab: Layer already being pulled by another client. Waiting.
So parallel downloading hangs everything in some way.
docker info
Containers: 1
Images: 18
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 21
Dirperm1 Supported: true
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 3.18.20-aufs
Operating System: Gentoo/Linux
CPUs: 4
Total Memory: 7.694 GiB
docker version
Client:
Version: 1.8.1
API version: 1.20
Go version: go1.5.1
Git commit: d12ea79
Built:
OS/Arch: linux/amd64
Server:
Version: 1.8.1
API version: 1.20
Go version: go1.5.1
Git commit: d12ea79
Built:
OS/Arch: linux/amd64
Has there been any progress on this? I updated from boot2docker to docker-machine and Docker 1.8.2 and I am completely dead in the water now. My containers didn't survive the migration and I am unable to build any new ones.
@sjfloat Unfortunately 1.8.2 didn't include the fixes I originally thought it did. Again, most of these issues are probably fixed on master; however, it will be difficult to tell until lots of people are using it.
@cpuguy83 So I should pull the latest docker and build it?
@sjfloat https://master.dockerproject.org
OK, I found the executable. But now the API version doesn't match the server. How do I coordinate that?
@sjfloat The server is the thing you'd really need to update more than the client.
I'm using the latest docker-machine_darwin-amd64 from https://github.com/docker/machine/releases/ and docker-1.9.0-dev from https://master.dockerproject.org/. This combination gives me:
Error response from daemon: client is newer than server (client API version: 1.21, server API version: 1.20)
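One way around such a client/server mismatch, assuming a VirtualBox-backed docker-machine (the machine name `dev` and the ISO URL below are placeholders, not official links), is to upgrade or recreate the machine so the server matches the client:

```
# Upgrade the machine's boot2docker ISO to the latest release...
docker-machine upgrade dev

# ...or recreate it from a specific ISO that matches the dev client
# (URL is a placeholder).
docker-machine rm dev
docker-machine create -d virtualbox \
  --virtualbox-boot2docker-url https://example.com/boot2docker-dev.iso dev
```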
In case anyone else is stuck, the workaround I resorted to was:
After this, things seem to be working again.
Summary: fixed for me on Ubuntu 14.04 after upgrading to 1.8.2 plus a reboot.
Had this "docker pull blocking" issue with 1.8.1 on Ubuntu 14.04. Upgraded to 1.8.2 and it initially looked better, but it still blocks:
docker pull nginx:1.9
1.9: Pulling from library/nginx
8c00acfb0175: Pull complete
426ac73b867e: Pull complete
d6c6bbd63f57: Pull complete
4ac684e3f295: Pull complete
91391bd3c4d3: Pull complete
b4587525ed53: Pull complete
0240288f5187: Pull complete
28c109ec1572: Pull complete
063d51552dac: Pull complete
d8a70839d961: Verifying Checksum
ceab60537ad2: Download complete
ceab60537ad2: Layer already being pulled by another client. Waiting.
Pulling repository docker.io/library/nginx
The operational status of the Hub seems fine: https://status.docker.com
I'm trying to pull two images, httpd:2.4 and nginx:1.9. After upgrading from 1.8.1 to 1.8.2 a few more layers were downloaded for both, but both block with 'Layer already being pulled by another client. Waiting.'
Then rebooted, and magically pulls now work fine :-)
I can confirm this with 1.8.2; I can't even use it. Rebooting or restarting the daemon didn't help at all. Everything goes well until the last message appears (Pulling repository docker.io...), then all progress just suddenly stops.
@tpiha it's not fixed yet in 1.8.2, see https://github.com/docker/docker/issues/12823#issuecomment-144731719
@cpuguy83 is this the fix you were referring to? https://github.com/docker/docker/pull/15728
This is fixed in 1.9.0-dev for me.
+1 worked with 1.9.0-dev, and it says:
docker.io/<image>: this image was pulled from a legacy registry. Important: This registry version will not be supported in future versions of docker.
Tested with 1.9-dev on boot2docker. The behavior is different now, but it still does not work.
If the network connection fails, the pull gets stuck.
If you restart the client pull (after Ctrl-C'ing it), you get the exact same frozen state you had before.
Trying to remove one of the (sub)images already pulled gives me:
$ docker rmi 6bf4b72b9674
Error response from daemon: conflict: unable to delete 6bf4b72b9674 (cannot be forced) - image is held by an ongoing pull or build
Error: failed to remove images: [6bf4b72b9674]
I have no idea how to restart that pull/trigger the download again without restarting the docker daemon and breaking all the other containers.
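Short of restarting the daemon, it is at least possible to see what the stuck pull is blocked on. A debugging sketch, assuming shell access to the daemon host and a reasonably recent daemon, which dumps its goroutine stack traces to the daemon log (or to a goroutine-stacks-*.log file on newer releases) when it receives SIGUSR1:

```
# Ask the daemon to dump its goroutine stacks, then read them from the
# daemon log; a stuck pull shows up as a goroutine blocked in the
# download/pull code path.
sudo kill -USR1 "$(pidof dockerd || pidof docker)"
```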
This behaviour can also occur when the disk is full.
No more information is given than "Layer already being pulled by another client. Waiting."
A more verbose message for this case would be useful.
Another issue for this problem: #15603 was closed 3 days ago
I'm also experiencing this issue while using Tutum to deploy my Docker containers. Redeploying doesn't work anymore from one node because it hangs on the pull. Is there any way to kill whatever request/lock is blocking further pulls?
ping @aaronlehmann can you have a look at this one?
I expected this would be fixed in 1.9.0, and the feedback above seems to confirm that so far. Is anyone still experiencing hangs with 1.9?
> Is anyone still experiencing hangs with 1.9?
I'm experiencing the hang with 1.9.1. This was while following the whalesay tutorial. It hangs on:
ded5e192a685: Download complete
ping @aaronlehmann ^^
@cjerdonek is there anything useful in the daemon logs that could provide more information?
@thaJeztah Below is all I could find. You can see I tried it a number of times. It may have had to do with the fact that I was using wifi in a cafe. I did have internet access at the time it was happening. It worked once I restarted my computer as suggested above.
time="2015-11-23T00:39:58.613879148Z" level=error msg="Handler for POST /v1.21/containers/create returned error: No such image: docker/whalesay:latest"
time="2015-11-23T00:39:58.613898169Z" level=error msg="HTTP Error" err="No such image: docker/whalesay:latest" statusCode=404
time="2015-11-23T00:39:58.614426558Z" level=debug msg="Calling POST /v1.21/images/create"
time="2015-11-23T00:39:58.614447357Z" level=info msg="POST /v1.21/images/create?fromImage=docker%2Fwhalesay&tag=latest"
time="2015-11-23T00:39:58.614492542Z" level=debug msg="Trying to pull docker/whalesay from https://registry-1.docker.io v2"
time="2015-11-23T00:47:13.165634242Z" level=debug msg="Calling POST /v1.21/containers/create"
time="2015-11-23T00:47:13.165682012Z" level=info msg="POST /v1.21/containers/create"
time="2015-11-23T00:47:13.166109395Z" level=error msg="Handler for POST /v1.21/containers/create returned error: No such image: docker/whalesay:latest"
time="2015-11-23T00:47:13.166128106Z" level=error msg="HTTP Error" err="No such image: docker/whalesay:latest" statusCode=404
time="2015-11-23T00:47:13.166699068Z" level=debug msg="Calling POST /v1.21/images/create"
time="2015-11-23T00:47:13.166719220Z" level=info msg="POST /v1.21/images/create?fromImage=docker%2Fwhalesay&tag=latest"
time="2015-11-23T00:47:13.166765252Z" level=debug msg="Trying to pull docker/whalesay from https://registry-1.docker.io v2"
time="2015-11-23T00:48:48.514103170Z" level=debug msg="Calling GET /v1.21/version"
time="2015-11-23T00:48:48.514146155Z" level=info msg="GET /v1.21/version"
time="2015-11-23T00:49:34.563300021Z" level=debug msg="Calling POST /v1.21/containers/create"
time="2015-11-23T00:49:34.563340093Z" level=info msg="POST /v1.21/containers/create"
time="2015-11-23T00:49:34.563859819Z" level=error msg="Handler for POST /v1.21/containers/create returned error: No such image: docker/whalesay:latest"
time="2015-11-23T00:49:34.563878643Z" level=error msg="HTTP Error" err="No such image: docker/whalesay:latest" statusCode=404
time="2015-11-23T00:49:34.564373300Z" level=debug msg="Calling POST /v1.21/images/create"
time="2015-11-23T00:49:34.564394115Z" level=info msg="POST /v1.21/images/create?fromImage=docker%2Fwhalesay&tag=latest"
time="2015-11-23T00:49:34.564438361Z" level=debug msg="Trying to pull docker/whalesay from https://registry-1.docker.io v2"
When you run docker pull and a matching pull is already running, it attaches to that pull instead of starting a new one. If the network is slow or unreliable, a download could get stuck, though I would still expect it to time out eventually. That may be what happened above.
The issue that most posts are discussing was a bug in 1.8.x where concurrent download handling would cause frequent hangs. I believe this specific bug has been fixed.
We're working on making downloading and uploading more robust for 1.10, with better support for cancelling transfers and automatically retrying failed transfers.
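To see the attach behaviour from the client side, a small illustration (the image tag is just an example):

```
# Terminal 1: start pulling a large image.
docker pull ubuntu:14.04

# Terminal 2: request the same image while the first pull is running.
# The second client does not download the layers again; it attaches to
# the pull already in progress and waits for it (on 1.8.x this is where
# "Layer already being pulled by another client. Waiting." appears).
docker pull ubuntu:14.04
```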
The fix has been reverted in #19971.
I'm not sure if this is still really a problem as such though, since you can cancel the pull and start again.
This is a problem for automation, orchestration, and CI where detecting hangs and re-executing the last command is not a reasonable requirement.
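Until the daemon cancels and retries on its own, a CI-side stopgap, shown only as a sketch (the timeout value and image name are placeholders), is to bound each pull and retry it:

```
# Give each pull attempt at most 10 minutes and retry a few times
# before failing the job.
for attempt in 1 2 3; do
  if timeout 600 docker pull myorg/myimage:latest; then
    break
  fi
  echo "pull attempt ${attempt} timed out or failed, retrying" >&2
done
```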
@Kindrat you stated that:
> We've fixed this issue by pulling images sequentially instead of in parallel. So it's worth checking the thread management logic.
Can you tell me how you forced Docker to pull the images in sequential order? I found no configuration option to do so.
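For what it's worth, later releases did add a knob for this. A sketch assuming a Docker 1.12+ daemon (the option does not exist in the versions discussed above):

```
# Serialize layer downloads by capping the daemon's concurrency at 1,
# either on the command line...
dockerd --max-concurrent-downloads=1

# ...or persistently in /etc/docker/daemon.json:
#   { "max-concurrent-downloads": 1 }
```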
Saw that the https://github.com/docker/docker/commit/84b2162c1a15256ac09396ad0d75686ea468f40c commit was reverted. Is this issue actively being worked on? Anything I can do to help?
Same problem here. It is triggered by a network reconfiguration of the host. "Fixed" via docker-machine restart.
docker pull just hangs. Sometimes.
docker version
Client:
Version: 1.10.3
API version: 1.22
Go version: go1.5.3
Git commit: 20f81dd
Built: Thu Mar 10 21:49:11 2016
OS/Arch: darwin/amd64
Server:
Version: 1.12.0
API version: 1.24
Go version: go1.6.3
Git commit: 8eab29e
Built: Thu Jul 28 23:54:00 2016
OS/Arch: linux/amd64
$ docker info
Containers: 1
Running: 1
Paused: 0
Stopped: 0
Images: 17
Server Version: 1.12.0
Storage Driver: aufs
Root Dir: /mnt/sda1/var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 78
Dirperm1 Supported: true
Logging Driver: json-file
Plugins:
Volume: local
Network: bridge null host overlay
Kernel Version: 4.4.16-boot2docker
Operating System: Boot2Docker 1.12.0 (TCL 7.2); HEAD : e030bab - Fri Jul 29 00:29:14 UTC 2016
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 3.858 GiB
Name: dev
ID: 7W5K:LCIY:7RIQ:NNKL:PJOX:BTW4:BKBC:JH62:GRV5:GRV4:2MRM:E6N2
Debug mode (server): true
File Descriptors: 23
Goroutines: 54
System Time: 2016-08-12T10:31:27.62959682Z
EventsListeners: 0
Init SHA1:
Init Path:
Docker Root Dir: /mnt/sda1/var/lib/docker
Labels:
provider=virtualbox
I've had the same issue just today:
$ docker info
Containers: 3
Running: 0
Paused: 0
Stopped: 3
Images: 112
Server Version: 17.03.0-ce
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 977c511eda0925a723debdc94d09459af49d082a
runc version: a01dafd48bc1c7cc12bdb01206f9fea7dd6feb70
init version: 949e6fa
Security Options:
seccomp
Profile: default
Kernel Version: 4.9.12-moby
Operating System: Alpine Linux v3.5
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 1.952 GiB
Name: moby
ID: N5U6:HSP6:YNBL:CH7F:ZVHZ:JJVU:7H7W:WRX6:VGHV:A2E5:B5GW:EC4M
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
File Descriptors: 24
Goroutines: 58
System Time: 2017-03-14T17:25:04.829649043Z
EventsListeners: 1
Registry: https://index.docker.io/v1/
Experimental: true
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Confirmed that stopping and restarting the Docker daemon fixed it. When I reran the docker run, I noticed that there were MORE images in the pull list this time, the same as the first time I tried to run it, when my corporate firewall blocked the pull. Every pull for that container since then had been hanging forever until the restart.
$ docker run -v `pwd`:/workshop -p 0.0.0.0:6006:6006 -p 0.0.0.0:8888:8888 \
> -it tensorflow/tensorflow bash
Unable to find image 'tensorflow/tensorflow:latest' locally
latest: Pulling from tensorflow/tensorflow
30d541b48fc0: Pulling fs layer
8ecd7f80d390: Pulling fs layer
46ec9927bb81: Pulling fs layer
2e67a4d67b44: Waiting
7d9dd9155488: Waiting
a27df5e99dc2: Waiting
88fd9b7642d8: Waiting
d13154bfa8c5: Waiting
af7499d4d2e2: Waiting
e905ca2659f3: Waiting
b018128f6a21: Waiting
74afe00108f1: Waiting
docker: error pulling image configuration: Get https://dseasb33srnrn.cloudfront.net/registry-v2/docker/registry/v2/blobs/sha256/34/348946c5276183058a26e7e6c4136ecd847dff11c8173d7db8946eca2077b604/data?Expires=1489512717&Signature=Updq-Bb~CjDn~pKG2CGIkj~mQ1DMZX4PIyXqL5QpVN-Fr1OTuzcep0bkWSqaXrieX0p~644RRiy07ioHx3fwl0YEHHcPouA1w4ku6X766Mf-pAZXbk15LSWT-Y-KMMOroyXSs6qZHFPtq03IBXAyGX3yacVdwW7Ezr4lHArjRB8_&Key-Pair-Id=APKAJECH5M7VWIS5YZ6Q: dial tcp: lookup dseasb33srnrn.cloudfront.net on 192.168.65.1:53: read udp 192.168.65.2:52404->192.168.65.1:53: i/o timeout.
See 'docker run --help'.
I disconnected the corporate VPN and went direct to internet here:
$ docker run -v `pwd`:/workshop -p 0.0.0.0:6006:6006 -p 0.0.0.0:8888:8888 -it tensorflow/tensorflow bash
Unable to find image 'tensorflow/tensorflow:latest' locally
latest: Pulling from tensorflow/tensorflow
30d541b48fc0: Pull complete
8ecd7f80d390: Pull complete
46ec9927bb81: Pull complete
2e67a4d67b44: Pull complete
7d9dd9155488: Pull complete
a27df5e99dc2: Pull complete
88fd9b7642d8: Pull complete
d13154bfa8c5: Waiting
^C
I then restarted docker, and tried again:
$ docker run -v `pwd`:/workshop -p 0.0.0.0:6006:6006 -p 0.0.0.0:8888:8888 -it tensorflow/tensorflow bash
Unable to find image 'tensorflow/tensorflow:latest' locally
latest: Pulling from tensorflow/tensorflow
30d541b48fc0: Already exists
8ecd7f80d390: Already exists
46ec9927bb81: Already exists
2e67a4d67b44: Already exists
7d9dd9155488: Already exists
a27df5e99dc2: Already exists
88fd9b7642d8: Already exists
d13154bfa8c5: Downloading [=> ] 4.325 MB/113.9 MB
af7499d4d2e2: Downloading [=======> ] 7.11 MB/49.25 MB
e905ca2659f3: Download complete
b018128f6a21: Download complete
74afe00108f1: Download complete
As you can see, after the restart the list of images required went back to the original. It seems like the daemon is waiting for a download it thinks is still running, but forgets to restart it after a broken or failed download.
HTH
K
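For the DNS timeout shown above (lookup dseasb33srnrn.cloudfront.net ... i/o timeout), one hedged workaround while on the VPN is to give the daemon explicit DNS servers that are reachable from it. The addresses below are placeholders; on a Linux host this goes in /etc/docker/daemon.json, while Docker for Mac accepts the same JSON in its daemon preferences:

```
# Point the daemon at DNS servers reachable from the VPN (placeholder
# addresses), then restart the daemon so it takes effect.
sudo tee /etc/docker/daemon.json <<'EOF'
{
  "dns": ["10.0.0.2", "8.8.8.8"]
}
EOF
```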
I'm seeing this happen on AWS/ECS: we do a docker pull and for some reason the network connection drops. Then our deploy is stuck, since the pull hangs indefinitely.
I am facing the same issue; restarting Docker as well as the system doesn't help.
x86_64-1.0.0: Pulling from hyperledger/fabric-orderer
fe6b5e13de: Downloading [===========> ] 10.46MB/46.79MB
0a2b43a72660: Download complete
18bdd1e546d2: Download complete
8198342c3e05: Download complete
f56970a44fd4: Download complete
e32b597e7839: Download complete
a6e362fc71c4: Downloading [===========> ] 3.964MB/17.48MB
f107ea6d90f4: Download complete
593ba12c6c43: Downloading [===============================> ] 3.997MB/6.394MB
12b8c0ba3585: Waiting
OS: Ubuntu 16.04
docker info
Containers: 1
Running: 0
Paused: 0
Stopped: 1
Images: 1
Server Version: 17.06.0-ce
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 3
Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: cfb82a876ecc11b5ca0977d1733adbe58599088a
runc version: 2d41c047c83e09a6d61d464906feb2a2f3c52aa4
init version: 949e6fa
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.4.0-81-generic
Operating System: Ubuntu 16.04.2 LTS
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 7.797GiB
Name: euca-10-254-230-147
ID: 4WYS:DDPU:AQZQ:MVDK:WBJ7:62OI:LRZH:KCWS:W2OA:PFTK:7JDH:ZAR3
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Http Proxy: 10.158.100.6:8080
Https Proxy: 10.158.100.6:8080
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
WARNING: No swap limit support
@Katiyman anything in the daemon logs?
@thaJeztah Nope, nothing of value. But I changed the proxy and it got through. I still haven't found the root cause; the earlier proxy was also a working proxy, and I'm not sure why it didn't work.
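If the daemon has to go through a proxy, the proxy generally needs to be configured on the daemon itself rather than in the client's shell. A sketch for a systemd-based host such as the Ubuntu 16.04 box above, reusing the proxy address from the docker info output (swap in whichever proxy actually works):

```
# Give the daemon its proxy settings via a systemd drop-in, then
# reload and restart it.
sudo mkdir -p /etc/systemd/system/docker.service.d
sudo tee /etc/systemd/system/docker.service.d/http-proxy.conf <<'EOF'
[Service]
Environment="HTTP_PROXY=http://10.158.100.6:8080"
Environment="HTTPS_PROXY=http://10.158.100.6:8080"
EOF
sudo systemctl daemon-reload
sudo systemctl restart docker
```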
The code path has significantly changed since v1.5. If somebody is still hitting this, please open a new issue.
Running docker pull will simply hang, waiting for a non-existent process to download the repository.
This is the same behavior as #3115; however, there is no other docker process running.
The list of running docker containers:
See here for a full process tree: https://gist.github.com/tfoote/c8a30e569c911f1977e2
When this happens, my process monitor fails the job after 120 minutes; this happens regularly.
An strace of the docker instance can be found here: https://gist.github.com/tfoote/1dc3905eb9c235cb5c53
It is stuck on an epoll_wait call.
Here's all the standard info.
It's running on AWS.
I'm running an instance of the ROS buildfarm, which can reproduce this bad state once every couple of days when fully loaded running Debian package builds at ~100% CPU load. This happens when we are preparing a major release.
I have not been able to isolate the cause in a smaller example, it has happened on multiple different repositories. Sometimes it's the official Ubuntu repository, sometimes it's our own custom repositories. We've tracked a few instances of this error recently here. When one repository is failing to pull, others work fine. All the repositories are hosted on the public docker hub.
Here's an example of one hanging while another passes.
As determined in #3115 this can be fixed by restarting docker. However from that issue it is expected that this issue should not happen anymore. I think there has been a regression or we've found another edge case.
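For completeness, the daemon restart that clears the stuck state, as a sketch (the exact service command depends on the host's init system):

```
# Restart the daemon; on these older versions this also takes down the
# running containers (live-restore did not exist yet).
sudo service docker restart
```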
I will keep the machine online for a few days if anyone has suggestions on what I can run to debug the issue. Otherwise I'll have to wait for it to reoccur to be able to test any debugging.