moby / swarmkit

A toolkit for orchestrating distributed systems at any scale. It includes primitives for node discovery, raft-based consensus, task scheduling and more.
Apache License 2.0

On macOS, docker service stuck in "preparing" state in swarm cluster #2344

Open DanielYuan2012 opened 7 years ago

DanielYuan2012 commented 7 years ago

```
docker@myvm1:~$ docker version
Client:
 Version:      17.06.0-ce
 API version:  1.30
 Go version:   go1.8.3
 Git commit:   02c1d87
 Built:        Fri Jun 23 21:15:15 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.06.0-ce
 API version:  1.30 (minimum version 1.12)
 Go version:   go1.8.3
 Git commit:   02c1d87
 Built:        Fri Jun 23 21:51:55 2017
 OS/Arch:      linux/amd64
 Experimental: false
```

```
docker@myvm1:~$ docker stack services getstartedlab
ID            NAME               MODE        REPLICAS  IMAGE                               PORTS
27dlqighq0gf  getstartedlab_web  replicated  0/2       danielyuan2017/awesomeimages:part1  *:8090->80/tcp
```

```
docker@myvm1:~$ docker node ls
ID                            HOSTNAME  STATUS  AVAILABILITY  MANAGER STATUS
hc5xgmmaqdma2k7d3xetuv6wu     myvm2     Ready   Active
jhflr5686vfcx5wo3gx3zmhlh *   myvm1     Ready   Active        Leader
```

```
docker stack ps getstartedlab
ID            NAME                 IMAGE                               NODE   DESIRED STATE  CURRENT STATE               ERROR  PORTS
mpunjv611812  getstartedlab_web.2  danielyuan2017/awesomeimages:part1  myvm2  Running        Preparing about an hour ago
lvqqpzqlndsh  getstartedlab_web.5  danielyuan2017/awesomeimages:part1  myvm1  Running        Preparing 38 minutes ago
```

docker stack file:

version: "3" services: web: image: danielyuan2017/awesomeimages:part1 deploy: replicas: 2 resources: limits: cpus: "0.1" memory: 50M restart_policy: condition: on-failure ports:

DanielYuan2012 commented 7 years ago

docker log:

ime="2017-08-09T11:41:07.971123295Z" level=info msg="Node join event for myvm2-f61cc5ca4c50/192.168.99.101" time="2017-08-09T11:41:11.429153304Z" level=debug msg="myvm1-c934f09ef89a: Initiating bulk sync with node myvm2-f61cc5ca4c50" time="2017-08-09T11:41:11.429960044Z" level=debug msg="myvm1-c934f09ef89a: Initiating unsolicited bulk sync for networks [ts5egqi6gs2lyq03tdgz23auk jl27lrbobaxln5a92ir0zreni] with node myvm2-f61cc5ca4c50" time="2017-08-09T11:41:11.432566015Z" level=debug msg="memberlist: Stream connection from=192.168.99.101:37280" time="2017-08-09T11:41:11.432850267Z" level=debug msg="myvm1-c934f09ef89a: Bulk sync to node myvm2-f61cc5ca4c50 took 1.427212ms" time="2017-08-09T11:41:15.802286114Z" level=debug msg="memberlist: Stream connection from=192.168.99.101:37282" time="2017-08-09T11:41:15.802682579Z" level=info msg="Node join event for myvm2-f61cc5ca4c50/192.168.99.101" time="2017-08-09T11:41:31.288465951Z" level=debug msg="memberlist: Stream connection from=192.168.99.101:37284" time="2017-08-09T11:41:31.288830895Z" level=debug msg="myvm1-c934f09ef89a: Initiating bulk sync for networks [ts5egqi6gs2lyq03tdgz23auk jl27lrbobaxln5a92ir0zreni] with node myvm2-f61cc5ca4c50" time="2017-08-09T11:41:37.972425141Z" level=debug msg="memberlist: Initiating push/pull sync with: 192.168.99.101:7946" time="2017-08-09T11:41:37.973415119Z" level=info msg="Node join event for myvm2-f61cc5ca4c50/192.168.99.101" time="2017-08-09T11:41:41.428965549Z" level=debug msg="myvm1-c934f09ef89a: Initiating bulk sync with node myvm2-f61cc5ca4c50" time="2017-08-09T11:41:41.429012102Z" level=debug msg="myvm1-c934f09ef89a: Initiating unsolicited bulk sync for networks [ts5egqi6gs2lyq03tdgz23auk jl27lrbobaxln5a92ir0zreni] with node myvm2-f61cc5ca4c50" time="2017-08-09T11:41:41.430431573Z" level=debug msg="memberlist: Stream connection from=192.168.99.101:37286" time="2017-08-09T11:41:41.430835627Z" level=debug msg="myvm1-c934f09ef89a: Bulk sync to node myvm2-f61cc5ca4c50 took 1.218411ms" time="2017-08-09T11:41:45.803641515Z" level=debug msg="memberlist: Stream connection from=192.168.99.101:37288" time="2017-08-09T11:41:45.804308575Z" level=info msg="Node join event for myvm2-f61cc5ca4c50/192.168.99.101" time="2017-08-09T11:42:01.287551342Z" level=debug msg="memberlist: Stream connection from=192.168.99.101:37290" time="2017-08-09T11:42:01.287722139Z" level=debug msg="myvm1-c934f09ef89a: Initiating bulk sync for networks [ts5egqi6gs2lyq03tdgz23auk jl27lrbobaxln5a92ir0zreni] with node myvm2-f61cc5ca4c50" time="2017-08-09T11:42:07.974238769Z" level=debug msg="memberlist: Initiating push/pull sync with: 192.168.99.101:7946" time="2017-08-09T11:42:07.975279149Z" level=info msg="Node join event for myvm2-f61cc5ca4c50/192.168.99.101" time="2017-08-09T11:42:11.428887396Z" level=debug msg="myvm1-c934f09ef89a: Initiating bulk sync with node myvm2-f61cc5ca4c50" time="2017-08-09T11:42:11.428968166Z" level=debug msg="myvm1-c934f09ef89a: Initiating unsolicited bulk sync for networks [ts5egqi6gs2lyq03tdgz23auk jl27lrbobaxln5a92ir0zreni] with node myvm2-f61cc5ca4c50" time="2017-08-09T11:42:11.430720090Z" level=debug msg="memberlist: Stream connection from=192.168.99.101:37292" time="2017-08-09T11:42:11.431088834Z" level=debug msg="myvm1-c934f09ef89a: Bulk sync to node myvm2-f61cc5ca4c50 took 1.391553ms" time="2017-08-09T11:42:15.805125491Z" level=debug msg="memberlist: Stream connection from=192.168.99.101:37294" time="2017-08-09T11:42:15.805615176Z" level=info msg="Node join event for myvm2-f61cc5ca4c50/192.168.99.101" 
time="2017-08-09T11:42:31.286963201Z" level=debug msg="memberlist: Stream connection from=192.168.99.101:37296"

nishanttotla commented 7 years ago

@DanielYuan2012 can you post the output of docker service inspect for the two services you're running?
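For example, against the stack above (`--pretty` is supported by this Docker version and gives a condensed, human-readable form):

```sh
docker service inspect getstartedlab_web
docker service inspect --pretty getstartedlab_web
```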

DanielYuan2012 commented 7 years ago

```
docker@myvm1:~$ docker service inspect getstartedlab_web
```

```json
[
  {
    "ID": "27dlqighq0gfghl1hy96kr58r",
    "Version": { "Index": 110 },
    "CreatedAt": "2017-08-09T08:44:47.06330789Z",
    "UpdatedAt": "2017-08-09T11:24:52.239405424Z",
    "Spec": {
      "Name": "getstartedlab_web",
      "Labels": {
        "com.docker.stack.image": "danielyuan2017/awesomeimages:part1",
        "com.docker.stack.namespace": "getstartedlab"
      },
      "TaskTemplate": {
        "ContainerSpec": {
          "Image": "danielyuan2017/awesomeimages:part1@sha256:0e15f9edf099ffebbe9805a0e0109fc5995f1e5998b51ad7e7015bd98e9ce72a",
          "Labels": { "com.docker.stack.namespace": "getstartedlab" },
          "Privileges": { "CredentialSpec": null, "SELinuxContext": null },
          "StopGracePeriod": 10000000000,
          "DNSConfig": {}
        },
        "Resources": { "Limits": { "NanoCPUs": 100000000, "MemoryBytes": 52428800 } },
        "RestartPolicy": { "Condition": "on-failure", "Delay": 5000000000, "MaxAttempts": 0 },
        "Placement": {},
        "Networks": [ { "Target": "jl27lrbobaxln5a92ir0zreni", "Aliases": [ "web" ] } ],
        "ForceUpdate": 0,
        "Runtime": "container"
      },
      "Mode": { "Replicated": { "Replicas": 5 } },
      "UpdateConfig": { "Parallelism": 1, "FailureAction": "pause", "Monitor": 5000000000, "MaxFailureRatio": 0, "Order": "stop-first" },
      "RollbackConfig": { "Parallelism": 1, "FailureAction": "pause", "Monitor": 5000000000, "MaxFailureRatio": 0, "Order": "stop-first" },
      "EndpointSpec": {
        "Mode": "vip",
        "Ports": [ { "Protocol": "tcp", "TargetPort": 80, "PublishedPort": 80, "PublishMode": "ingress" } ]
      }
    },
    "PreviousSpec": {
      "Name": "getstartedlab_web",
      "Labels": {
        "com.docker.stack.image": "danielyuan2017/awesomeimages:part1",
        "com.docker.stack.namespace": "getstartedlab"
      },
      "TaskTemplate": {
        "ContainerSpec": {
          "Image": "danielyuan2017/awesomeimages:part1@sha256:0e15f9edf099ffebbe9805a0e0109fc5995f1e5998b51ad7e7015bd98e9ce72a",
          "Labels": { "com.docker.stack.namespace": "getstartedlab" },
          "Privileges": { "CredentialSpec": null, "SELinuxContext": null }
        },
        "Resources": { "Limits": { "NanoCPUs": 100000000, "MemoryBytes": 52428800 } },
        "RestartPolicy": { "Condition": "on-failure", "MaxAttempts": 0 },
        "Placement": { "Platforms": [ { "Architecture": "amd64", "OS": "linux" } ] },
        "Networks": [ { "Target": "jl27lrbobaxln5a92ir0zreni", "Aliases": [ "web" ] } ],
        "ForceUpdate": 0,
        "Runtime": "container"
      },
      "Mode": { "Replicated": { "Replicas": 2 } },
      "EndpointSpec": {
        "Mode": "vip",
        "Ports": [ { "Protocol": "tcp", "TargetPort": 80, "PublishedPort": 8090, "PublishMode": "ingress" } ]
      }
    },
    "Endpoint": {
      "Spec": {
        "Mode": "vip",
        "Ports": [ { "Protocol": "tcp", "TargetPort": 80, "PublishedPort": 80, "PublishMode": "ingress" } ]
      },
      "Ports": [ { "Protocol": "tcp", "TargetPort": 80, "PublishedPort": 80, "PublishMode": "ingress" } ],
      "VirtualIPs": [
        { "NetworkID": "ts5egqi6gs2lyq03tdgz23auk", "Addr": "10.255.0.4/16" },
        { "NetworkID": "jl27lrbobaxln5a92ir0zreni", "Addr": "10.0.0.2/24" }
      ]
    },
    "UpdateStatus": {
      "State": "paused",
      "StartedAt": "2017-08-09T11:24:35.98366367Z",
      "Message": "update paused due to failure or early termination of task xri05xxx4jxrv1zomo4xi43nv"
    }
  }
]
```

```
docker@myvm1:~$ docker service inspect getstartedlab_visualizer
```

```json
[
  {
    "ID": "m79honecunt9yl3x8ri061z3d",
    "Version": { "Index": 117 },
    "CreatedAt": "2017-08-09T11:24:57.246300248Z",
    "UpdatedAt": "2017-08-09T11:24:57.246874092Z",
    "Spec": {
      "Name": "getstartedlab_visualizer",
      "Labels": {
        "com.docker.stack.image": "dockersamples/visualizer:stable",
        "com.docker.stack.namespace": "getstartedlab"
      },
      "TaskTemplate": {
        "ContainerSpec": {
          "Image": "dockersamples/visualizer:stable@sha256:bc680132f772cb44062795c514570db2f0b6f91063bc3afa2386edaaa0ef0b20",
          "Labels": { "com.docker.stack.namespace": "getstartedlab" },
          "Privileges": { "CredentialSpec": null, "SELinuxContext": null },
          "Mounts": [ { "Type": "bind", "Source": "/var/run/docker.sock", "Target": "/var/run/docker.sock" } ],
          "StopGracePeriod": 10000000000,
          "DNSConfig": {}
        },
        "Resources": {},
        "RestartPolicy": { "Condition": "any", "Delay": 5000000000, "MaxAttempts": 0 },
        "Placement": {
          "Constraints": [ "node.role == manager" ],
          "Platforms": [ { "Architecture": "amd64", "OS": "linux" } ]
        },
        "Networks": [ { "Target": "jl27lrbobaxln5a92ir0zreni", "Aliases": [ "visualizer" ] } ],
        "ForceUpdate": 0,
        "Runtime": "container"
      },
      "Mode": { "Replicated": { "Replicas": 1 } },
      "UpdateConfig": { "Parallelism": 1, "FailureAction": "pause", "Monitor": 5000000000, "MaxFailureRatio": 0, "Order": "stop-first" },
      "RollbackConfig": { "Parallelism": 1, "FailureAction": "pause", "Monitor": 5000000000, "MaxFailureRatio": 0, "Order": "stop-first" },
      "EndpointSpec": {
        "Mode": "vip",
        "Ports": [ { "Protocol": "tcp", "TargetPort": 8080, "PublishedPort": 8080, "PublishMode": "ingress" } ]
      }
    },
    "Endpoint": {
      "Spec": {
        "Mode": "vip",
        "Ports": [ { "Protocol": "tcp", "TargetPort": 8080, "PublishedPort": 8080, "PublishMode": "ingress" } ]
      },
      "Ports": [ { "Protocol": "tcp", "TargetPort": 8080, "PublishedPort": 8080, "PublishMode": "ingress" } ],
      "VirtualIPs": [
        { "NetworkID": "ts5egqi6gs2lyq03tdgz23auk", "Addr": "10.255.0.10/16" },
        { "NetworkID": "jl27lrbobaxln5a92ir0zreni", "Addr": "10.0.0.8/24" }
      ]
    }
  }
]
```

fangjian0423 commented 7 years ago

I'm also hitting this problem. How can it be solved? @nishanttotla

The tasks stay in "preparing" all the time.

apanagiotou commented 7 years ago

Any update on this? I'm having the same issue.

naffiq commented 6 years ago

Had this issue when our machine ran out of disk space; maybe this helps someone.

abjinugu commented 6 years ago

Agree with @naffiq: `docker system prune` resolved it.
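For anyone trying this, a minimal sequence (assuming a Docker CLI recent enough to have the `docker system` subcommands):

```sh
# See what is consuming disk space: images, containers, volumes, build cache
docker system df

# Remove stopped containers, dangling images, and unused networks
docker system prune
```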

theccalderon commented 6 years ago

Having this issue, and it's not solved by `docker system prune`. Any other suggestions? Thanks!

onprema commented 6 years ago

@ccalderon911217 It took about 1 hour on my machine, but the containers finally started running.

theccalderon commented 6 years ago

@eightlimbed thank you very much. The same happened to me: it took about 2 hours, but then it ran successfully.

lixiaomeng8520 commented 6 years ago

It took a long time, but it finally started. Can anyone tell me why?

unclesaam commented 6 years ago

I had the same problem today. Every update of my image took a very long time. Deleting all the previously built images (list them with `docker images`) seems to have fixed the issue. This can be done quickly with `docker rmi $(docker images -q)`. Be careful: this will delete ALL built and downloaded images on your system.
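A gentler alternative, assuming Docker 1.13 or later, is to remove only images that no container references:

```sh
# Removes all images not used by at least one container,
# instead of deleting every image outright
docker image prune -a
```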

prologic commented 5 years ago

I ran into an issue similar to the one described here, but with a different root cause and solution.

Symptoms

I saw a service deployed with docker stack deploy ... stuck in the "Preparing" state for longer than usual -- well, indefinitely.

Debugging

Neither `docker service ps` nor `docker service ps --no-trunc` showed anything particularly interesting.

Nor did the daemon logs (`tail -f /var/log/docker.log`; I run RancherOS nodes) show anything interesting, on either the master node or the candidate node for the service being deployed.

On further inspection I noticed that this service was using the yzlin/nfs volume driver plugin, so I started looking at my NAS and its NFS logs.

(I've lost the logs from my buffer, sorry.)

It turns out the problem was that some bad reboots of my NAS (which runs ZFS) had left broken mount points lying around, and some of the ZFS file-systems were not mounting. This was causing the NFS daemon to return errors to the client (yzlin/nfs). None of this was surfaced in any useful way; perhaps the author of the yzlin/nfs volume driver plugin could improve how NFS client errors are reported.

Solution

Clean up the broken ZFS file-systems and mounts: incrementally unmount child file-systems, clean out the parent file-systems (which should be empty), and remount everything with `zfs mount -a`.
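The sequence was roughly the following; the dataset names here are hypothetical, so adjust them to your pool layout:

```sh
# Hypothetical dataset names; adjust to your pool layout
zfs unmount tank/exports/child   # unmount child file-systems first
ls /tank/exports/child           # the underlying directory should be empty;
                                 # remove any stray files left by bad reboots
zfs mount -a                     # then remount everything
```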

This let the yzlin/nfs volume driver plugin do its thing without issue, and the problem was solved.


Be sure to check your plugins :)

beenhead commented 5 years ago

Saw this issue when there was a time-sync problem. Starting ntpd fixed it and the container started.
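Something like the following can confirm and fix clock drift; command and service names vary by distro, so treat this as a sketch:

```sh
# Compare the node's clock against a known-good machine
date -u

# One-shot correction: -g allows a large initial offset, -q exits after setting the clock
ntpd -g -q

# Then keep the time service running (service name may be "ntp" on some distros)
service ntpd start
```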

fredericrous commented 3 years ago

I just reproduced the issue on Docker Desktop 3.1.0 (macOS 11.2). My container was stuck in the "preparing" state. I solved this by restarting Docker.

The last action I took before this error was restarting dnsmasq; I don't know whether that's related. One peculiarity of the container itself is that it publishes a port in "host" mode.
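For the record, restarting Docker Desktop without touching the UI can be done from a terminal; this assumes the app bundle is named "Docker", as it is in that release:

```sh
# Quit Docker Desktop cleanly, then relaunch it
osascript -e 'quit app "Docker"'
open -a Docker
```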