moby / swarmkit

A toolkit for orchestrating distributed systems at any scale. It includes primitives for node discovery, raft-based consensus, task scheduling and more.

docker service update constraint-add <svc> issue #1915

Open Deepak-Vohra opened 7 years ago

Deepak-Vohra commented 7 years ago

With a Swarm cluster consisting of a manager and 2 worker nodes, constraints to place service replicas on the manager and on the workers both get added. How could a service replica be allocated to a node if constraints are added to place it only on a manager and, at the same time, only on a worker? As an example:

1. Start with 2 replicas for a service 'mysql' without any constraints. The replicas are placed without any constraint.
core@ip-10-0-0-238 ~ $ docker service ps -f desired-state=running mysql
ID                         NAME     IMAGE         NODE                        DESIRED STATE  CURRENT STATE          ERROR
bd4aw5mxijamjk94emr7uokd0  mysql.1  mysql:latest  ip-10-0-0-58.ec2.internal   Running        Running 8 minutes ago  
aq92faql778zbepzl7gldktne  mysql.2  mysql:latest  ip-10-0-0-140.ec2.internal  Running        Running 8 minutes ago  
2. Scale the service to 3 replicas.

3. Add a constraint to place replicas only on 'manager': `docker service update --constraint-add 'node.role==manager' mysql`. All service replicas get placed on the 'manager' node.

    
    core@ip-10-0-0-238 ~ $ docker service ps -f desired-state=running mysql
    ID                         NAME     IMAGE         NODE                        DESIRED STATE  CURRENT STATE           ERROR
    a8sxircmcl0068owwb14719yu  mysql.1  mysql:latest  ip-10-0-0-238.ec2.internal  Running        Running 10 seconds ago  
    al4cnixheuy7ww07w2b7hudfc  mysql.2  mysql:latest  ip-10-0-0-238.ec2.internal  Running        Running 36 seconds ago  
    8y3lm96begonntdr2of2104kl  mysql.3  mysql:latest  ip-10-0-0-238.ec2.internal  Running        Running 23 seconds ago  
4. Scale to 10 replicas; all replicas are on the 'manager' node, as expected.

core@ip-10-0-0-238 ~ $ docker service ps -f desired-state=running mysql
ID                         NAME      IMAGE         NODE                        DESIRED STATE  CURRENT STATE           ERROR
a8sxircmcl0068owwb14719yu  mysql.1   mysql:latest  ip-10-0-0-238.ec2.internal  Running        Running 2 minutes ago
al4cnixheuy7ww07w2b7hudfc  mysql.2   mysql:latest  ip-10-0-0-238.ec2.internal  Running        Running 2 minutes ago
8y3lm96begonntdr2of2104kl  mysql.3   mysql:latest  ip-10-0-0-238.ec2.internal  Running        Running 2 minutes ago
8wvcmcap3ra2f6fd7lk9w29nb  mysql.6   mysql:latest  ip-10-0-0-238.ec2.internal  Running        Running 20 seconds ago
4m7bzl1ra6km6mabb8bhm4e1t  mysql.7   mysql:latest  ip-10-0-0-238.ec2.internal  Running        Running 12 seconds ago
56dlt2jmhi91cc5kum4f9thwi  mysql.8   mysql:latest  ip-10-0-0-238.ec2.internal  Running        Running 12 seconds ago
6ha3b20l2dlufk659htbiwqas  mysql.9   mysql:latest  ip-10-0-0-238.ec2.internal  Running        Running 12 seconds ago
c4ddz2aw9jfued1665zou312m  mysql.10  mysql:latest  ip-10-0-0-238.ec2.internal  Running        Running 21 seconds ago

5. Add a constraint to place service replicas only on 'worker'.

`docker service update --constraint-add 'node.role==worker' mysql`

The expected result is that no replica should be running, since a node cannot have both the 'manager' and 'worker' roles. But the actual result is that some of the replicas are listed as "Allocated" without any node to run on, and some of the replicas are still running on the 'manager' node.

core@ip-10-0-0-238 ~ $ docker service ps -f desired-state=running mysql
ID                         NAME      IMAGE         NODE                        DESIRED STATE  CURRENT STATE            ERROR
2vrcbtpn5bz3r86rlj2gneffm  mysql.1   mysql:latest                              Running        Allocated 3 minutes ago
al4cnixheuy7ww07w2b7hudfc  mysql.2   mysql:latest  ip-10-0-0-238.ec2.internal  Running        Running 6 minutes ago
8y3lm96begonntdr2of2104kl  mysql.3   mysql:latest  ip-10-0-0-238.ec2.internal  Running        Running 6 minutes ago
3u8yt7oqe6kgxie3pjx54bdgr  mysql.4   mysql:latest                              Running        Allocated 3 minutes ago
5e1okgodw0zktn2hilmhr5qa4  mysql.5   mysql:latest                              Running        Allocated 3 minutes ago
47xgfvmzcskomx67n0ji1wfor  mysql.6   mysql:latest                              Running        Allocated 3 minutes ago
7ziykb2o0p73hdxc8ie4pu8e2  mysql.7   mysql:latest                              Running        Allocated 3 minutes ago
1to2xdaw3zv6qlr2j60fn1s76  mysql.8   mysql:latest                              Running        Allocated 2 minutes ago
cijq7pg3l6kvvp4mdcmuc0ci5  mysql.9   mysql:latest                              Running        Allocated 2 minutes ago
a94v71crwlv60mt6giocg0tv9  mysql.10  mysql:latest                              Running        Allocated 3 minutes ago
core@ip-10-0-0-238 ~ $

Assuming the other two replicas would also have shut down and been listed as "Allocated" after a few more minutes: if both constraints are then removed, all replicas get placed and are Running, distributed across the nodes in the cluster.

core@ip-10-0-0-238 ~ $ docker service ps -f desired-state=running mysql
ID                         NAME      IMAGE         NODE                        DESIRED STATE  CURRENT STATE           ERROR
34rflltueyekp4httotm7x5ur  mysql.1   mysql:latest  ip-10-0-0-140.ec2.internal  Running        Running 7 minutes ago
7defu1n1vgkp9rwzogwjwqvpf  mysql.2   mysql:latest  ip-10-0-0-238.ec2.internal  Running        Running 44 seconds ago
58mzqk4zof6x77jsa7v8h23w4  mysql.3   mysql:latest  ip-10-0-0-140.ec2.internal  Running        Running 2 minutes ago
da5y55an9la5ah0l2ounimh2n  mysql.4   mysql:latest  ip-10-0-0-140.ec2.internal  Running        Running 5 minutes ago
clao5lkper1faksuo8uk24dje  mysql.5   mysql:latest  ip-10-0-0-58.ec2.internal   Running        Running 3 minutes ago
dk6sc736kprvphexbmgpqjeso  mysql.6   mysql:latest  ip-10-0-0-58.ec2.internal   Running        Running 4 minutes ago
ecnycghcn0tbj8mn1252tvrrz  mysql.7   mysql:latest  ip-10-0-0-238.ec2.internal  Running        Running 2 minutes ago
0bbsfgxzkmrh23avr6o35s4qz  mysql.8   mysql:latest  ip-10-0-0-58.ec2.internal   Running        Running 8 minutes ago
bpgkc3fxsl4ecm9ezq9q17aii  mysql.10  mysql:latest  ip-10-0-0-238.ec2.internal  Running        Running 58 seconds ago



On which node are the service replicas that are listed as "Allocated" but not Running, if the only nodes in the cluster are the manager and the worker nodes?
dongluochen commented 7 years ago

On which node are the service replicas "Allocated" as listed but not Running

@dvohra when a task from a replicated service is in the Allocated state, it has not been assigned to any node, so the NODE field in docker service ps is empty. These tasks are in the hands of the manager, not on any node.

What's your UpdateConfig from docker service inspect mysql --pretty? I see the update continues while the tasks are stuck at Allocated. It seems you allow the update to continue regardless of whether it succeeds.
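
For reference, the same fields can be pulled directly with Go-template formatting; a quick sketch (the field paths assume the JSON layout shown by docker service inspect further down in this thread):

$ docker service inspect --format '{{json .Spec.UpdateConfig}}' mysql
$ docker service inspect --format '{{json .UpdateStatus}}' mysql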

Deepak-Vohra commented 7 years ago

With both constraints added, the result is not consistent. Earlier, all but 2 of the 10 replicas stopped running within 5 minutes. When tested again, starting with 10 replicas, all but two replicas are still running even after more than 10 minutes.


core@ip-10-0-0-238 ~ $ docker service ps -f desired-state=running mysql
ID                         NAME      IMAGE         NODE                        DESIRED STATE  CURRENT STATE              ERROR
314xi0et7v7q306dxa2dav8kr  mysql.1   mysql:latest                              Running        Allocated 9 minutes ago    
4gtdsi8n5lb42tzedm1ltiyb6  mysql.2   mysql:latest                              Running        Allocated 8 minutes ago    
csjvs4ci8km86fewrq723nkf3  mysql.3   mysql:latest  ip-10-0-0-140.ec2.internal  Running        Running about an hour ago  
da5y55an9la5ah0l2ounimh2n  mysql.4   mysql:latest  ip-10-0-0-140.ec2.internal  Running        Running 2 hours ago        
6g4v90aoo9r0itzzu7b7nvoyt  mysql.5   mysql:latest  ip-10-0-0-238.ec2.internal  Running        Running about an hour ago  
dk6sc736kprvphexbmgpqjeso  mysql.6   mysql:latest  ip-10-0-0-58.ec2.internal   Running        Running about an hour ago  
ecnycghcn0tbj8mn1252tvrrz  mysql.7   mysql:latest  ip-10-0-0-238.ec2.internal  Running        Running about an hour ago  
bsxi3pqwhnjcdkofkgtqmif69  mysql.8   mysql:latest  ip-10-0-0-58.ec2.internal   Running        Running about an hour ago  
e07fajyd1ogqvin1o8cg1v9tq  mysql.9   mysql:latest  ip-10-0-0-58.ec2.internal   Running        Running about an hour ago  
7ub4vw46afmgn5suvm9wcma6y  mysql.10  mysql:latest  ip-10-0-0-238.ec2.internal  Running        Running about an hour ago  
core@ip-10-0-0-238 ~ $ docker service inspect mysql
[
    {
        "ID": "2jvr9t2tl7ovxa6o5zt7jhh8m",
        "Version": {
            "Index": 2320
        },
        "CreatedAt": "2017-02-01T18:10:28.490900302Z",
        "UpdatedAt": "2017-02-01T21:30:48.294171833Z",
        "Spec": {
            "Name": "mysql",
            "TaskTemplate": {
                "ContainerSpec": {
                    "Image": "mysql:latest",
                    "Env": [
                        "MYSQL_ROOT_PASSWORD=mysql"
                    ]
                },
                "Resources": {
                    "Limits": {},
                    "Reservations": {}
                },
                "RestartPolicy": {
                    "Condition": "any",
                    "MaxAttempts": 0
                },
                "Placement": {
                    "Constraints": [
                        "node.role==worker",
                        "node.role==manager"
                    ]
                }
            },
            "Mode": {
                "Replicated": {
                    "Replicas": 10
                }
            },
            "UpdateConfig": {
                "Parallelism": 1,
                "Delay": 10000000000,
                "FailureAction": "pause"
            },
            "EndpointSpec": {
                "Mode": "vip"
            }
        },
        "Endpoint": {
            "Spec": {}
        },
        "UpdateStatus": {
            "State": "updating",
            "StartedAt": "2017-02-01T21:30:48.294165833Z",
            "CompletedAt": "1970-01-01T00:00:00Z",
            "Message": "update in progress"
        }
    }
]
core@ip-10-0-0-238 ~ $ 
Deepak-Vohra commented 7 years ago

Still, all but two replicas are running even after more than 20 minutes. How can replicas be running when the placement constraints are not satisfied?

"Constraints": [
    "node.role==worker",
    "node.role==manager"
]

Is AND or OR used for placement when multiple constraints are given?

dongluochen commented 7 years ago

It's because the update has not finished.

        "UpdateStatus": {
            "State": "updating",
            "StartedAt": "2017-02-01T21:30:48.294165833Z",
            "CompletedAt": "1970-01-01T00:00:00Z",
            "Message": "update in progress"
        }

I think it's better to fail this update with a timeout mechanism instead of letting it stay stuck at "updating". What's the Docker version on your nodes?

cc @aaronlehmann.

Deepak-Vohra commented 7 years ago
core@ip-10-0-0-238 ~ $ docker version
Client:
 Version:      1.12.6
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   d5236f0
 Built:        Tue Jan 31 07:56:17 2017
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.6
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   d5236f0
 Built:        Tue Jan 31 07:56:17 2017
 OS/Arch:      linux/amd64

How are multiple constraints applied?

dongluochen commented 7 years ago

In my test on the current Docker master (1.14.0-dev), the update only takes down Parallelism (which defaults to 1) replicas at a time, which protects the service from failing.

Multiple constraints are applied as "&&". In your case it is "node.role==worker && node.role==manager". We might extend this later though.
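
As an illustrative sketch (the service name, image, and zone label are hypothetical), two constraints that can hold at the same time still leave eligible nodes under the && semantics:

$ docker service create --name web --replicas 3 \
    --constraint 'node.role==worker' \
    --constraint 'node.labels.zone==east' \
    nginx:latest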

Deepak-Vohra commented 7 years ago

As a node's role cannot be both manager and worker, no replicas should be running with both role constraints applied.

dongluochen commented 7 years ago

@dvohra that's correct. If you put these constraints on service create, no instance would advance to running.
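
A quick way to see this, as a sketch (the service name is hypothetical, and the empty NODE column is what one would expect to observe per the behavior described above):

$ docker service create --name conflicted \
    --constraint 'node.role==manager' \
    --constraint 'node.role==worker' \
    -e MYSQL_ROOT_PASSWORD=mysql mysql:latest
# No node satisfies both role constraints, so the tasks should stay
# unassigned (empty NODE column) rather than advance to Running:
$ docker service ps conflicted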

Deepak-Vohra commented 7 years ago

With Parallelism set to 1, two replicas would be made non-running. But in an earlier run, all but two (8 out of 10) replicas were made non-running.

Deepak-Vohra commented 7 years ago

Shouldn't the update behave the same each time? At least have some consistency.

Deepak-Vohra commented 7 years ago

If you put these constraints on service create, no instance would advance to running.

Got the result indicated, but the tasks stay Allocated. There is no provision to set a timeout.

aaronlehmann commented 7 years ago

I suspect this may be fixed in 1.13.0 by https://github.com/docker/swarmkit/pull/1612

aaronlehmann commented 7 years ago

Actually that's probably not the case - that PR is about handling updates to nodes that cause constraints to no longer be met.

Deepak-Vohra commented 7 years ago

This issue is different; it is about service constraints, which are node-role based.

aaronlehmann commented 7 years ago

See also https://github.com/docker/swarmkit/issues/1720, which has some related discussion about how updates should behave when they can't move forward.

dongluochen commented 7 years ago

Shouldn't update be the same. At least have some consistency.

Update is different from the initial service create. The reason is that an update should try to protect the service from failure. In the initial service create, there is nothing to protect.

The orchestrator is responsible for creating X tasks according to the service specification. The scheduler tries to find nodes to host the tasks. If no node can satisfy the constraints, the tasks are stuck in the Allocated state.

In a service update, on the other hand, we don't want to kill your services. So if the update encounters failures, it should pause or proceed based on the FailureAction configuration.
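
These knobs can be set explicitly, for example (a sketch; the values shown are the defaults from the inspect output earlier in this thread, and a rollback action only exists in later Docker releases):

$ docker service update \
    --update-parallelism 1 \
    --update-delay 10s \
    --update-failure-action pause \
    mysql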

Deepak-Vohra commented 7 years ago

better to fail this update with timeout mechanism

Is a timeout provided for service create and service update? I did not find any.

dongluochen commented 7 years ago

service create doesn't need a timeout because it doesn't need to stop. I think service update might need one.

~$ docker service inspect redis --pretty

ID:     yg1ar0a3s1j71i3gd0fx869h7
Name:       redis
Service Mode:   Replicated
 Replicas:  10
UpdateStatus:
 State:     updating
 Started:   3 hours
 Message:   update in progress
Placement:
 Contraints:    [node.role==manager node.role==worker]
UpdateConfig:
 Parallelism:   1
 On failure:    pause
 Max failure ratio: 0
ContainerSpec:
 Image:     redis:3.0.6@sha256:6a692a76c2081888b589e26e6ec835743119fe453d67ecf03df7de5b73d69842
Resources:
Networks: ovnet
Endpoint Mode:  vip

~$ docker service ps redis | grep -i running
iogkv5q6628s  redis.1      redis:3.0.6  ip-172-19-241-145          Running        Running 3 hours ago
vnkxmb39gn6s  redis.2      redis:3.0.6  ip-172-19-241-145          Running        Running 3 hours ago
rv3nr66ctsr5  redis.3      redis:3.0.6  ip-172-19-241-145          Running        Running 3 hours ago
9v4njcoy6tpr  redis.4      redis:3.0.6  ip-172-19-147-51           Running        Running 3 hours ago
vuqoq5grktbw  redis.5      redis:3.0.6                             Running        Pending 3 hours ago
bszwzjx0492h  redis.6      redis:3.0.6  ip-172-19-241-145          Running        Running 3 hours ago
ztvbdpnf39k0  redis.7      redis:3.0.6  ip-172-19-147-51           Running        Running 3 hours ago
siqa2jipigrf  redis.8      redis:3.0.6  ip-172-19-241-145          Running        Running 3 hours ago
fdzz67joia26  redis.9      redis:3.0.6  ip-172-19-147-51           Running        Running 3 hours ago
0rapd9ghcl6d  redis.10     redis:3.0.6  ip-172-19-147-51           Running        Running 3 hours ago
Deepak-Vohra commented 7 years ago

Yes, thanks. An update should probably include a timeout and a rollback option.

dongluochen commented 7 years ago

@aaronlehmann @aluzzardi What do you think of an update that is stuck at Pending (previously Allocated)? I think we should fail the update, either after a number of retries or after a timeout. I think a timeout is more generally applicable and user-friendly.

aaronlehmann commented 7 years ago

I think having an optional timeout is a good idea. In many cases, failing the update is the right thing to do, but in others the goal may be to wait for it to converge.
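
Until such a timeout exists, a crude client-side workaround is possible; a sketch (the service name and deadline are arbitrary, and it assumes UpdateStatus is populated while an update is in flight) that polls the update state and gives up after a deadline:

#!/bin/sh
# Poll the update state for up to 5 minutes, then bail out.
deadline=$(( $(date +%s) + 300 ))
while [ "$(docker service inspect --format '{{.UpdateStatus.State}}' mysql)" = "updating" ]; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
        echo "service update still stuck after 5 minutes" >&2
        exit 1
    fi
    sleep 5
done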

Deepak-Vohra commented 7 years ago

@dongluochen

Keeping tasks in Allocated at least for a while does have a purpose, as the replicas start Running once the constraints are removed. If a replica is failed, it won't be able to run when a constraint is later modified. Maybe a timeout is a better option, but it should be long enough that, while constraints are being added and removed with the objective of eventually assigning the replicas and making them run, the replicas do not fail.

dongluochen commented 7 years ago

@dvohra When constraints are changed, a new update starts, which overwrites the previous update.
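
So, for example, dropping one of the two conflicting constraints starts a fresh update that supersedes the stuck one (a sketch using the constraints from this thread):

$ docker service update --constraint-rm 'node.role==manager' mysql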

@aaronlehmann Agree with the optional timeout setting.

chenwuwen commented 3 years ago

@dongluochen The Docker version I use is 19.03.8. I have a service deployed with Swarm with 3 replicas. The cluster has three nodes, one master and two workers. Each node has a label set, and the label value differs per node. When I update the service (updating the image), I use the parameter --constraint-add 'node.labels.xx==xx'. My goal is to have only one designated node of the three run the new image, so that the new and old versions of the service coexist. But when I do this, I find that all nodes use the new image, and the constraint does not take effect.
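
Since --constraint-add rewrites the placement for every replica of the service, one common way to get old and new images coexisting is to run the new image as a separate service pinned to one labeled node; a sketch (the node name, label, service name, and image tag are hypothetical):

$ docker node update --label-add canary=true node-3
$ docker service create --name myapp-canary \
    --constraint 'node.labels.canary==true' \
    myapp:new
# The original service keeps running the old image on the other nodes.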