'docker stack deploy' randomly updates services that haven't changed

markvr commented 7 years ago

Description

When doing docker stack deploy -c test.yaml test, services in the stack will randomly be updated, even if there are no changes. Weirdly, this only seems to be the case for services that have environment variables, but it's hard to investigate because the issue is transient.

Steps to reproduce the issue:

Create a simple test stack, with environment variables for a service
Repeatedly run docker stack deploy -c test.yaml test, checking the update status with docker service inspect each time.

e.g. stack file

version: "3"
services:
  tomcat:
    image:  tomcat
    deploy:
      mode: replicated
      replicas: 1
      update_config:
        parallelism: 1
        failure_action: pause
        max_failure_ratio: 0
      placement:
        constraints:
          - node.role != manager
    environment:
      APP_ACTIVE_ENVIRONMENTS: preprod
      EXTRA_SETTINGS: server failover out-of-service:80 backup
      HTTP_CHECK: OPTIONS /azure-test-webapp HTTP/1.1\r\nHost:\ www
      SERVICE_PORTS: 8080
      VIRTUAL_HOST: https://azure-docker-swarm-test-prod.service
      COOKIE: SRV insert indirect nocache
      FORCE_SSL: "true"

Loop script:

for i in `seq 1 10`;
do
  docker stack deploy -c env-vars.yaml test; docker service inspect test_tomcat | jq '.[0] | {UpdateStatus}' -c
  sleep 60
done

Output:

swarm-manager000000:~$ ./loop
Updating service test_tomcat (id: ycet112q4pilqi6urpewlk14g)
{"UpdateStatus":{"StartedAt":"0001-01-01T00:00:00Z","CompletedAt":"0001-01-01T00:00:00Z"}}
Updating service test_tomcat (id: ycet112q4pilqi6urpewlk14g)
{"UpdateStatus":{"State":"updating","StartedAt":"2017-02-17T11:12:39.4027968Z","CompletedAt":"1970-01-01T00:00:00Z","Message":"update in progress"}}
Updating service test_tomcat (id: ycet112q4pilqi6urpewlk14g)
{"UpdateStatus":{"StartedAt":"0001-01-01T00:00:00Z","CompletedAt":"0001-01-01T00:00:00Z"}}
Updating service test_tomcat (id: ycet112q4pilqi6urpewlk14g)
{"UpdateStatus":{"StartedAt":"0001-01-01T00:00:00Z","CompletedAt":"0001-01-01T00:00:00Z"}}
Updating service test_tomcat (id: ycet112q4pilqi6urpewlk14g)
{"UpdateStatus":{"StartedAt":"0001-01-01T00:00:00Z","CompletedAt":"0001-01-01T00:00:00Z"}}
Updating service test_tomcat (id: ycet112q4pilqi6urpewlk14g)
{"UpdateStatus":{"State":"updating","StartedAt":"2017-02-17T11:16:59.068760564Z","CompletedAt":"1970-01-01T00:00:00Z","Message":"update in progress"}}
Updating service test_tomcat (id: ycet112q4pilqi6urpewlk14g)
{"UpdateStatus":{"State":"updating","StartedAt":"2017-02-17T11:18:03.755051094Z","CompletedAt":"1970-01-01T00:00:00Z","Message":"update in progress"}}
Updating service test_tomcat (id: ycet112q4pilqi6urpewlk14g)
{"UpdateStatus":{"StartedAt":"0001-01-01T00:00:00Z","CompletedAt":"0001-01-01T00:00:00Z"}}
Updating service test_tomcat (id: ycet112q4pilqi6urpewlk14g)
{"UpdateStatus":{"StartedAt":"0001-01-01T00:00:00Z","CompletedAt":"0001-01-01T00:00:00Z"}}
Updating service test_tomcat (id: ycet112q4pilqi6urpewlk14g)
{"UpdateStatus":{"StartedAt":"0001-01-01T00:00:00Z","CompletedAt":"0001-01-01T00:00:00Z"}}

Describe the results you expected: Output to always be:

Updating service test_tomcat (id: ycet112q4pilqi6urpewlk14g)
{"UpdateStatus":{"StartedAt":"0001-01-01T00:00:00Z","CompletedAt":"0001-01-01T00:00:00Z"}}

But sometimes it is:

Updating service test_tomcat (id: ycet112q4pilqi6urpewlk14g)
{"UpdateStatus":{"State":"updating","StartedAt":"2017-02-17T11:18:03.755051094Z","CompletedAt":"1970-01-01T00:00:00Z","Message":"update in progress"}}

Additional information you deem important (e.g. issue happens only occasionally):

This is transient, and only appears to be the case for stack files with environment variables, but that may or may not be relevant.
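For anyone repeating the experiment, the loop output above can be tallied mechanically. This is a minimal sketch, assuming the captured lines look like the output shown (a "Updating service" line followed by the jq JSON line); count_updating is a hypothetical helper name, not part of Docker.

```python
import json

def count_updating(lines):
    """Return (updating, total) over the JSON snapshots found in the captured lines."""
    updating = 0
    total = 0
    for line in lines:
        line = line.strip()
        # Skip the "Updating service ..." lines printed by docker stack deploy.
        if not line.startswith("{"):
            continue
        total += 1
        status = json.loads(line).get("UpdateStatus") or {}
        # A snapshot counts as an update when the State field says so.
        if status.get("State") == "updating":
            updating += 1
    return updating, total

# Two iterations copied from the output above.
sample = [
    'Updating service test_tomcat (id: ycet112q4pilqi6urpewlk14g)',
    '{"UpdateStatus":{"StartedAt":"0001-01-01T00:00:00Z","CompletedAt":"0001-01-01T00:00:00Z"}}',
    'Updating service test_tomcat (id: ycet112q4pilqi6urpewlk14g)',
    '{"UpdateStatus":{"State":"updating","StartedAt":"2017-02-17T11:12:39.4027968Z","CompletedAt":"1970-01-01T00:00:00Z","Message":"update in progress"}}',
]
print(count_updating(sample))
```

In the full run above, 3 of the 10 snapshots report State "updating".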

Output of docker version:

This is using the latest released "docker-for-azure"

swarm-manager000000:~$ docker version
Client:
 Version:      1.13.0
 API version:  1.25
 Go version:   go1.7.3
 Git commit:   49bf474
 Built:        Tue Jan 17 21:19:34 2017
 OS/Arch:      linux/amd64

Server:
 Version:      1.13.0
 API version:  1.25 (minimum version 1.12)
 Go version:   go1.7.3
 Git commit:   49bf474
 Built:        Tue Jan 17 21:19:34 2017
 OS/Arch:      linux/amd64
 Experimental: true

Output of docker info:

swarm-manager000000:~$ docker info
Containers: 25
 Running: 8
 Paused: 0
 Stopped: 17
Images: 20
Server Version: 1.13.0
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: syslog
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host ipvlan macvlan null overlay
Swarm: active
 NodeID: hwdser1fzpu2ygtxmcpxa9y0l
 Is Manager: true
 ClusterID: uaxb7n1n2zfvlzepmrgvdfveu
 Managers: 1
 Nodes: 2
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
 Node Address: 10.240.4.5
 Manager Addresses:
  10.240.4.5:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 03e5862ec0d8d3b3f750e19fca3ee367e13c090e
runc version: 2f7393a47307a16f8cee44a37b262e8b81021e3e
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 4.9.4-moby
Operating System: Alpine Linux v3.5
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 1.635 GiB
Name: swarm-manager000000
ID: R23C:27VL:ZBA3:HHJW:ROSU:GCTP:GVBP:36T5:OD4I:DB3C:WBS2:CIE6
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: 127
 Goroutines: 250
 System Time: 2017-02-17T11:44:54.417270752Z
 EventsListeners: 3
Registry: https://index.docker.io/v1/
Experimental: true
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

Additional environment details (AWS, VirtualBox, physical, etc.): Using docker-for-azure

I will shortly be away until 1st March, and so will follow up on any comments then.

cpuguy83 commented 7 years ago

Please try with 1.13.1. This should be fixed.

markvr commented 7 years ago

Ah OK, thanks - I'll give it a go after 1st March. I guess you can close the ticket - I can always reopen it if there are still issues. Thanks for the quick response!

thaJeztah commented 7 years ago

Thanks for reporting! It's appreciated

markvr commented 7 years ago

Reopening because this still occurs with the 17.03.0-ce:

Client:
 Version:      17.03.0-ce
 API version:  1.26
 Go version:   go1.7.5
 Git commit:   3a232c8
 Built:        Tue Feb 28 08:10:07 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.03.0-ce
 API version:  1.26 (minimum version 1.12)
 Go version:   go1.7.5
 Git commit:   3a232c8
 Built:        Tue Feb 28 08:10:07 2017
 OS/Arch:      linux/amd64
 Experimental: false

It's not related to compose - it occurs just doing docker update. It definitely seems that the more environment variables a service has, the more likely it is to have these spurious updates.

thaJeztah commented 7 years ago

ping @vdemeester ^^

ifourmanov commented 7 years ago

Still an issue with Docker 17.05.0-ce. I have definitely seen fewer (if any) unneeded updates with 1.13.1, so I assume some level of regression happened.

thaJeztah commented 7 years ago

@markvr @ifourmanov would it be possible to compare the docker service inspect output of the service before/after the update, to check what was modified?

ifourmanov commented 7 years ago

@thaJeztah cat before.json

[
    {
        "ID": "w9fbehh49n651uv0wio7di7e6",
        "Version": {
            "Index": 5166
        },
        "CreatedAt": "2017-05-22T14:47:00.968532956Z",
        "UpdatedAt": "2017-05-30T06:54:53.023689197Z",
        "Spec": {
            "Name": "hbase_entitystore",
            "Labels": {
                "com.docker.stack.namespace": "hbase"
            },
            "TaskTemplate": {
                "ContainerSpec": {
                    "Image": "redacted.dkr.ecr.eu-west-1.amazonaws.com/entitystore:latest@sha256:f8d8700d6f6ec09f67e99965c56d21b0bd68884081ab601f5365fe0e1c641692",
                    "Labels": {
                        "com.docker.stack.namespace": "hbase"
                    },
                    "Mounts": [
                        {
                            "Type": "bind",
                            "Source": "/var/lib/docker-storage/config/config.xml",
                            "Target": "/usr/local/tomcat/conf/config.xml"
                        },
                        {
                            "Type": "bind",
                            "Source": "/var/lib/docker-storage/config/log4j2.xml",
                            "Target": "/usr/local/tomcat/conf/log4j2.xml"
                        },
                        {
                            "Type": "bind",
                            "Source": "/var/log/entitystore",
                            "Target": "/usr/local/tomcat/logs"
                        }
                    ],
                    "StopGracePeriod": 10000000000,
                    "DNSConfig": {}
                },
                "Resources": {},
                "RestartPolicy": {
                    "Condition": "any",
                    "Delay": 5000000000,
                    "MaxAttempts": 0
                },
                "Placement": {
                    "Constraints": [
                        "node.labels.apps == entitystore"
                    ]
                },
                "Networks": [
                    {
                        "Target": "5yqrank61kemtdc3khfalxdf5",
                        "Aliases": [
                            "entitystore.hadoop.staging.ds.local",
                            "entitystore"
                        ]
                    }
                ],
                "ForceUpdate": 0
            },
            "Mode": {
                "Replicated": {
                    "Replicas": 3
                }
            },
            "UpdateConfig": {
                "Parallelism": 1,
                "FailureAction": "pause",
                "Monitor": 5000000000,
                "MaxFailureRatio": 0,
                "Order": "stop-first"
            },
            "RollbackConfig": {
                "Parallelism": 1,
                "FailureAction": "pause",
                "Monitor": 5000000000,
                "MaxFailureRatio": 0,
                "Order": "stop-first"
            },
            "EndpointSpec": {
                "Mode": "vip",
                "Ports": [
                    {
                        "Protocol": "tcp",
                        "TargetPort": 8080,
                        "PublishedPort": 80,
                        "PublishMode": "ingress"
                    }
                ]
            }
        },
        "PreviousSpec": {
            "Name": "hbase_entitystore",
            "Labels": {
                "com.docker.stack.namespace": "hbase"
            },
            "TaskTemplate": {
                "ContainerSpec": {
                    "Image": "redacted.dkr.ecr.eu-west-1.amazonaws.com/entitystore:latest@sha256:f8d8700d6f6ec09f67e99965c56d21b0bd68884081ab601f5365fe0e1c641692",
                    "Labels": {
                        "com.docker.stack.namespace": "hbase"
                    },
                    "Mounts": [
                        {
                            "Type": "bind",
                            "Source": "/var/lib/docker-storage/config/config.xml",
                            "Target": "/usr/local/tomcat/conf/config.xml"
                        },
                        {
                            "Type": "bind",
                            "Source": "/var/lib/docker-storage/config/log4j2.xml",
                            "Target": "/usr/local/tomcat/conf/log4j2.xml"
                        },
                        {
                            "Type": "bind",
                            "Source": "/var/log/entitystore",
                            "Target": "/usr/local/tomcat/logs"
                        }
                    ]
                },
                "Resources": {},
                "Placement": {
                    "Constraints": [
                        "node.labels.apps == entitystore"
                    ]
                },
                "Networks": [
                    {
                        "Target": "5yqrank61kemtdc3khfalxdf5",
                        "Aliases": [
                            "entitystore.hadoop.staging.ds.local",
                            "entitystore"
                        ]
                    }
                ],
                "ForceUpdate": 0
            },
            "Mode": {
                "Replicated": {
                    "Replicas": 3
                }
            },
            "EndpointSpec": {
                "Mode": "vip",
                "Ports": [
                    {
                        "Protocol": "tcp",
                        "TargetPort": 8080,
                        "PublishedPort": 80,
                        "PublishMode": "ingress"
                    }
                ]
            }
        },
        "Endpoint": {
            "Spec": {
                "Mode": "vip",
                "Ports": [
                    {
                        "Protocol": "tcp",
                        "TargetPort": 8080,
                        "PublishedPort": 80,
                        "PublishMode": "ingress"
                    }
                ]
            },
            "Ports": [
                {
                    "Protocol": "tcp",
                    "TargetPort": 8080,
                    "PublishedPort": 80,
                    "PublishMode": "ingress"
                }
            ],
            "VirtualIPs": [
                {
                    "NetworkID": "w3tmr6b3dlsg2dbfyeivo2axe",
                    "Addr": "10.255.0.11/16"
                },
                {
                    "NetworkID": "5yqrank61kemtdc3khfalxdf5",
                    "Addr": "172.28.0.4/16"
                }
            ]
        },
        "UpdateStatus": {
            "State": "completed",
            "StartedAt": "2017-05-30T06:54:38.437976297Z",
            "CompletedAt": "2017-05-30T06:54:53.023664541Z",
            "Message": "update completed"
        }
    }
]

docker service update hbase_entitystore --image redacted.dkr.ecr.eu-west-1.amazonaws.com/entitystore:latest --with-registry-auth

cat after.json

[
    {
        "ID": "w9fbehh49n651uv0wio7di7e6",
        "Version": {
            "Index": 5199
        },
        "CreatedAt": "2017-05-22T14:47:00.968532956Z",
        "UpdatedAt": "2017-05-30T07:35:19.255706173Z",
        "Spec": {
            "Name": "hbase_entitystore",
            "Labels": {
                "com.docker.stack.namespace": "hbase"
            },
            "TaskTemplate": {
                "ContainerSpec": {
                    "Image": "redacted.dkr.ecr.eu-west-1.amazonaws.com/entitystore:latest@sha256:f8d8700d6f6ec09f67e99965c56d21b0bd68884081ab601f5365fe0e1c641692",
                    "Labels": {
                        "com.docker.stack.namespace": "hbase"
                    },
                    "Mounts": [
                        {
                            "Type": "bind",
                            "Source": "/var/lib/docker-storage/config/config.xml",
                            "Target": "/usr/local/tomcat/conf/config.xml"
                        },
                        {
                            "Type": "bind",
                            "Source": "/var/lib/docker-storage/config/log4j2.xml",
                            "Target": "/usr/local/tomcat/conf/log4j2.xml"
                        },
                        {
                            "Type": "bind",
                            "Source": "/var/log/entitystore",
                            "Target": "/usr/local/tomcat/logs"
                        }
                    ],
                    "StopGracePeriod": 10000000000,
                    "DNSConfig": {}
                },
                "Resources": {},
                "RestartPolicy": {
                    "Condition": "any",
                    "Delay": 5000000000,
                    "MaxAttempts": 0
                },
                "Placement": {
                    "Constraints": [
                        "node.labels.apps == entitystore"
                    ]
                },
                "Networks": [
                    {
                        "Target": "5yqrank61kemtdc3khfalxdf5",
                        "Aliases": [
                            "entitystore.hadoop.staging.ds.local",
                            "entitystore"
                        ]
                    }
                ],
                "ForceUpdate": 0
            },
            "Mode": {
                "Replicated": {
                    "Replicas": 3
                }
            },
            "UpdateConfig": {
                "Parallelism": 1,
                "FailureAction": "pause",
                "Monitor": 5000000000,
                "MaxFailureRatio": 0,
                "Order": "stop-first"
            },
            "RollbackConfig": {
                "Parallelism": 1,
                "FailureAction": "pause",
                "Monitor": 5000000000,
                "MaxFailureRatio": 0,
                "Order": "stop-first"
            },
            "EndpointSpec": {
                "Mode": "vip",
                "Ports": [
                    {
                        "Protocol": "tcp",
                        "TargetPort": 8080,
                        "PublishedPort": 80,
                        "PublishMode": "ingress"
                    }
                ]
            }
        },
        "PreviousSpec": {
            "Name": "hbase_entitystore",
            "Labels": {
                "com.docker.stack.namespace": "hbase"
            },
            "TaskTemplate": {
                "ContainerSpec": {
                    "Image": "redacted.dkr.ecr.eu-west-1.amazonaws.com/entitystore:latest@sha256:f8d8700d6f6ec09f67e99965c56d21b0bd68884081ab601f5365fe0e1c641692",
                    "Labels": {
                        "com.docker.stack.namespace": "hbase"
                    },
                    "Mounts": [
                        {
                            "Type": "bind",
                            "Source": "/var/lib/docker-storage/config/config.xml",
                            "Target": "/usr/local/tomcat/conf/config.xml"
                        },
                        {
                            "Type": "bind",
                            "Source": "/crypt/var/lib/docker-storage/config/log4j2.xml",
                            "Target": "/usr/local/tomcat/conf/log4j2.xml"
                        },
                        {
                            "Type": "bind",
                            "Source": "/var/log/entitystore",
                            "Target": "/usr/local/tomcat/logs"
                        }
                    ]
                },
                "Resources": {},
                "Placement": {
                    "Constraints": [
                        "node.labels.apps == entitystore"
                    ]
                },
                "Networks": [
                    {
                        "Target": "5yqrank61kemtdc3khfalxdf5",
                        "Aliases": [
                            "entitystore.hadoop.staging.ds.local",
                            "entitystore"
                        ]
                    }
                ],
                "ForceUpdate": 0
            },
            "Mode": {
                "Replicated": {
                    "Replicas": 3
                }
            },
            "EndpointSpec": {
                "Mode": "vip",
                "Ports": [
                    {
                        "Protocol": "tcp",
                        "TargetPort": 8080,
                        "PublishedPort": 80,
                        "PublishMode": "ingress"
                    }
                ]
            }
        },
        "Endpoint": {
            "Spec": {
                "Mode": "vip",
                "Ports": [
                    {
                        "Protocol": "tcp",
                        "TargetPort": 8080,
                        "PublishedPort": 80,
                        "PublishMode": "ingress"
                    }
                ]
            },
            "Ports": [
                {
                    "Protocol": "tcp",
                    "TargetPort": 8080,
                    "PublishedPort": 80,
                    "PublishMode": "ingress"
                }
            ],
            "VirtualIPs": [
                {
                    "NetworkID": "w3tmr6b3dlsg2dbfyeivo2axe",
                    "Addr": "10.255.0.11/16"
                },
                {
                    "NetworkID": "5yqrank61kemtdc3khfalxdf5",
                    "Addr": "172.28.0.4/16"
                }
            ]
        },
        "UpdateStatus": {
            "State": "completed",
            "StartedAt": "2017-05-30T07:35:04.930463539Z",
            "CompletedAt": "2017-05-30T07:35:19.255672547Z",
            "Message": "update completed"
        }
    }
]

diff before.json after.json

5c5
<             "Index": 5166
---
>             "Index": 5199
8c8
<         "UpdatedAt": "2017-05-30T06:54:53.023689197Z",
---
>         "UpdatedAt": "2017-05-30T07:35:19.255706173Z",
189,190c189,190
<             "StartedAt": "2017-05-30T06:54:38.437976297Z",
<             "CompletedAt": "2017-05-30T06:54:53.023664541Z",
---
>             "StartedAt": "2017-05-30T07:35:04.930463539Z",
>             "CompletedAt": "2017-05-30T07:35:19.255672547Z",

The only thing I can think of that might have changed between the docker service update calls is the contents of the log files in the mounted log directory. Still, all the containers were recreated.

Docker stack deploy has exactly the same behaviour

thaJeztah commented 7 years ago

Thanks @ifourmanov. Interesting, so indeed basically "nothing" changed, other than the updated/started times, which should be a result of updating the service spec, not a cause of it.
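A minimal sketch of that kind of comparison, assuming each argument is one entry from docker service inspect output like before.json and after.json. The set of fields treated as bookkeeping (Version, UpdatedAt, UpdateStatus) is an assumption taken from the diff in this thread, and strip_volatile is a hypothetical helper. It can only compare what docker service inspect prints.

```python
import copy

# Fields that change on every update regardless of the spec (assumed from the diff above).
VOLATILE_KEYS = ("Version", "UpdatedAt", "UpdateStatus")

def strip_volatile(service):
    """Return a copy of one inspect entry without the bookkeeping fields."""
    cleaned = copy.deepcopy(service)
    for key in VOLATILE_KEYS:
        cleaned.pop(key, None)
    return cleaned

def same_visible_state(before, after):
    """True when two inspect entries differ only in the bookkeeping fields."""
    return strip_volatile(before) == strip_volatile(after)

# Cut-down entries for illustration; real inspect output has many more fields.
before = {
    "ID": "w9fbehh49n651uv0wio7di7e6",
    "Version": {"Index": 5166},
    "UpdatedAt": "2017-05-30T06:54:53.023689197Z",
    "Spec": {"Name": "hbase_entitystore"},
}
after = {
    "ID": "w9fbehh49n651uv0wio7di7e6",
    "Version": {"Index": 5199},
    "UpdatedAt": "2017-05-30T07:35:19.255706173Z",
    "Spec": {"Name": "hbase_entitystore"},
}
print(same_visible_state(before, after))
```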

@aaronlehmann any ideas?

thaJeztah commented 7 years ago

@ifourmanov I'm discussing this issue with @aaronlehmann on Slack, and he suspects the change in the service may be in fields that are not exposed through the remote API (therefore the output of docker service inspect didn't show those changes).

Can you share the contents of the /var/lib/docker/swarm directory? Please send it by e-mail or as a direct message on the Docker community Slack rather than posting it here, because that directory contains the private keys used by swarm. You can send it to sebastiaan@docker.com, or ping me on Slack (@thaJeztah) and I'll make sure it gets to the right people to investigate 👍

ifourmanov commented 7 years ago

@thaJeztah sent archive via Slack

aaronlehmann commented 7 years ago

The registry credentials seem to have changed between these two versions of the service. I think this is because Amazon ECR uses short-lived tokens for registry access.

ifourmanov commented 7 years ago

That's a viable possibility. I don't think that a change in authentication should trigger a service recreate though, given that the image name and sha256 remained the same.

demaniak commented 7 years ago

I believe #29676 is relevant to this discussion - see the very last comments.

ifourmanov commented 7 years ago

@thaJeztah are there any plans for fixing this behaviour? Effectively it's yet another blocker for using stack/swarm in AWS

demaniak commented 7 years ago

Any movement on this?

We are running 17.05-CE on AWS, and this is still a problem.

johnomalley commented 7 years ago

When we deploy a stack, all services are updated according to stdout. In reality only the services that change are rolled over, but the UpdatedAt timestamp is touched for every service, even if it hasn't changed. It sure would be nice if the UpdatedAt timestamp were more accurate.

edit - version info:

$ docker --version
Docker version 17.06.0-ce, build 02c1d87
$ cat /proc/version
Linux version 3.10.0-514.26.2.el7.x86_64 (builder@kbuilder.dev.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Tue Jul 4 15:04:05 UTC 2017

sulphur commented 7 years ago

Same problem. All my services from my private AWS ECR registry get restarted every time I launch stack deploy. I'm not sure this is a login problem, since if I launch docker stack deploy at 5-minute intervals they all still get restarted (the public repos work fine). Also, for each service I use image: myserxice:1.2.3, so no latest here.

coughlanio commented 6 years ago

We're experiencing a very similar thing here. We have two swarm environments, and I've noticed it happening on one but not the other.

The problematic swarm:

Containers: 98
 Running: 30
 Paused: 0
 Stopped: 68
Images: 26
Server Version: 17.06.2-ce
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 313
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
 NodeID: x4pi9ch2bymtrvnmcsgdreda7
 Is Manager: true
 ClusterID: 1w99lhsjs74ukuwz6mwocxdgo
 Managers: 1
 Nodes: 2
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
  Force Rotate: 0
 Root Rotation In Progress: false
 Node Address: 10.146.0.24
 Manager Addresses:
  10.146.0.24:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 6e23458c129b551d5c9871e5174f6b1b7f6d1170
runc version: 810190ceaa507aa2727d7ae6f4790c76ec150bd2
init version: 949e6fa
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.10.0-40-generic
Operating System: Ubuntu 16.04.3 LTS
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 7.302GiB
Name: prod-swarm-frontend-1
ID: 5A47:QR5V:YXLX:TF7D:SNV2:PKNQ:H2IB:5JXV:QH7L:QJI5:YHOI:TXQQ
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support

The fine swarm:

Containers: 19
 Running: 6
 Paused: 0
 Stopped: 13
Images: 7
Server Version: 17.06.0-ce
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 87
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
 NodeID: mmzn3xjmo5cqa7wjw3eh4axea
 Is Manager: true
 ClusterID: mwhceh9fd9rubn6tpig92ucd8
 Managers: 1
 Nodes: 6
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
  Force Rotate: 0
 Root Rotation In Progress: false
 Node Address: 10.146.0.2
 Manager Addresses:
  10.146.0.2:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: cfb82a876ecc11b5ca0977d1733adbe58599088a
runc version: 2d41c047c83e09a6d61d464906feb2a2f3c52aa4
init version: 949e6fa
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.10.0-33-generic
Operating System: Ubuntu 16.04.2 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 7.794GiB
Name: prod-swarm-ubuntu-1
ID: N6WL:JHGC:TRIQ:KZJD:NHLR:SUEA:AXVG:R7WU:3CT2:Q24W:XDTQ:RRE5
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support

Both swarms are maintained using the same ansible scripts and stack deploy method. We authenticate with GCE on every stack deploy.

briandeheus commented 6 years ago

Is this ticket still being pursued or not? For us it's becoming more and more a dealbreaker to continue using Swarm in production.

cpuguy83 commented 6 years ago

@briandeheus Are you running an up-to-date version of Docker? Swarm only updates services when the desired state is different from the actual state.

I know there have been various issues with the docker stack deploy command line that can cause things like non-deterministic ordering of an array which can trigger an update.

cpuguy83 commented 6 years ago

@ifourmanov Your before/after don't look quite right. The version index between the two shows there have been 33 updates in between and the "after" also has a "previous spec" which does not match the "before"... namely one of the mounts in "before" is "/var/lib/docker-storage..." and after is "/crypt/var/lib/docker-storage..."

andrewnazarov commented 6 years ago

I'm facing roughly the same issue. But in my case docker stack deploy -c docker-compose.yml --resolve-image changed --with-registry-auth redeploys all unchanged services only on the second run. Initially I thought it was due to environment variables, which I changed quite a lot, but then I got redeploys even with an unchanged compose file.

marutib commented 6 years ago

Any workaround/fix for this yet? It is very difficult to maintain swarm in prod on AWS because of this.

cpuguy83 commented 6 years ago

@marutib Can you diff the service before and after the deploy?

marutib commented 6 years ago

@cpuguy83 Will try and get next time I deploy. I have 32 services on the swarm. Will diff a couple of them next time we add a service.

cpuguy83 commented 6 years ago

Actually, I guess the stored "previous" version in an instance where this was a problem will do.
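For anyone who wants to do that comparison, here is a rough sketch that lists the paths at which PreviousSpec and Spec differ within a single inspect entry. diff_paths is a hypothetical helper, and the sample entry is a cut-down illustration built from the mount paths mentioned above rather than complete inspect output.

```python
def diff_paths(old, new, path=""):
    """Yield (path, old_value, new_value) for every place the two values differ."""
    if isinstance(old, dict) and isinstance(new, dict):
        # Walk the union of keys so fields present on only one side are reported.
        for key in sorted(set(old) | set(new)):
            child = f"{path}.{key}" if path else key
            yield from diff_paths(old.get(key), new.get(key), child)
    elif isinstance(old, list) and isinstance(new, list) and len(old) == len(new):
        # Lists are compared position by position, so a reordering shows up as a change.
        for index, (a, b) in enumerate(zip(old, new)):
            yield from diff_paths(a, b, f"{path}[{index}]")
    elif old != new:
        yield path, old, new

entry = {
    "Spec": {"TaskTemplate": {"ContainerSpec": {"Mounts": [
        {"Type": "bind",
         "Source": "/var/lib/docker-storage/config/log4j2.xml",
         "Target": "/usr/local/tomcat/conf/log4j2.xml"},
    ]}}},
    "PreviousSpec": {"TaskTemplate": {"ContainerSpec": {"Mounts": [
        {"Type": "bind",
         "Source": "/crypt/var/lib/docker-storage/config/log4j2.xml",
         "Target": "/usr/local/tomcat/conf/log4j2.xml"},
    ]}}},
}

for change in diff_paths(entry["PreviousSpec"], entry["Spec"]):
    print(change)
```

Fields that appear in Spec but not in PreviousSpec are reported with None as the old value, so defaults filled in by the daemon show up as well.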

sirlatrom commented 5 years ago

FWIW, I've seen this behaviour with services with more than one environment variable, as they are not always given in the same order.

thaJeztah commented 5 years ago

The order of environment variables once was addressed in https://github.com/moby/moby/pull/32364 (but that's for updating using docker service update, we should check if docker stack deploy also uses something similar)
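To illustrate why ordering matters, here is a small sketch. The Env field of a container spec is a list of KEY=value strings, so two specs with the same variables in a different order are not equal when compared as lists. normalize_env is a hypothetical helper for illustration only, not Docker's implementation, and the variable values are borrowed from the stack file at the top of this issue.

```python
def normalize_env(env):
    """Return the entries sorted by variable name so the order is deterministic."""
    return sorted(env, key=lambda entry: entry.split("=", 1)[0])

# The same three variables, emitted in a different order on two deploys.
first_deploy = ["SERVICE_PORTS=8080", "FORCE_SSL=true", "APP_ACTIVE_ENVIRONMENTS=preprod"]
second_deploy = ["APP_ACTIVE_ENVIRONMENTS=preprod", "SERVICE_PORTS=8080", "FORCE_SSL=true"]

# Compared as ordered lists, the two specs look different.
print(first_deploy == second_deploy)
# After sorting by variable name they are identical.
print(normalize_env(first_deploy) == normalize_env(second_deploy))
```

The more variables a service has, the more possible orderings there are, which would fit the earlier observation that services with many environment variables are affected more often.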

marutib commented 5 years ago

@cpuguy83 You are correct, docker service inspect did give me the previous spec as well. And the only change seems to be the order of the environment variables, like @sirlatrom said.

But the latest order seems to be in a sorted order, so I will check if this happens again

marutib commented 5 years ago

Looks like I was changing the order of the variables when I was doing docker service update, and that led to this. I will confirm whether this fixes the issue in my next launch.

deadbeef84 commented 5 years ago

We've also seen unchanged services being updated on docker stack deploy which is a big issue for us. Currently we're on 17.12.1 and I've only seen it happen when using --resolve-image=changed not with --resolve-image=always.

We're running 18.03.1 on our test environment, and so far I haven't seen any problems when updating, even with --resolve-image=changed (which really helps reduce deploy times).

ashishxooa commented 5 years ago

Have this same issue. Updated one environment variable in my compose file and the stack deploy command restarted all the services.

bobf commented 5 years ago

I am having a (possibly) related issue.

docker -v
Docker version 18.09.3, build 774a1f4

We find that sometimes when we do docker stack deploy we get unwanted updates of other services but, more importantly, we do not get rolling updates.

Instead, all replicas seem to restart at once, causing a brief outage to our production sites.

This is a frustration for us here as it means that deployments are a big gamble.

I can provide extra info if needed.

liKe2k1 commented 4 years ago

I can confirm, this problem still exists on 19.03.8.

docker -v
Docker version 19.03.8, build afacb8b7f0

Sometimes no service is affected.
Sometimes all services are recreated.
Sometimes only the wanted service gets restarted, as expected.

I can't reliably reproduce these behaviors, so it's really hard to investigate under which circumstances the stack gets updated.

saifat29 commented 4 years ago

I kept the output of docker inspect of all the services before doing docker stack deploy. After the deployment, a random service got updated which wasn't meant to happen. This time that service was a Postgres database.

Below I've pasted the before and after docker inspect output of the Postgres service, which shouldn't have been updated as there were no changes made to it.

Background: I am running a Swarm cluster with a local registry for custom images. After building and pushing those images, I did docker stack deploy to update those services.
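For anyone who wants to capture the same before/after data, the snapshot step could be scripted roughly like this. It is a sketch only: snapshot_stack and the directory names are hypothetical, and it assumes it runs on a swarm manager.

```shell
# snapshot_stack STACK DIR: save `docker service inspect` output for
# every service in STACK to DIR/<service>.json (hypothetical helper).
snapshot_stack() {
  mkdir -p "$2" || return 1
  docker stack services --format '{{.Name}}' "$1" | while read -r svc; do
    docker service inspect "$svc" > "$2/$svc.json"
  done
}

# Intended use (needs a swarm manager, so left commented out):
#   snapshot_stack pdstack before
#   docker stack deploy -c docker-compose.yml pdstack
#   snapshot_stack pdstack after
#   diff -r before after
```

Services whose files differ between the two directories are the ones worth a closer look.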

Before output of docker service inspect for the affected service-

[
    {
        "ID": "nl3sp83wje9lgo0ir40ivmgus",
        "Version": {
            "Index": 6699
        },
        "CreatedAt": "2020-09-25T10:49:07.5135648Z",
        "UpdatedAt": "2020-09-25T10:51:37.8209599Z",
        "Spec": {
            "Name": "pdstack_nd_postgres",
            "Labels": {
                "com.docker.stack.image": "postgres:12",
                "com.docker.stack.namespace": "pdstack"
            },
            "TaskTemplate": {
                "ContainerSpec": {
                    "Image": "postgres:12@sha256:31122316d7afefa1d99d843f3a1a09a5484304183ecff7ab943b8bb94ba44ba4",
                    "Labels": {
                        "com.docker.stack.namespace": "pdstack"
                    },
                    "Env": [
                        "POSTGRES_DB=test",
                        "POSTGRES_PASSWORD=test",
                        "POSTGRES_USER=test"
                    ],
                    "Privileges": {
                        "CredentialSpec": null,
                        "SELinuxContext": null
                    },
                    "Mounts": [
                        {
                            "Type": "volume",
                            "Source": "pdstack_metadata_data",
                            "Target": "/var/lib/postgresql/data",
                            "VolumeOptions": {
                                "Labels": {
                                    "com.docker.stack.namespace": "pdstack"
                                }
                            }
                        }
                    ],
                    "StopGracePeriod": 10000000000,
                    "DNSConfig": {},
                    "Isolation": "default"
                },
                "Resources": {},
                "RestartPolicy": {
                    "Condition": "any",
                    "Delay": 5000000000,
                    "MaxAttempts": 0
                },
                "Placement": {
                    "Constraints": [
                        "node.labels.nd == true"
                    ],
                    "Platforms": [
                        {
                            "Architecture": "amd64",
                            "OS": "linux"
                        },
                        {
                            "OS": "linux"
                        },
                        {
                            "OS": "linux"
                        },
                        {
                            "Architecture": "arm64",
                            "OS": "linux"
                        },
                        {
                            "Architecture": "386",
                            "OS": "linux"
                        },
                        {
                            "Architecture": "mips64le",
                            "OS": "linux"
                        },
                        {
                            "Architecture": "ppc64le",
                            "OS": "linux"
                        },
                        {
                            "Architecture": "s390x",
                            "OS": "linux"
                        }
                    ]
                },
                "Networks": [
                    {
                        "Target": "fl2pslaogu09wb8v8szuawsc1",
                        "Aliases": [
                            "nd_postgres"
                        ]
                    }
                ],
                "ForceUpdate": 0,
                "Runtime": "container"
            },
            "Mode": {
                "Replicated": {
                    "Replicas": 1
                }
            },
            "UpdateConfig": {
                "Parallelism": 1,
                "FailureAction": "pause",
                "Monitor": 5000000000,
                "MaxFailureRatio": 0,
                "Order": "stop-first"
            },
            "RollbackConfig": {
                "Parallelism": 1,
                "FailureAction": "pause",
                "Monitor": 5000000000,
                "MaxFailureRatio": 0,
                "Order": "stop-first"
            },
            "EndpointSpec": {
                "Mode": "vip"
            }
        },
        "PreviousSpec": {
            "Name": "pdstack_nd_postgres",
            "Labels": {
                "com.docker.stack.image": "postgres:12",
                "com.docker.stack.namespace": "pdstack"
            },
            "TaskTemplate": {
                "ContainerSpec": {
                    "Image": "postgres:12@sha256:31122316d7afefa1d99d843f3a1a09a5484304183ecff7ab943b8bb94ba44ba4",
                    "Labels": {
                        "com.docker.stack.namespace": "pdstack"
                    },
                    "Env": [
                        "POSTGRES_DB=test",
                        "POSTGRES_PASSWORD=test",
                        "POSTGRES_USER=test"
                    ],
                    "Privileges": {
                        "CredentialSpec": null,
                        "SELinuxContext": null
                    },
                    "Mounts": [
                        {
                            "Type": "volume",
                            "Source": "pdstack_metadata_data",
                            "Target": "/var/lib/postgresql/data",
                            "VolumeOptions": {
                                "Labels": {
                                    "com.docker.stack.namespace": "pdstack"
                                }
                            }
                        }
                    ],
                    "Isolation": "default"
                },
                "Resources": {},
                "Placement": {
                    "Constraints": [
                        "node.labels.nd == true"
                    ],
                    "Platforms": [
                        {
                            "Architecture": "amd64",
                            "OS": "linux"
                        },
                        {
                            "OS": "linux"
                        },
                        {
                            "OS": "linux"
                        },
                        {
                            "Architecture": "arm64",
                            "OS": "linux"
                        },
                        {
                            "Architecture": "386",
                            "OS": "linux"
                        },
                        {
                            "Architecture": "mips64le",
                            "OS": "linux"
                        },
                        {
                            "Architecture": "ppc64le",
                            "OS": "linux"
                        },
                        {
                            "Architecture": "s390x",
                            "OS": "linux"
                        }
                    ]
                },
                "Networks": [
                    {
                        "Target": "fl2pslaogu09wb8v8szuawsc1",
                        "Aliases": [
                            "nd_postgres"
                        ]
                    }
                ],
                "ForceUpdate": 0,
                "Runtime": "container"
            },
            "Mode": {
                "Replicated": {
                    "Replicas": 1
                }
            },
            "EndpointSpec": {
                "Mode": "vip"
            }
        },
        "Endpoint": {
            "Spec": {
                "Mode": "vip"
            },
            "VirtualIPs": [
                {
                    "NetworkID": "fl2pslaogu09wb8v8szuawsc1",
                    "Addr": "10.0.6.35/24"
                }
            ]
        }
    }
]

After output of docker service inspect xxx for the affected service-

[
    {
        "ID": "nl3sp83wje9lgo0ir40ivmgus",
        "Version": {
            "Index": 7961
        },
        "CreatedAt": "2020-09-25T10:49:07.5135648Z",
        "UpdatedAt": "2020-10-13T18:53:50.22396559Z",
        "Spec": {
            "Name": "pdstack_nd_postgres",
            "Labels": {
                "com.docker.stack.image": "postgres:12",
                "com.docker.stack.namespace": "pdstack"
            },
            "TaskTemplate": {
                "ContainerSpec": {
                    "Image": "postgres:12@sha256:a1e04460fdd3c338d6b65a2ab66b5aa2748eb18da3e55bcdc9ef17831ed3ad46",
                    "Labels": {
                        "com.docker.stack.namespace": "pdstack"
                    },
                    "Env": [
                        "POSTGRES_DB=test",
                        "POSTGRES_PASSWORD=test",
                        "POSTGRES_USER=test"
                    ],
                    "Privileges": {
                        "CredentialSpec": null,
                        "SELinuxContext": null
                    },
                    "Mounts": [
                        {
                            "Type": "volume",
                            "Source": "pdstack_metadata_data",
                            "Target": "/var/lib/postgresql/data",
                            "VolumeOptions": {
                                "Labels": {
                                    "com.docker.stack.namespace": "pdstack"
                                }
                            }
                        }
                    ],
                    "StopGracePeriod": 10000000000,
                    "DNSConfig": {},
                    "Isolation": "default"
                },
                "Resources": {},
                "RestartPolicy": {
                    "Condition": "any",
                    "Delay": 5000000000,
                    "MaxAttempts": 0
                },
                "Placement": {
                    "Constraints": [
                        "node.labels.nd == true"
                    ],
                    "Platforms": [
                        {
                            "Architecture": "amd64",
                            "OS": "linux"
                        },
                        {
                            "OS": "linux"
                        },
                        {
                            "OS": "linux"
                        },
                        {
                            "Architecture": "arm64",
                            "OS": "linux"
                        },
                        {
                            "Architecture": "386",
                            "OS": "linux"
                        },
                        {
                            "Architecture": "mips64le",
                            "OS": "linux"
                        },
                        {
                            "Architecture": "ppc64le",
                            "OS": "linux"
                        },
                        {
                            "Architecture": "s390x",
                            "OS": "linux"
                        }
                    ]
                },
                "Networks": [
                    {
                        "Target": "fl2pslaogu09wb8v8szuawsc1",
                        "Aliases": [
                            "nd_postgres"
                        ]
                    }
                ],
                "ForceUpdate": 0,
                "Runtime": "container"
            },
            "Mode": {
                "Replicated": {
                    "Replicas": 1
                }
            },
            "UpdateConfig": {
                "Parallelism": 1,
                "FailureAction": "pause",
                "Monitor": 5000000000,
                "MaxFailureRatio": 0,
                "Order": "stop-first"
            },
            "RollbackConfig": {
                "Parallelism": 1,
                "FailureAction": "pause",
                "Monitor": 5000000000,
                "MaxFailureRatio": 0,
                "Order": "stop-first"
            },
            "EndpointSpec": {
                "Mode": "vip"
            }
        },
        "PreviousSpec": {
            "Name": "pdstack_nd_postgres",
            "Labels": {
                "com.docker.stack.image": "postgres:12",
                "com.docker.stack.namespace": "pdstack"
            },
            "TaskTemplate": {
                "ContainerSpec": {
                    "Image": "postgres:12@sha256:31122316d7afefa1d99d843f3a1a09a5484304183ecff7ab943b8bb94ba44ba4",
                    "Labels": {
                        "com.docker.stack.namespace": "pdstack"
                    },
                    "Env": [
                        "POSTGRES_DB=test",
                        "POSTGRES_PASSWORD=test",
                        "POSTGRES_USER=test"
                    ],
                    "Privileges": {
                        "CredentialSpec": null,
                        "SELinuxContext": null
                    },
                    "Mounts": [
                        {
                            "Type": "volume",
                            "Source": "pdstack_metadata_data",
                            "Target": "/var/lib/postgresql/data",
                            "VolumeOptions": {
                                "Labels": {
                                    "com.docker.stack.namespace": "pdstack"
                                }
                            }
                        }
                    ],
                    "Isolation": "default"
                },
                "Resources": {},
                "Placement": {
                    "Constraints": [
                        "node.labels.nd == true"
                    ],
                    "Platforms": [
                        {
                            "Architecture": "amd64",
                            "OS": "linux"
                        },
                        {
                            "OS": "linux"
                        },
                        {
                            "OS": "linux"
                        },
                        {
                            "Architecture": "arm64",
                            "OS": "linux"
                        },
                        {
                            "Architecture": "386",
                            "OS": "linux"
                        },
                        {
                            "Architecture": "mips64le",
                            "OS": "linux"
                        },
                        {
                            "Architecture": "ppc64le",
                            "OS": "linux"
                        },
                        {
                            "Architecture": "s390x",
                            "OS": "linux"
                        }
                    ]
                },
                "Networks": [
                    {
                        "Target": "fl2pslaogu09wb8v8szuawsc1",
                        "Aliases": [
                            "nd_postgres"
                        ]
                    }
                ],
                "ForceUpdate": 0,
                "Runtime": "container"
            },
            "Mode": {
                "Replicated": {
                    "Replicas": 1
                }
            },
            "EndpointSpec": {
                "Mode": "vip"
            }
        },
        "Endpoint": {
            "Spec": {
                "Mode": "vip"
            },
            "VirtualIPs": [
                {
                    "NetworkID": "fl2pslaogu09wb8v8szuawsc1",
                    "Addr": "10.0.6.35/24"
                }
            ]
        },
        "UpdateStatus": {
            "State": "completed",
            "StartedAt": "2020-10-13T18:53:33.389718357Z",
            "CompletedAt": "2020-10-13T18:53:50.223943667Z",
            "Message": "update completed"
        }
    }
]

Diff

<       "Index": 6699
---
>       "Index": 7961

<       "UpdatedAt": "2020-09-25T10:51:37.8209599Z",    
---
>       "UpdatedAt": "2020-10-13T18:53:50.22396559Z",

<       "Image": "postgres:12@sha256:31122316d7afefa1d99d843f3a1a09a5484304183ecff7ab943b8bb94ba44ba4",
---
>       "Image": "postgres:12@sha256:a1e04460fdd3c338d6b65a2ab66b5aa2748eb18da3e55bcdc9ef17831ed3ad46",

<
---
>       },
        "UpdateStatus": {
            "State": "completed",
            "StartedAt": "2020-10-13T18:53:33.389718357Z",
            "CompletedAt": "2020-10-13T18:53:50.223943667Z",
            "Message": "update completed"

Note that the image digest changed unexpectedly. That shouldn't have happened, since the image is pinned to a specific version and pulled from Docker Hub. I guess this is why the service got updated.

The docker-compose.yml file containing the affected service (there are many other services too, which I've omitted because they weren't affected):
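The two image references in the diff share the tag postgres:12 and differ only in the digest. Since a tag on a registry can be moved to a newer image, one possible reading is that the tag was re-resolved at deploy time. A minimal sketch that splits the two references to make this visible; ref_name and ref_digest are hypothetical helper names.

```shell
# Split an image reference of the form name:tag@sha256:... into parts.
ref_name() { printf '%s\n' "${1%%@*}"; }   # everything before '@'
ref_digest() {                             # everything after '@', or empty
  case "$1" in
    *@*) printf '%s\n' "${1#*@}" ;;
    *)   printf '\n' ;;
  esac
}

before='postgres:12@sha256:31122316d7afefa1d99d843f3a1a09a5484304183ecff7ab943b8bb94ba44ba4'
after='postgres:12@sha256:a1e04460fdd3c338d6b65a2ab66b5aa2748eb18da3e55bcdc9ef17831ed3ad46'

if [ "$(ref_name "$before")" = "$(ref_name "$after")" ] &&
   [ "$(ref_digest "$before")" != "$(ref_digest "$after")" ]; then
  echo "same tag, different digest"
fi
```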

version: "3.3"

services:
  nd_postgres:
    container_name: nd_postgres
    image: postgres:12
    volumes:
      - metadata_data:/var/lib/postgresql/data
    environment:
      - POSTGRES_DB=test
      - POSTGRES_USER=test
      - POSTGRES_PASSWORD=test
    deploy:
      placement:
        constraints:
          - node.labels.nd == true

volumes:
  metadata_data:

puckey commented 3 years ago

I was running into the issue that services were being restarted when a new image tag was added, while the underlying image digest hash had not changed.

This workaround resolves all referenced image tags to their explicit sha:

Run the docker-compose.yml file through docker-compose using the command line option --resolve-image-digests, which adds the sha256 hash to all service images. Since these resolved references still contain the tag names, remove them using sed.

docker-compose config --resolve-image-digests > "docker-compose-resolved.yml"
sed -ri 's/(\/[^:]+):[^@]+@sha256/\1@sha256/' "docker-compose-resolved.yml"

image: registry.gitlab.com/foo:tag
becomes image: registry.gitlab.com/foo@sha256:b4b7f74bbb3164cb88b9b7f71ad824dc1a99b43fad678b6b9404c0ad4a9124b3

Now when deploying the stack to a swarm, services are no longer restarted when a new tag is specified pointing to the same underlying image.
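To see what the sed expression does, it can be tried on single lines. This is a minimal sketch, assuming the resolved file contains references of the form name:tag@sha256:...; strip_tag is a hypothetical wrapper, and it uses sed -E in place of -r. Note that the expression requires a "/" in the image name, so a reference without one is left unchanged.

```shell
# strip_tag: read lines on stdin and drop the ":tag" part from image
# references that also carry a sha256 digest (same expression as above).
strip_tag() {
  sed -E 's/(\/[^:]+):[^@]+@sha256/\1@sha256/'
}

# Reference with a registry path: the tag is removed.
printf '%s\n' 'image: registry.gitlab.com/foo:tag@sha256:b4b7f74bbb3164cb88b9b7f71ad824dc1a99b43fad678b6b9404c0ad4a9124b3' | strip_tag

# Reference without any "/": no match, so the line passes through as-is.
printf '%s\n' 'image: postgres:12@sha256:31122316d7afefa1d99d843f3a1a09a5484304183ecff7ab943b8bb94ba44ba4' | strip_tag
```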

cosmos1978 commented 3 years ago

This problem is still there in the latest 20.10.9. The strange thing is that I have 2 clusters with the same Docker version, and only one exhibits the behavior of randomly restarting containers when redeploying a stack file.

The one that works never has a digest when doing docker inspect of a service

    "Spec": {
        "Name": "elasticsearch_elastic1-1",
        "Labels": {
            "com.docker.stack.image": "docker.elastic.co/elasticsearch/elasticsearch:7.15.0",
            "com.docker.stack.namespace": "elasticsearch"
        },
        "TaskTemplate": {
            "ContainerSpec": {
                "Image": "docker.elastic.co/elasticsearch/elasticsearch:7.15.0@sha256:6ae227c688e05f7d487e0cfe08a5a3681f4d60d006ad9b5a1f72a741d6091df1",

.
.
.
.
    "PreviousSpec": {
        "Name": "elasticsearch_elastic1-1",
        "Labels": {
            "com.docker.stack.image": "docker.elastic.co/elasticsearch/elasticsearch:7.15.0",
            "com.docker.stack.namespace": "elasticsearch"
        },
        "TaskTemplate": {
            "ContainerSpec": {
                "Image": "docker.elastic.co/elasticsearch/elasticsearch:7.15.0@sha256:6ae227c688e05f7d487e0cfe08a5a3681f4d60d006ad9b5a1f72a741d6091df1",

When looking at the bad cluster, it seems that before the upgrade there was no image tag/digest attached to the image label.

    "Spec": {
        "Name": "elastic_logstash",
        "Labels": {
            "com.docker.stack.image": "docker.elastic.co/logstash/logstash:7.15.0",
            "com.docker.stack.namespace": "elastic"
        },
        "TaskTemplate": {
            "ContainerSpec": {
                "Image": "docker.elastic.co/logstash/logstash:7.15.0",

.
.
.
   "PreviousSpec": {
        "Name": "elastic_logstash",
        "Labels": {
            "com.docker.stack.image": "docker.elastic.co/logstash/logstash:7.15.0",
            "com.docker.stack.namespace": "elastic"
        },
        "TaskTemplate": {
            "ContainerSpec": {
                "Image": "docker.elastic.co/logstash/logstash:7.15.0@sha256:ba6ee9c11620d0bb9d5bff5937bdf995b71bc7a2bcd1047b1458cf752194b54a",

'docker stack deploy' randomly updates services that haven't changed #31115