markvr opened this issue 7 years ago
Please try with 1.13.1. This should be fixed.
Ah OK thanks - I'll give it a go after 1st March. I guess you can close the ticket - I can always reopen it if there are still issues. thanks for the quick response!
Thanks for reporting! It's appreciated
Reopening because this still occurs with 17.03.0-ce:
Client:
Version: 17.03.0-ce
API version: 1.26
Go version: go1.7.5
Git commit: 3a232c8
Built: Tue Feb 28 08:10:07 2017
OS/Arch: linux/amd64
Server:
Version: 17.03.0-ce
API version: 1.26 (minimum version 1.12)
Go version: go1.7.5
Git commit: 3a232c8
Built: Tue Feb 28 08:10:07 2017
OS/Arch: linux/amd64
Experimental: false
It's not related to compose - it occurs just doing docker service update. It definitely seems that the more environment variables a service has, the more likely it is to have these spurious updates.
ping @vdemeester ^^
Still an issue with Docker 17.05.0-ce. We definitely saw fewer (if any) unneeded updates with 1.13.1, so I assume some regression happened.
@markvr @ifourmanov would it be possible to compare the docker service inspect output of the service before/after the update, to check what was modified?
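For example, a minimal sketch (using the service name from this thread):

docker service inspect hbase_entitystore > before.json
# ...run the docker service update / docker stack deploy in question...
docker service inspect hbase_entitystore > after.json
diff before.json after.json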
@thaJeztah cat before.json
[
{
"ID": "w9fbehh49n651uv0wio7di7e6",
"Version": {
"Index": 5166
},
"CreatedAt": "2017-05-22T14:47:00.968532956Z",
"UpdatedAt": "2017-05-30T06:54:53.023689197Z",
"Spec": {
"Name": "hbase_entitystore",
"Labels": {
"com.docker.stack.namespace": "hbase"
},
"TaskTemplate": {
"ContainerSpec": {
"Image": "redacted.dkr.ecr.eu-west-1.amazonaws.com/entitystore:latest@sha256:f8d8700d6f6ec09f67e99965c56d21b0bd68884081ab601f5365fe0e1c641692",
"Labels": {
"com.docker.stack.namespace": "hbase"
},
"Mounts": [
{
"Type": "bind",
"Source": "/var/lib/docker-storage/config/config.xml",
"Target": "/usr/local/tomcat/conf/config.xml"
},
{
"Type": "bind",
"Source": "/var/lib/docker-storage/config/log4j2.xml",
"Target": "/usr/local/tomcat/conf/log4j2.xml"
},
{
"Type": "bind",
"Source": "/var/log/entitystore",
"Target": "/usr/local/tomcat/logs"
}
],
"StopGracePeriod": 10000000000,
"DNSConfig": {}
},
"Resources": {},
"RestartPolicy": {
"Condition": "any",
"Delay": 5000000000,
"MaxAttempts": 0
},
"Placement": {
"Constraints": [
"node.labels.apps == entitystore"
]
},
"Networks": [
{
"Target": "5yqrank61kemtdc3khfalxdf5",
"Aliases": [
"entitystore.hadoop.staging.ds.local",
"entitystore"
]
}
],
"ForceUpdate": 0
},
"Mode": {
"Replicated": {
"Replicas": 3
}
},
"UpdateConfig": {
"Parallelism": 1,
"FailureAction": "pause",
"Monitor": 5000000000,
"MaxFailureRatio": 0,
"Order": "stop-first"
},
"RollbackConfig": {
"Parallelism": 1,
"FailureAction": "pause",
"Monitor": 5000000000,
"MaxFailureRatio": 0,
"Order": "stop-first"
},
"EndpointSpec": {
"Mode": "vip",
"Ports": [
{
"Protocol": "tcp",
"TargetPort": 8080,
"PublishedPort": 80,
"PublishMode": "ingress"
}
]
}
},
"PreviousSpec": {
"Name": "hbase_entitystore",
"Labels": {
"com.docker.stack.namespace": "hbase"
},
"TaskTemplate": {
"ContainerSpec": {
"Image": "redacted.dkr.ecr.eu-west-1.amazonaws.com/entitystore:latest@sha256:f8d8700d6f6ec09f67e99965c56d21b0bd68884081ab601f5365fe0e1c641692",
"Labels": {
"com.docker.stack.namespace": "hbase"
},
"Mounts": [
{
"Type": "bind",
"Source": "/var/lib/docker-storage/config/config.xml",
"Target": "/usr/local/tomcat/conf/config.xml"
},
{
"Type": "bind",
"Source": "/var/lib/docker-storage/config/log4j2.xml",
"Target": "/usr/local/tomcat/conf/log4j2.xml"
},
{
"Type": "bind",
"Source": "/var/log/entitystore",
"Target": "/usr/local/tomcat/logs"
}
]
},
"Resources": {},
"Placement": {
"Constraints": [
"node.labels.apps == entitystore"
]
},
"Networks": [
{
"Target": "5yqrank61kemtdc3khfalxdf5",
"Aliases": [
"entitystore.hadoop.staging.ds.local",
"entitystore"
]
}
],
"ForceUpdate": 0
},
"Mode": {
"Replicated": {
"Replicas": 3
}
},
"EndpointSpec": {
"Mode": "vip",
"Ports": [
{
"Protocol": "tcp",
"TargetPort": 8080,
"PublishedPort": 80,
"PublishMode": "ingress"
}
]
}
},
"Endpoint": {
"Spec": {
"Mode": "vip",
"Ports": [
{
"Protocol": "tcp",
"TargetPort": 8080,
"PublishedPort": 80,
"PublishMode": "ingress"
}
]
},
"Ports": [
{
"Protocol": "tcp",
"TargetPort": 8080,
"PublishedPort": 80,
"PublishMode": "ingress"
}
],
"VirtualIPs": [
{
"NetworkID": "w3tmr6b3dlsg2dbfyeivo2axe",
"Addr": "10.255.0.11/16"
},
{
"NetworkID": "5yqrank61kemtdc3khfalxdf5",
"Addr": "172.28.0.4/16"
}
]
},
"UpdateStatus": {
"State": "completed",
"StartedAt": "2017-05-30T06:54:38.437976297Z",
"CompletedAt": "2017-05-30T06:54:53.023664541Z",
"Message": "update completed"
}
}
]
docker service update hbase_entitystore --image redacted.dkr.ecr.eu-west-1.amazonaws.com/entitystore:latest --with-registry-auth
cat after.json
[
{
"ID": "w9fbehh49n651uv0wio7di7e6",
"Version": {
"Index": 5199
},
"CreatedAt": "2017-05-22T14:47:00.968532956Z",
"UpdatedAt": "2017-05-30T07:35:19.255706173Z",
"Spec": {
"Name": "hbase_entitystore",
"Labels": {
"com.docker.stack.namespace": "hbase"
},
"TaskTemplate": {
"ContainerSpec": {
"Image": "redacted.dkr.ecr.eu-west-1.amazonaws.com/entitystore:latest@sha256:f8d8700d6f6ec09f67e99965c56d21b0bd68884081ab601f5365fe0e1c641692",
"Labels": {
"com.docker.stack.namespace": "hbase"
},
"Mounts": [
{
"Type": "bind",
"Source": "/var/lib/docker-storage/config/config.xml",
"Target": "/usr/local/tomcat/conf/config.xml"
},
{
"Type": "bind",
"Source": "/var/lib/docker-storage/config/log4j2.xml",
"Target": "/usr/local/tomcat/conf/log4j2.xml"
},
{
"Type": "bind",
"Source": "/var/log/entitystore",
"Target": "/usr/local/tomcat/logs"
}
],
"StopGracePeriod": 10000000000,
"DNSConfig": {}
},
"Resources": {},
"RestartPolicy": {
"Condition": "any",
"Delay": 5000000000,
"MaxAttempts": 0
},
"Placement": {
"Constraints": [
"node.labels.apps == entitystore"
]
},
"Networks": [
{
"Target": "5yqrank61kemtdc3khfalxdf5",
"Aliases": [
"entitystore.hadoop.staging.ds.local",
"entitystore"
]
}
],
"ForceUpdate": 0
},
"Mode": {
"Replicated": {
"Replicas": 3
}
},
"UpdateConfig": {
"Parallelism": 1,
"FailureAction": "pause",
"Monitor": 5000000000,
"MaxFailureRatio": 0,
"Order": "stop-first"
},
"RollbackConfig": {
"Parallelism": 1,
"FailureAction": "pause",
"Monitor": 5000000000,
"MaxFailureRatio": 0,
"Order": "stop-first"
},
"EndpointSpec": {
"Mode": "vip",
"Ports": [
{
"Protocol": "tcp",
"TargetPort": 8080,
"PublishedPort": 80,
"PublishMode": "ingress"
}
]
}
},
"PreviousSpec": {
"Name": "hbase_entitystore",
"Labels": {
"com.docker.stack.namespace": "hbase"
},
"TaskTemplate": {
"ContainerSpec": {
"Image": "redacted.dkr.ecr.eu-west-1.amazonaws.com/entitystore:latest@sha256:f8d8700d6f6ec09f67e99965c56d21b0bd68884081ab601f5365fe0e1c641692",
"Labels": {
"com.docker.stack.namespace": "hbase"
},
"Mounts": [
{
"Type": "bind",
"Source": "/var/lib/docker-storage/config/config.xml",
"Target": "/usr/local/tomcat/conf/config.xml"
},
{
"Type": "bind",
"Source": "/crypt/var/lib/docker-storage/config/log4j2.xml",
"Target": "/usr/local/tomcat/conf/log4j2.xml"
},
{
"Type": "bind",
"Source": "/var/log/entitystore",
"Target": "/usr/local/tomcat/logs"
}
]
},
"Resources": {},
"Placement": {
"Constraints": [
"node.labels.apps == entitystore"
]
},
"Networks": [
{
"Target": "5yqrank61kemtdc3khfalxdf5",
"Aliases": [
"entitystore.hadoop.staging.ds.local",
"entitystore"
]
}
],
"ForceUpdate": 0
},
"Mode": {
"Replicated": {
"Replicas": 3
}
},
"EndpointSpec": {
"Mode": "vip",
"Ports": [
{
"Protocol": "tcp",
"TargetPort": 8080,
"PublishedPort": 80,
"PublishMode": "ingress"
}
]
}
},
"Endpoint": {
"Spec": {
"Mode": "vip",
"Ports": [
{
"Protocol": "tcp",
"TargetPort": 8080,
"PublishedPort": 80,
"PublishMode": "ingress"
}
]
},
"Ports": [
{
"Protocol": "tcp",
"TargetPort": 8080,
"PublishedPort": 80,
"PublishMode": "ingress"
}
],
"VirtualIPs": [
{
"NetworkID": "w3tmr6b3dlsg2dbfyeivo2axe",
"Addr": "10.255.0.11/16"
},
{
"NetworkID": "5yqrank61kemtdc3khfalxdf5",
"Addr": "172.28.0.4/16"
}
]
},
"UpdateStatus": {
"State": "completed",
"StartedAt": "2017-05-30T07:35:04.930463539Z",
"CompletedAt": "2017-05-30T07:35:19.255672547Z",
"Message": "update completed"
}
}
]
diff before.json after.json
5c5
< "Index": 5166
---
> "Index": 5199
8c8
< "UpdatedAt": "2017-05-30T06:54:53.023689197Z",
---
> "UpdatedAt": "2017-05-30T07:35:19.255706173Z",
189,190c189,190
< "StartedAt": "2017-05-30T06:54:38.437976297Z",
< "CompletedAt": "2017-05-30T06:54:53.023664541Z",
---
> "StartedAt": "2017-05-30T07:35:04.930463539Z",
> "CompletedAt": "2017-05-30T07:35:19.255672547Z",
The only thing I can think of that might have changed between the docker service update calls is the contents of the log files in the mounted log directory. Still, all the containers were recreated.
Docker stack deploy has exactly the same behaviour
Thanks @ifourmanov. Interesting, so indeed basically "nothing" changed, other than the updated/started times, which should be a result of updating the service-spec, not cause a service spec to be updated.
@aaronlehmann any ideas?
@ifourmanov I'm discussing this issue with @aaronlehmann on Slack, and he suspects the change in the service may be in fields that are not exposed through the remote API (therefore the output of docker service inspect didn't show those changes).
Can you share the contents of the /var/lib/docker/swarm directory per e-mail or in a direct message on the Docker community Slack (that directory contains the private keys used by swarm)? You can send it to sebastiaan@docker.com, or ping me on Slack (@thaJeztah) and I'll make sure it gets to the right people to investigate 👍
@thaJeztah sent archive via Slack
The registry credentials seem to have changed between these two versions of the service. I think this is because Amazon ECR uses short-lived tokens for registry access.
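(Context: ECR authorization tokens expire after 12 hours, so each fresh login yields different credentials, and --with-registry-auth attaches the current credentials to the service spec. A sketch of the typical flow, assuming AWS CLI v2 and the registry from this thread:)

# Each invocation returns a new short-lived token, so the credentials
# attached to the service differ between deploys.
aws ecr get-login-password --region eu-west-1 | docker login --username AWS --password-stdin redacted.dkr.ecr.eu-west-1.amazonaws.com
docker service update hbase_entitystore --image redacted.dkr.ecr.eu-west-1.amazonaws.com/entitystore:latest --with-registry-auth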
That's a plausible explanation. I don't think changing authentication should trigger a service recreate though, given that the image name and sha256 remained the same.
I believe #29676 is relevant to this discussion - see the very last comments.
@thaJeztah are there any plans for fixing this behaviour? Effectively it's yet another blocker for using stack/swarm in AWS
Any movement on this?
We are running 17.05-CE on AWS, and this is still a problem.
When we deploy a stack, all services are updated according to stdout. In reality only the services that changed are rolled over, but the UpdatedAt timestamp is touched for every service, even if it hasn't changed. It sure would be nice if the UpdatedAt timestamp were more accurate.
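For anyone who wants to see which services actually had their UpdatedAt touched by a deploy, a minimal sketch (run before and after the deploy and compare):

for svc in $(docker service ls -q); do
  docker service inspect --format '{{.Spec.Name}} {{.UpdatedAt}}' "$svc"   # service name + last-update time
done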
edit - version info:
$ docker --version
Docker version 17.06.0-ce, build 02c1d87
$ cat /proc/version
Linux version 3.10.0-514.26.2.el7.x86_64 (builder@kbuilder.dev.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Tue Jul 4 15:04:05 UTC 2017
Same problem. All my services from my private AWS ECR registry get restarted every time I launch stack deploy. I'm not sure if this is a login problem, since if I launch docker stack deploy with a 5 min interval they will all get restarted (the public repos work fine). Also, for each service I use image: myservice:1.2.3, so no latest here.
We're experiencing a very similar thing here. We have two swarm environments, and I've noticed it happening on one but not the other.
The problematic swarm:
Containers: 98
Running: 30
Paused: 0
Stopped: 68
Images: 26
Server Version: 17.06.2-ce
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 313
Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
NodeID: x4pi9ch2bymtrvnmcsgdreda7
Is Manager: true
ClusterID: 1w99lhsjs74ukuwz6mwocxdgo
Managers: 1
Nodes: 2
Orchestration:
Task History Retention Limit: 5
Raft:
Snapshot Interval: 10000
Number of Old Snapshots to Retain: 0
Heartbeat Tick: 1
Election Tick: 3
Dispatcher:
Heartbeat Period: 5 seconds
CA Configuration:
Expiry Duration: 3 months
Force Rotate: 0
Root Rotation In Progress: false
Node Address: 10.146.0.24
Manager Addresses:
10.146.0.24:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 6e23458c129b551d5c9871e5174f6b1b7f6d1170
runc version: 810190ceaa507aa2727d7ae6f4790c76ec150bd2
init version: 949e6fa
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.10.0-40-generic
Operating System: Ubuntu 16.04.3 LTS
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 7.302GiB
Name: prod-swarm-frontend-1
ID: 5A47:QR5V:YXLX:TF7D:SNV2:PKNQ:H2IB:5JXV:QH7L:QJI5:YHOI:TXQQ
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
WARNING: No swap limit support
The fine swarm:
Containers: 19
Running: 6
Paused: 0
Stopped: 13
Images: 7
Server Version: 17.06.0-ce
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 87
Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
NodeID: mmzn3xjmo5cqa7wjw3eh4axea
Is Manager: true
ClusterID: mwhceh9fd9rubn6tpig92ucd8
Managers: 1
Nodes: 6
Orchestration:
Task History Retention Limit: 5
Raft:
Snapshot Interval: 10000
Number of Old Snapshots to Retain: 0
Heartbeat Tick: 1
Election Tick: 3
Dispatcher:
Heartbeat Period: 5 seconds
CA Configuration:
Expiry Duration: 3 months
Force Rotate: 0
Root Rotation In Progress: false
Node Address: 10.146.0.2
Manager Addresses:
10.146.0.2:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: cfb82a876ecc11b5ca0977d1733adbe58599088a
runc version: 2d41c047c83e09a6d61d464906feb2a2f3c52aa4
init version: 949e6fa
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.10.0-33-generic
Operating System: Ubuntu 16.04.2 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 7.794GiB
Name: prod-swarm-ubuntu-1
ID: N6WL:JHGC:TRIQ:KZJD:NHLR:SUEA:AXVG:R7WU:3CT2:Q24W:XDTQ:RRE5
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
WARNING: No swap limit support
Both swarms are maintained using the same Ansible scripts and stack deploy method. We authenticate with GCE on every stack deploy.
Is this ticket still being pursued or not? For us it's becoming more and more a dealbreaker to continue using Swarm in production.
@briandeheus Are you running an up-to-date version of Docker? Swarm only updates services when the desired state is different from the actual state.
I know there have been various issues with the docker stack deploy command line that can cause things like non-deterministic ordering of an array, which can trigger an update.
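One quick way to check whether only the ordering differs is to sort the Env arrays from the inspect output before comparing (a sketch, assuming jq and the after.json captured earlier in this thread):

# Compare current vs previous spec; if these two outputs match,
# the environment variables only changed order.
jq '.[0].Spec.TaskTemplate.ContainerSpec.Env // [] | sort' after.json
jq '.[0].PreviousSpec.TaskTemplate.ContainerSpec.Env // [] | sort' after.json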
@ifourmanov Your before/after don't look quite right. The version index between the two shows there have been 33 updates in between, and the "after" also has a "previous spec" which does not match the "before"... namely, one of the mounts in "before" is "/var/lib/docker-storage..." and in "after" it is "/crypt/var/lib/docker-storage...".
I'm facing roughly the same issue. But in my case docker stack deploy -c docker-compose.yml --resolve-image changed --with-registry-auth redeploys all unchanged services only on the second run. Initially I thought it was due to environment variables, which I changed quite a lot, but then I got redeploys even with an unchanged compose file.
Any workaround/fix for this yet? It's very difficult to maintain swarm in prod on AWS because of this.
@marutib Can you diff the service before and after the deploy?
@cpuguy83 Will try and get next time I deploy. I have 32 services on the swarm. Will diff a couple of them next time we add a service.
Actually, I guess the stored "previous" version in an instance where this was a problem will do.
FWIW, I've seen this behaviour with services with more than one environment variable, as they are not always given in the same order.
The order of environment variables was once addressed in https://github.com/moby/moby/pull/32364 (but that's for updating using docker service update; we should check if docker stack deploy also uses something similar).
@cpuguy83 You are correct, docker service inspect did give me the previous spec as well, and the only change seems to be the order of the environment variables, like @sirlatrom said.
But the latest order seems to be sorted, so I will check if this happens again.
Looks like I was changing the order of the variables when doing docker service update, and that led to this. I will confirm whether this fixes the issue in my next launch.
We've also seen unchanged services being updated on docker stack deploy, which is a big issue for us. Currently we're on 17.12.1, and I've only seen it happen when using --resolve-image=changed, not with --resolve-image=always.
We're running 18.03.1 on our test environment, and so far I haven't seen any problems when updating, even with --resolve-image=changed (which really helps reduce deploy times).
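For reference, the flag goes on the deploy command; a sketch with an assumed stack name and compose file:

docker stack deploy -c stack.yml --resolve-image=changed --with-registry-auth mystack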
Have this same issue. Updated one environment variable in my compose file and the stack deploy command restarted all the services.
I am having a (possibly?) related issue.
docker -v
Docker version 18.09.3, build 774a1f4
We find that sometimes when we do docker stack deploy we get unwanted updates of other services but, more importantly, we do not get rolling updates.
Instead, all replicas seem to restart at once, causing a brief outage to our production sites.
This is a frustration for us here as it means that deployments are a big gamble.
I can provide extra info if needed.
I can confirm this problem still exists on 19.03.8.
docker -v
Docker version 19.03.8, build afacb8b7f0
Sometimes no service is affected, sometimes all services are recreated, and sometimes only the intended service is restarted, as expected.
I can't reproduce these behaviours consistently, so it's really hard to investigate under which circumstances the stack gets updated.
I kept the output of docker inspect for all the services before doing docker stack deploy.
After the deployment, a random service got updated, which wasn't meant to happen. This time that service was a Postgres database. Below I've pasted the before and after docker inspect output of the Postgres service, which shouldn't have been updated as no changes were made to it.
Background: I am running a Swarm cluster with a local registry for custom images. After building and pushing those images, I did docker stack deploy to update those services.
Before output of docker service inspect for the affected service:
[
{
"ID": "nl3sp83wje9lgo0ir40ivmgus",
"Version": {
"Index": 6699
},
"CreatedAt": "2020-09-25T10:49:07.5135648Z",
"UpdatedAt": "2020-09-25T10:51:37.8209599Z",
"Spec": {
"Name": "pdstack_nd_postgres",
"Labels": {
"com.docker.stack.image": "postgres:12",
"com.docker.stack.namespace": "pdstack"
},
"TaskTemplate": {
"ContainerSpec": {
"Image": "postgres:12@sha256:31122316d7afefa1d99d843f3a1a09a5484304183ecff7ab943b8bb94ba44ba4",
"Labels": {
"com.docker.stack.namespace": "pdstack"
},
"Env": [
"POSTGRES_DB=test",
"POSTGRES_PASSWORD=test",
"POSTGRES_USER=test"
],
"Privileges": {
"CredentialSpec": null,
"SELinuxContext": null
},
"Mounts": [
{
"Type": "volume",
"Source": "pdstack_metadata_data",
"Target": "/var/lib/postgresql/data",
"VolumeOptions": {
"Labels": {
"com.docker.stack.namespace": "pdstack"
}
}
}
],
"StopGracePeriod": 10000000000,
"DNSConfig": {},
"Isolation": "default"
},
"Resources": {},
"RestartPolicy": {
"Condition": "any",
"Delay": 5000000000,
"MaxAttempts": 0
},
"Placement": {
"Constraints": [
"node.labels.nd == true"
],
"Platforms": [
{
"Architecture": "amd64",
"OS": "linux"
},
{
"OS": "linux"
},
{
"OS": "linux"
},
{
"Architecture": "arm64",
"OS": "linux"
},
{
"Architecture": "386",
"OS": "linux"
},
{
"Architecture": "mips64le",
"OS": "linux"
},
{
"Architecture": "ppc64le",
"OS": "linux"
},
{
"Architecture": "s390x",
"OS": "linux"
}
]
},
"Networks": [
{
"Target": "fl2pslaogu09wb8v8szuawsc1",
"Aliases": [
"nd_postgres"
]
}
],
"ForceUpdate": 0,
"Runtime": "container"
},
"Mode": {
"Replicated": {
"Replicas": 1
}
},
"UpdateConfig": {
"Parallelism": 1,
"FailureAction": "pause",
"Monitor": 5000000000,
"MaxFailureRatio": 0,
"Order": "stop-first"
},
"RollbackConfig": {
"Parallelism": 1,
"FailureAction": "pause",
"Monitor": 5000000000,
"MaxFailureRatio": 0,
"Order": "stop-first"
},
"EndpointSpec": {
"Mode": "vip"
}
},
"PreviousSpec": {
"Name": "pdstack_nd_postgres",
"Labels": {
"com.docker.stack.image": "postgres:12",
"com.docker.stack.namespace": "pdstack"
},
"TaskTemplate": {
"ContainerSpec": {
"Image": "postgres:12@sha256:31122316d7afefa1d99d843f3a1a09a5484304183ecff7ab943b8bb94ba44ba4",
"Labels": {
"com.docker.stack.namespace": "pdstack"
},
"Env": [
"POSTGRES_DB=test",
"POSTGRES_PASSWORD=test",
"POSTGRES_USER=test"
],
"Privileges": {
"CredentialSpec": null,
"SELinuxContext": null
},
"Mounts": [
{
"Type": "volume",
"Source": "pdstack_metadata_data",
"Target": "/var/lib/postgresql/data",
"VolumeOptions": {
"Labels": {
"com.docker.stack.namespace": "pdstack"
}
}
}
],
"Isolation": "default"
},
"Resources": {},
"Placement": {
"Constraints": [
"node.labels.nd == true"
],
"Platforms": [
{
"Architecture": "amd64",
"OS": "linux"
},
{
"OS": "linux"
},
{
"OS": "linux"
},
{
"Architecture": "arm64",
"OS": "linux"
},
{
"Architecture": "386",
"OS": "linux"
},
{
"Architecture": "mips64le",
"OS": "linux"
},
{
"Architecture": "ppc64le",
"OS": "linux"
},
{
"Architecture": "s390x",
"OS": "linux"
}
]
},
"Networks": [
{
"Target": "fl2pslaogu09wb8v8szuawsc1",
"Aliases": [
"nd_postgres"
]
}
],
"ForceUpdate": 0,
"Runtime": "container"
},
"Mode": {
"Replicated": {
"Replicas": 1
}
},
"EndpointSpec": {
"Mode": "vip"
}
},
"Endpoint": {
"Spec": {
"Mode": "vip"
},
"VirtualIPs": [
{
"NetworkID": "fl2pslaogu09wb8v8szuawsc1",
"Addr": "10.0.6.35/24"
}
]
}
}
]
After output of docker service inspect xxx for the affected service:
[
{
"ID": "nl3sp83wje9lgo0ir40ivmgus",
"Version": {
"Index": 7961
},
"CreatedAt": "2020-09-25T10:49:07.5135648Z",
"UpdatedAt": "2020-10-13T18:53:50.22396559Z",
"Spec": {
"Name": "pdstack_nd_postgres",
"Labels": {
"com.docker.stack.image": "postgres:12",
"com.docker.stack.namespace": "pdstack"
},
"TaskTemplate": {
"ContainerSpec": {
"Image": "postgres:12@sha256:a1e04460fdd3c338d6b65a2ab66b5aa2748eb18da3e55bcdc9ef17831ed3ad46",
"Labels": {
"com.docker.stack.namespace": "pdstack"
},
"Env": [
"POSTGRES_DB=test",
"POSTGRES_PASSWORD=test",
"POSTGRES_USER=test"
],
"Privileges": {
"CredentialSpec": null,
"SELinuxContext": null
},
"Mounts": [
{
"Type": "volume",
"Source": "pdstack_metadata_data",
"Target": "/var/lib/postgresql/data",
"VolumeOptions": {
"Labels": {
"com.docker.stack.namespace": "pdstack"
}
}
}
],
"StopGracePeriod": 10000000000,
"DNSConfig": {},
"Isolation": "default"
},
"Resources": {},
"RestartPolicy": {
"Condition": "any",
"Delay": 5000000000,
"MaxAttempts": 0
},
"Placement": {
"Constraints": [
"node.labels.nd == true"
],
"Platforms": [
{
"Architecture": "amd64",
"OS": "linux"
},
{
"OS": "linux"
},
{
"OS": "linux"
},
{
"Architecture": "arm64",
"OS": "linux"
},
{
"Architecture": "386",
"OS": "linux"
},
{
"Architecture": "mips64le",
"OS": "linux"
},
{
"Architecture": "ppc64le",
"OS": "linux"
},
{
"Architecture": "s390x",
"OS": "linux"
}
]
},
"Networks": [
{
"Target": "fl2pslaogu09wb8v8szuawsc1",
"Aliases": [
"nd_postgres"
]
}
],
"ForceUpdate": 0,
"Runtime": "container"
},
"Mode": {
"Replicated": {
"Replicas": 1
}
},
"UpdateConfig": {
"Parallelism": 1,
"FailureAction": "pause",
"Monitor": 5000000000,
"MaxFailureRatio": 0,
"Order": "stop-first"
},
"RollbackConfig": {
"Parallelism": 1,
"FailureAction": "pause",
"Monitor": 5000000000,
"MaxFailureRatio": 0,
"Order": "stop-first"
},
"EndpointSpec": {
"Mode": "vip"
}
},
"PreviousSpec": {
"Name": "pdstack_nd_postgres",
"Labels": {
"com.docker.stack.image": "postgres:12",
"com.docker.stack.namespace": "pdstack"
},
"TaskTemplate": {
"ContainerSpec": {
"Image": "postgres:12@sha256:31122316d7afefa1d99d843f3a1a09a5484304183ecff7ab943b8bb94ba44ba4",
"Labels": {
"com.docker.stack.namespace": "pdstack"
},
"Env": [
"POSTGRES_DB=test",
"POSTGRES_PASSWORD=test",
"POSTGRES_USER=test"
],
"Privileges": {
"CredentialSpec": null,
"SELinuxContext": null
},
"Mounts": [
{
"Type": "volume",
"Source": "pdstack_metadata_data",
"Target": "/var/lib/postgresql/data",
"VolumeOptions": {
"Labels": {
"com.docker.stack.namespace": "pdstack"
}
}
}
],
"Isolation": "default"
},
"Resources": {},
"Placement": {
"Constraints": [
"node.labels.nd == true"
],
"Platforms": [
{
"Architecture": "amd64",
"OS": "linux"
},
{
"OS": "linux"
},
{
"OS": "linux"
},
{
"Architecture": "arm64",
"OS": "linux"
},
{
"Architecture": "386",
"OS": "linux"
},
{
"Architecture": "mips64le",
"OS": "linux"
},
{
"Architecture": "ppc64le",
"OS": "linux"
},
{
"Architecture": "s390x",
"OS": "linux"
}
]
},
"Networks": [
{
"Target": "fl2pslaogu09wb8v8szuawsc1",
"Aliases": [
"nd_postgres"
]
}
],
"ForceUpdate": 0,
"Runtime": "container"
},
"Mode": {
"Replicated": {
"Replicas": 1
}
},
"EndpointSpec": {
"Mode": "vip"
}
},
"Endpoint": {
"Spec": {
"Mode": "vip"
},
"VirtualIPs": [
{
"NetworkID": "fl2pslaogu09wb8v8szuawsc1",
"Addr": "10.0.6.35/24"
}
]
},
"UpdateStatus": {
"State": "completed",
"StartedAt": "2020-10-13T18:53:33.389718357Z",
"CompletedAt": "2020-10-13T18:53:50.223943667Z",
"Message": "update completed"
}
}
]
Diff
< "Index": 6699
---
> "Index": 7961
< "UpdatedAt": "2020-09-25T10:51:37.8209599Z",
---
> "UpdatedAt": "2020-10-13T18:53:50.22396559Z",
< "Image": "postgres:12@sha256:31122316d7afefa1d99d843f3a1a09a5484304183ecff7ab943b8bb94ba44ba4",
---
> "Image": "postgres:12@sha256:a1e04460fdd3c338d6b65a2ab66b5aa2748eb18da3e55bcdc9ef17831ed3ad46",
<
---
> },
> "UpdateStatus": {
> "State": "completed",
> "StartedAt": "2020-10-13T18:53:33.389718357Z",
> "CompletedAt": "2020-10-13T18:53:50.223943667Z",
> "Message": "update completed"
It can be noticed that the Image hash changed unexpectedly; that shouldn't have happened, since the image is pinned to a specific version and pulled from Docker Hub. I guess this is the reason why the service got updated.
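Worth noting: postgres:12 is a tag, not a digest, and tags are mutable - Docker Hub re-publishes postgres:12 for each new 12.x patch release, so the digest the tag resolves to can legitimately change between deploys. A minimal sketch for checking what the tag currently resolves to:

docker pull -q postgres:12
docker image inspect --format '{{index .RepoDigests 0}}' postgres:12   # digest the tag points at right now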
The docker-compose.yml file containing the affected service (there are many other services too, which I've omitted because they weren't affected):
version: "3.3"
services:
  nd_postgres:
    container_name: nd_postgres
    image: postgres:12
    volumes:
      - metadata_data:/var/lib/postgresql/data
    environment:
      - POSTGRES_DB=test
      - POSTGRES_USER=test
      - POSTGRES_PASSWORD=test
    deploy:
      placement:
        constraints:
          - node.labels.nd == true
volumes:
  metadata_data:
I was running into the issue that services were being restarted when a new image tag was added, even though the underlying image digest had not changed.
This workaround resolves all referenced image tags to their explicit sha256 digest:
Run the docker-compose.yml file through docker-compose with the command-line option --resolve-image-digests, which adds the sha256 digest to all service images. Since these resolved references still contain the tag names, remove them using sed.
docker-compose config --resolve-image-digests > "docker-compose-resolved.yml"
sed -ri 's/(\/[^:]+):[^@]+@sha256/\1@sha256/' "docker-compose-resolved.yml"
image: registry.gitlab.com/foo:tag
becomes
image: registry.gitlab.com/foo@sha256:b4b7f74bbb3164cb88b9b7f71ad824dc1a99b43fad678b6b9404c0ad4a9124b3
Now when deploying the stack to a swarm, services are no longer restarted when a new tag is specified pointing to the same underlying image.
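The resolved file can then be deployed in place of the original (stack name assumed):

docker stack deploy -c docker-compose-resolved.yml --with-registry-auth mystack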
This problem is still there in the latest 20.10.9. The strange thing is that I have 2 clusters with the same Docker version and only one exhibits the behaviour of randomly restarting containers when redeploying a stack file.
The one that works never has a digest when doing docker inspect of a service:
"Spec": {
"Name": "elasticsearch_elastic1-1",
"Labels": {
"com.docker.stack.image": "docker.elastic.co/elasticsearch/elasticsearch:7.15.0",
"com.docker.stack.namespace": "elasticsearch"
},
"TaskTemplate": {
"ContainerSpec": {
"Image": "docker.elastic.co/elasticsearch/elasticsearch:7.15.0@sha256:6ae227c688e05f7d487e0cfe08a5a3681f4d60d006ad9b5a1f72a741d6091df1",
.
.
.
.
"PreviousSpec": {
"Name": "elasticsearch_elastic1-1",
"Labels": {
"com.docker.stack.image": "docker.elastic.co/elasticsearch/elasticsearch:7.15.0",
"com.docker.stack.namespace": "elasticsearch"
},
"TaskTemplate": {
"ContainerSpec": {
"Image": "docker.elastic.co/elasticsearch/elasticsearch:7.15.0@sha256:6ae227c688e05f7d487e0cfe08a5a3681f4d60d006ad9b5a1f72a741d6091df1",
When looking at the bad cluster, it seems that before the upgrade there was no digest attached to the image label:
"Spec": {
"Name": "elastic_logstash",
"Labels": {
"com.docker.stack.image": "docker.elastic.co/logstash/logstash:7.15.0",
"com.docker.stack.namespace": "elastic"
},
"TaskTemplate": {
"ContainerSpec": {
"Image": "docker.elastic.co/logstash/logstash:7.15.0",
.
.
.
"PreviousSpec": {
"Name": "elastic_logstash",
"Labels": {
"com.docker.stack.image": "docker.elastic.co/logstash/logstash:7.15.0",
"com.docker.stack.namespace": "elastic"
},
"TaskTemplate": {
"ContainerSpec": {
"Image": "docker.elastic.co/logstash/logstash:7.15.0@sha256:ba6ee9c11620d0bb9d5bff5937bdf995b71bc7a2bcd1047b1458cf752194b54a",
Description
When doing docker stack deploy -c test.yaml test, services in the stack will randomly be updated, even if there are no changes. Weirdly, this only seems to be the case for services that have environment variables, but it's hard to investigate because the issue is transient.
Steps to reproduce the issue:
Repeatedly run docker stack deploy -c test.yaml test, checking the update status with docker service inspect each time.
e.g. stack file:
Loop script:
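(The script itself wasn't captured here; a hypothetical reconstruction of the kind of loop described, with the service name assumed:)

SERVICE=test_web   # hypothetical service name within the "test" stack
while true; do
  docker stack deploy -c test.yaml test                        # redeploy the unchanged stack
  sleep 30
  docker service inspect "$SERVICE" | jq '.[0].UpdateStatus'   # did an update start?
done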
Output:
Describe the results you expected:
Output to always be:
But sometimes it is:
Additional information you deem important (e.g. issue happens only occasionally):
This is transient, and only appears to be the case for stack files with environment variables, but that may or may not be relevant.
Output of docker version:
This is using the latest released "docker-for-azure".
Output of docker info:
Additional environment details (AWS, VirtualBox, physical, etc.): Using docker-for-azure.
I will shortly be away until 1st March, and so will follow up on any comments then.