moby / moby

The Moby Project - a collaborative project for the container ecosystem to assemble container-based systems
https://mobyproject.org/

Container label "com.docker.swarm.task.name" incorrect in 1.13.0-rc2 #28806

Open bvis opened 7 years ago

bvis commented 7 years ago

Description

After upgrading our staging swarm cluster to 1.13.0-rc2, I noticed that a label we were using to monitor containers no longer has the expected value: com.docker.swarm.task.name

There's been a naming change that seems to be an error; please correct me if I'm wrong.

On the other hand, there's another label that is always empty, and I don't know why: "com.docker.swarm.task"

Steps to reproduce the issue:

docker service create --name deleteme --replicas 2 alpine sleep 1000

Describe the results you received:

docker inspect --format='{{json .Config.Labels}}' 7cc | jq
{
  "com.docker.swarm.node.id": "t0nzfz9o3jre4d8uydjdyon5n",
  "com.docker.swarm.service.id": "umbzdq0bmcaii2vds174fz9i3",
  "com.docker.swarm.service.name": "deleteme",
  "com.docker.swarm.task": "",
  "com.docker.swarm.task.id": "wt04bs73a33xrf63fjulo6my3",
  "com.docker.swarm.task.name": "deleteme.2.wt04bs73a33xrf63fjulo6my3"
}

Describe the results you expected:

docker inspect --format='{{json .Config.Labels}}' 83d | jq
{
  "com.docker.swarm.node.id": "blgkwwxupbn3ge22549g9dlf6",
  "com.docker.swarm.service.id": "5ug0rxrgja4cr2l5c8n0lfe1b",
  "com.docker.swarm.service.name": "deleteme",
  "com.docker.swarm.task": "",
  "com.docker.swarm.task.id": "4wnbf56ws85skqxdvz6ik3usg",
  "com.docker.swarm.task.name": "deleteme.2"
}

Additional information you deem important (e.g. issue happens only occasionally):

Output of docker version:

Client:
 Version:      1.13.0-rc2
 API version:  1.25
 Go version:   go1.7.3
 Git commit:   1f9b3ef
 Built:        Wed Nov 23 17:40:58 2016
 OS/Arch:      linux/amd64

Server:
 Version:             1.13.0-rc2
 API version:         1.25
 Minimum API version: 1.12
 Go version:          go1.7.3
 Git commit:          1f9b3ef
 Built:               Wed Nov 23 17:40:58 2016
 OS/Arch:             linux/amd64
 Experimental:        true

Output of docker info:

root@swarm-staging-3:~# docker info
Containers: 16
 Running: 8
 Paused: 0
 Stopped: 8
Images: 7
Server Version: 1.13.0-rc2
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 71
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local rexray
 Network: bridge host macvlan null overlay
Swarm: active
 NodeID: tbepajhjit041p6dke3ft4edu
 Is Manager: true
 ClusterID: 4nc3h0la0um5n0e69gnne7pit
 Managers: 3
 Nodes: 3
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
 Node Address: 10.137.145.29
 Manager Addresses:
  10.137.129.233:2377
  10.137.137.223:2377
  10.137.145.29:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 03e5862ec0d8d3b3f750e19fca3ee367e13c090e
runc version: 51371867a01c467f08af739783b8beafc154c4d7
init version: 949e6fa
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.4.0-36-generic
Operating System: Ubuntu 16.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 3.858 GiB
Name: swarm-staging-3
ID: 6MIN:CBWD:G46G:7DYL:FCKI:QPG2:DQF5:5AXS:2MZI:2F3N:Z2GD:ZRMG
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Labels:
 provider=amazonec2
Experimental: true
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

Additional environment details (AWS, VirtualBox, physical, etc.):

thaJeztah commented 7 years ago

ping @aluzzardi PTAL

nishanttotla commented 7 years ago

Might be related to #28945

thaJeztah commented 7 years ago

Looks like deleteme.2.wt04bs73a33xrf63fjulo6my3 is the name used for the container, and therefore the task name; perhaps the behavior in 1.12 was actually incorrect?

bvis commented 7 years ago

I think the right format would be something like:

docker inspect --format='{{json .Config.Labels}}' 83d | jq
{
  "com.docker.swarm.node.id": "blgkwwxupbn3ge22549g9dlf6",
  "com.docker.swarm.service.id": "5ug0rxrgja4cr2l5c8n0lfe1b",
  "com.docker.swarm.service.name": "deleteme",
  "com.docker.swarm.task": "deleteme.2.4wnbf56ws85skqxdvz6ik3usg",
  "com.docker.swarm.task.id": "4wnbf56ws85skqxdvz6ik3usg",
  "com.docker.swarm.task.name": "deleteme.2"
}

Instead of the current value:

docker inspect --format='{{json .Config.Labels}}' 7cc | jq
{
  "com.docker.swarm.node.id": "t0nzfz9o3jre4d8uydjdyon5n",
  "com.docker.swarm.service.id": "umbzdq0bmcaii2vds174fz9i3",
  "com.docker.swarm.service.name": "deleteme",
  "com.docker.swarm.task": "",
  "com.docker.swarm.task.id": "wt04bs73a33xrf63fjulo6my3",
  "com.docker.swarm.task.name": "deleteme.2.wt04bs73a33xrf63fjulo6my3"
}

Does this make sense? It's consistent with the naming used for "service.name" and "service.id".
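For monitoring setups that relied on the 1.12-style value, one possible stopgap is to strip the task ID from the new label value. This is only a sketch: short_task_name is a hypothetical helper, not part of the docker CLI, and it assumes the 1.13 value is exactly the old value followed by a dot and the value of "com.docker.swarm.task.id".

```shell
#!/bin/sh
# short_task_name NAME ID
# Hypothetical helper: prints NAME with a trailing ".ID" removed, if present.
short_task_name() {
  name=$1
  id=$2
  # The quoted pattern is matched literally, so the ID is not treated as a glob.
  printf '%s\n' "${name%".$id"}"
}

# Label values copied from the 1.13.0-rc2 output in this issue.
short_task_name "deleteme.2.wt04bs73a33xrf63fjulo6my3" "wt04bs73a33xrf63fjulo6my3"

# A 1.12-style value has no suffix to strip and is printed unchanged.
short_task_name "deleteme.2" "4wnbf56ws85skqxdvz6ik3usg"
```

Both calls print deleteme.2, so the same helper could be used on a cluster with mixed versions.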

xinity commented 7 years ago

same issue here FYI :)

would love to see a better naming

vablergo commented 7 years ago

This issue persists in version 17.06

I guess I could work around this with a custom label, using the --container-label option when creating a service, but does that option support Go templating placeholders, and how can I retrieve the "replicated task number" for it in the case of a replicated service?

thaJeztah commented 7 years ago

Looking at this again, I think the current label is correct, but there is definitely some inconsistency in what the docker cli outputs.

"com.docker.swarm.task" is always empty?

First of all, the mystery of the empty "com.docker.swarm.task" label. I looked at the code, and this is as intended: only the key of "com.docker.swarm.task" matters. Its presence indicates that a container is a "task" (and thus managed by Docker); the comment here explains that: https://github.com/moby/moby/blob/8af4db6f002ac907b6ef8610b237879dfcaa5b7a/daemon/cluster/executor/container/container.go#L225
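Since only the key matters, a check for "is this container a swarm task" should test for the presence of the key and ignore its value. A minimal sketch, assuming the labels are available as key=value lines on stdin; is_swarm_task is a hypothetical helper and the label lines below are sample data based on the output earlier in this issue:

```shell
#!/bin/sh
# is_swarm_task
# Hypothetical helper: reads key=value label lines on stdin and succeeds
# if the marker key is present, whatever its value (including empty).
is_swarm_task() {
  grep -q '^com\.docker\.swarm\.task='
}

# Sample label lines; the marker label has an empty value.
labels='com.docker.swarm.service.name=deleteme
com.docker.swarm.task=
com.docker.swarm.task.id=wt04bs73a33xrf63fjulo6my3'

if printf '%s\n' "$labels" | is_swarm_task; then
  echo "swarm task"
else
  echo "not a swarm task"
fi
```

The anchored pattern ends at the equals sign, so "com.docker.swarm.task.id" alone does not count as the marker. On a live daemon, filtering with docker ps --filter label=com.docker.swarm.task should give a similar result without any parsing, since a label filter without a value matches on the key only.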

"com.docker.swarm.task.name" should be "myservice.1", not "myservice.1.q5dpzr5zejmu2395hr111v8qs"?

Next, the question of whether the "com.docker.swarm.task.name" label should be just <servicename>.<slot-number> (e.g. myservice.1).

Both services and tasks must have unique names. For services, the name is either a given name (docker service create --name) or one that is generated (when omitting --name). For tasks, this is more complicated: a service has as many "slots" as it has replicas (--replicas), and the slot is the .1 in the task's name. Each slot can contain a single task at any time, but once a task is completed, another task takes its place (i.e. slots are reused). Because of that, the combination of just servicename and slot-number is not sufficient to create a unique name for a task.

Here's an example to illustrate that:

Create a service named myservice:

$ docker service create --name myservice nginx:alpine
druc7axep7sowes6de847vyu7

$ docker service ps myservice

ID                  NAME                IMAGE               NODE                DESIRED STATE       CURRENT STATE           ERROR               PORTS
iqiwpe7nnin5        myservice.1         nginx:alpine        moby                Running             Running 8 seconds ago

Update the service (changing any property will do);

$ docker service update --publish-add 80:80 myservice

$ docker service ps myservice

ID                  NAME                IMAGE               NODE                DESIRED STATE       CURRENT STATE            ERROR               PORTS
q5dpzr5zejmu        myservice.1         nginx:alpine        moby                Running             Running 4 seconds ago
iqiwpe7nnin5         \_ myservice.1     nginx:alpine        moby                Shutdown            Shutdown 6 seconds ago

Now there are two tasks named myservice.1: one current task (desired state running) and one "completed" task (desired state shutdown). The presentation here is misleading; for brevity, only partial names of the tasks are shown. The actual task names for the tasks listed above have .<task-id> as a suffix. Trying to inspect a task using just the name shown in the NAME column shows this:

$ docker inspect myservice.1
[]
Error: No such object: myservice.1

After appending the task's full ID (use docker service ps myservice --no-trunc to get the non-truncated IDs), both tasks can be found:

$ docker inspect myservice.1.q5dpzr5zejmu2395hr111v8qs --format '{{json .Config.Labels}}'
{"com.docker.swarm.node.id":"drfinwj2um6lv0pbprh0gzzfm","com.docker.swarm.service.id":"druc7axep7sowes6de847vyu7","com.docker.swarm.service.name":"myservice","com.docker.swarm.task":"","com.docker.swarm.task.id":"q5dpzr5zejmu2395hr111v8qs","com.docker.swarm.task.name":"myservice.1.q5dpzr5zejmu2395hr111v8qs"}

$ docker inspect myservice.1.iqiwpe7nnin5xo19n48kkjb32 --format '{{json .Config.Labels}}'
{"com.docker.swarm.node.id":"drfinwj2um6lv0pbprh0gzzfm","com.docker.swarm.service.id":"druc7axep7sowes6de847vyu7","com.docker.swarm.service.name":"myservice","com.docker.swarm.task":"","com.docker.swarm.task.id":"iqiwpe7nnin5xo19n48kkjb32","com.docker.swarm.task.name":"myservice.1.iqiwpe7nnin5xo19n48kkjb32"}
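The full task names above have the form <servicename>.<slot-number>.<task-id>, so the three parts can be recovered by splitting on the last two dots. This is a sketch only: split_task_name is a hypothetical helper, and it assumes a replicated service whose task name has exactly that form (other service modes may not put a slot number in the middle component).

```shell
#!/bin/sh
# split_task_name NAME
# Hypothetical helper: prints "service slot id" for a name of the form
# <servicename>.<slot-number>.<task-id>, splitting from the right.
split_task_name() {
  full=$1
  id=${full##*.}       # text after the last dot
  rest=${full%.*}      # everything before the last dot
  slot=${rest##*.}     # text after the last remaining dot
  service=${rest%.*}   # everything before that dot
  printf '%s %s %s\n' "$service" "$slot" "$id"
}

# Task name copied from the docker inspect output above.
split_task_name "myservice.1.q5dpzr5zejmu2395hr111v8qs"
```

For the name above this prints myservice 1 q5dpzr5zejmu2395hr111v8qs. Splitting from the right keeps the task ID and slot intact even if the leading part were to contain further dots.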

Proposed changes

Now, what (IMO) should be done:

  1. a new label should be added that contains the task's slot number; this allows users to get all the name parts of a task (servicename, slot-number, and task-id) separately;
    • com.docker.swarm.service.name
    • com.docker.swarm.task.slot
    • com.docker.swarm.task.id
  2. we should consider printing the full task name if --no-trunc is used on docker service ps. The output would then look like;

    $ docker service ps myservice --no-trunc
    
    ID                          NAME                                          IMAGE                                                                                  NODE                DESIRED STATE       CURRENT STATE                ERROR               PORTS
    q5dpzr5zejmu2395hr111v8qs   myservice.1.q5dpzr5zejmu2395hr111v8qs         nginx:alpine@sha256:24a27241f0450b465f9e9deb30628c524aa81a1aa6936daa41ef7c4345515272   moby                Running             Running about an hour ago
    iqiwpe7nnin5xo19n48kkjb32    \_ myservice.1.iqiwpe7nnin5xo19n48kkjb32     nginx:alpine@sha256:24a27241f0450b465f9e9deb30628c524aa81a1aa6936daa41ef7c4345515272   moby                Shutdown            Shutdown about an hour ago

I'm opening a pull request to discuss item 1. The second change may need some discussion as it could potentially break users, but if someone is interested, feel free to open a pull request to start the discussion on that one as well :+1: (I may do so myself if I find time)

thaJeztah commented 7 years ago

Opened a pull request for the com.docker.swarm.task.slot label; https://github.com/moby/moby/pull/34535

ugurarpaci commented 6 years ago

Is this merged?

NicholasNoise commented 4 years ago

Active?

webchi commented 4 years ago

Still need it in 2020