moby / swarmkit

A toolkit for orchestrating distributed systems at any scale. It includes primitives for node discovery, raft-based consensus, task scheduling and more.

Service is not DNS resolvable from another one if containers run on different nodes #1429

Open vasily-kirichenko opened 7 years ago

vasily-kirichenko commented 7 years ago

I have two services, each running a single container, on different nodes, using the same "overlay" network. When I try to ping one container from inside the other via the service name, it fails:

ping akka-test
ping: bad address 'akka-test'

After I scaled the akka-test service so that a container runs on the node where the other container is running, everything suddenly started to work.

So my question is: is my assumption valid that services should be discoverable across the entire Swarm? I mean, the name of a service should be DNS resolvable from any other container in this Swarm, no matter where the containers are running.

$ docker network ls
NETWORK ID          NAME                DRIVER              SCOPE
255fedab2fc4        bridge              bridge              local
9a450f033c48        docker_gwbridge     bridge              local
6e76844033f8        host                host                local
dzwgdein8cxa        ingress             overlay             swarm
54uqc60vx1i5        net2                overlay             swarm
d632a42ef140        none                null                local
$ docker service ls
ID            NAME         REPLICAS  IMAGE                             COMMAND
0wyv4gq14mnu  akka-test    8/8       xxxx:5000/akkahttp1:1.20
cg7r4ius7xfm  akka-test-2  1/1       xxxx:5000/akkahttp1:1.20
$ docker service inspect --pretty akka-test
ID:             0wyv4gq14mnuj8kfolizh1h23
Name:           akka-test
Mode:           Replicated
 Replicas:      8
Placement:
UpdateConfig:
 Parallelism:   1
 On failure:    pause
ContainerSpec:
 Image:         xxxx:5000/akkahttp1:1.20
Resources:
Networks: 54uqc60vx1i57d3qnmhza82c4
$ docker service inspect --pretty akka-test-2
ID:             cg7r4ius7xfmgvazmptvarn2k
Name:           akka-test-2
Mode:           Replicated
 Replicas:      1
Placement:
UpdateConfig:
 Parallelism:   1
 On failure:    pause
ContainerSpec:
 Image:         xxxx:5000/akkahttp1:1.20
Resources:
Networks: 54uqc60vx1i57d3qnmhza82c4
$ docker info
Containers: 75
 Running: 11
 Paused: 0
 Stopped: 64
Images: 42
Server Version: 1.12.1-rc1
Storage Driver: devicemapper
 Pool Name: docker-253:0-135409124-pool
 Pool Blocksize: 65.54 kB
 Base Device Size: 10.74 GB
 Backing Filesystem: xfs
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data Space Used: 8.291 GB
 Data Space Total: 107.4 GB
 Data Space Available: 40.86 GB
 Metadata Space Used: 19.61 MB
 Metadata Space Total: 2.147 GB
 Metadata Space Available: 2.128 GB
 Thin Pool Minimum Free Space: 10.74 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Deferred Deletion Enabled: false
 Deferred Deleted Device Count: 0
 Data loop file: /var/lib/docker/devicemapper/devicemapper/data
 WARNING: Usage of loopback devices is strongly discouraged for production use. Use `--storage-opt dm.thinpooldev` to specify a custom block storage device.
 Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
 Library Version: 1.02.107-RHEL7 (2016-06-09)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: null overlay host bridge
Swarm: active
 NodeID: ao1wz862t6n4yog4hpi4yqm20
 Is Manager: true
 ClusterID: 3hpbbe2jtdoqe1zvxs41cycoq
 Managers: 3
 Nodes: 4
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
 Node Address: xxxx
Runtimes: runc
Default Runtime: runc
Security Options: seccomp
Kernel Version: 3.10.0-327.28.2.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 56
Total Memory: 188.6 GiB
Name: xxxx
ID: OWEH:OIIR:7NZ6:IKZV:RFJ4:NXAZ:NH7H:WPLC:D457:DKGN:CH2C:E2UE
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: bridge-nf-call-ip6tables is disabled
Insecure Registries:
 127.0.0.0/8
kaii-zen commented 7 years ago

I'm seeing this too. I'm using Docker for AWS and this has happened both on beta4 and now on beta5. Service names are sometimes unresolvable, sometimes resolvable but with no route to host, and sometimes everything works. So far I've been unable to reliably reproduce it from scratch.

dperny commented 7 years ago

Because of some networking limitations (I think related to virtual IPs), the ping tool will not work with overlay networking. Are your service names resolvable with other tools like dig?

Take a look at this guide, if you haven't already: https://docs.docker.com/engine/swarm/networking/

vasily-kirichenko commented 7 years ago

@dperny Thanks, will check with dig.

dperny commented 7 years ago

Sure. Let me know whether or not that fixes the issue, so I know whether to close it or take a deeper look.

vasily-kirichenko commented 7 years ago

I could not find a docker image with dig installed, so I tested with nslookup. It could not resolve the service if the container was running on a different node.
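
(For what it's worth, dig can be pulled into a throwaway Alpine container on the fly. This is only a sketch: it assumes the overlay network is attachable so a standalone container can join it, and reuses the net2 network name from the output above.)

docker run --rm -it --network net2 alpine sh -c 'apk add --no-cache bind-tools && dig akka-test'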

dperny commented 7 years ago

Can you give some more information for reproducing? I tried to reproduce by creating a 3 node cluster with 1 manager.

# create new network
$ docker network create --driver overlay net
# create web service
$ docker service create --network net --name web nginx
# web landed on node-2
# create busybox service for lookups
$ docker service create --network net --name probe busybox sleep 3000
# probe landed on node-3
# now, from node 3
$ docker exec -it <busybox container id> /bin/sh

/ # nslookup web
Server:    127.0.0.11
Address 1: 127.0.0.11

Name:      web
Address 1: 10.0.0.2
/ # nslookup probe
Server:    127.0.0.11
Address 1: 127.0.0.11

Name:      probe
Address 1: 10.0.0.4
/ # nslookup butterpecans
Server: 127.0.0.11
Address 1: 127.0.0.11

nslookup: can't resolve 'butterpecans'

So this appears to work for me.

dperny commented 7 years ago

Do you have TCP port 7946 open on your hosts? Gossip needs that port open for networking to work correctly.
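
(For reference, a quick way to check those ports from another node, assuming netcat is available; <node-ip> is a placeholder.)

nc -vz <node-ip> 7946     # gossip control plane, TCP
nc -vzu <node-ip> 7946    # gossip control plane, UDP
nc -vzu <node-ip> 4789    # VXLAN data plane, UDP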

Ayiga commented 7 years ago

@dperny create your services without vip endpoint mode. It occurs specifically with dnsrr for certain. However, it may work with any mode that doesn't generate a proxy address.

dperny commented 7 years ago

@ayiga Just tried the above steps but added --endpoint-mode dnsrr and it resolves properly. Is your failure intermittent, or consistent?

Ayiga commented 7 years ago

It's consistent. In my experience, DNS resolution is only capable of resolving containers that exist on the same node. The manager is capable of resolving containers throughout the swarm (sometimes it doesn't; I'm not sure of the cause). But this issue is primarily with worker nodes.

I did a full write up of my steps in the post: https://forums.docker.com/t/container-dns-resolution-in-ingress-overlay-network/21399 for Docker for AWS. However, the issue is easily reproducible from my personal setup, between a Linux Box (Ubuntu variant), and my Mac using Docker for Mac.

c4wrd commented 7 years ago

I am also experiencing this issue; my setup is as follows. I have three AWS EC2 nodes, all on a private shared network where they can communicate on all ports (I have verified all nodes can reach all other nodes on the ports specified in the Swarm 1.12 documentation). I create containers on a shared overlay network (verified the overlay interface exists and is correctly routed through the specified subnet), and only when two containers are on the same node can they communicate via their VIP or hostname. When containers are on different nodes, I receive a "no route to host" message when they attempt to connect to each other.

c4wrd commented 7 years ago

@Ayiga @vasily-kirichenko I actually just resolved this by changing the subnet of my overlay network. Previously it had been on 172.0.0.0/24, and for some reason I believe this was conflicting with the Docker networking interfaces (even though it doesn't appear to overlap). Now I can resolve containers on other nodes by hostname and VIP without issue. Here's how I created the network, for reference:

docker network create \
    --driver overlay \
    --subnet 10.10.9.0/24 \
    selenium-grid

glorious-beard commented 7 years ago

Is there any further resolution on this?

I'm running an AWS 4-node (3 manager, 1 worker) swarm, all under Docker 1.13.1. I'm using Docker Compose with an external overlay network created in attachable mode, using a subnet different from the hosts', for all the services in the compose file, and deploying it with docker stack deploy --compose-file.

Even if I add another 3 nodes as dedicated Docker managers with availability set to drain, and everything else set to worker mode, I still encounter services that cannot access other services over the overlay network. All the services are defined in the compose file.

Attempting to resolve a service name to an IP address via dig or nslookup (using 127.0.0.11 as the DNS server) results in no records for other tasks running on that overlay network.
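
(For reference, the kind of lookups being attempted against the embedded DNS server look roughly like this; web is a placeholder service name.)

nslookup web 127.0.0.11          # should return the service VIP
nslookup tasks.web 127.0.0.11    # should return the individual task IPs
dig @127.0.0.11 web +short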

Docker Info

Containers: 5
 Running: 1
 Paused: 0
 Stopped: 4
Images: 10
Server Version: 1.13.1
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 89
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: bridge host macvlan null overlay
Swarm: active
 NodeID: fk3f2buol2b6azvqap8pdhzup
 Is Manager: false
 Node Address: 11.0.12.39
 Manager Addresses:
  11.0.10.7:2377
  11.0.11.18:2377
  11.0.12.45:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: aa8187dbd3b7ad67d8e5e3a15115d3eef43a7ed1
runc version: 9df8b306d01f59d3a8029be411de015b7304dd8f
init version: 949e6fa
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.4.0-62-generic
Operating System: Ubuntu 16.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 59.97 GiB
Name: ip-11-0-12-39
ID: 3XUZ:DCGO:F474:GNKB:2VN6:ZJYE:LPWJ:SPOS:HGR3:UJVX:RATM:TMRT
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
nathanleclaire commented 7 years ago

@zen-chetan Try @c4wrd's suggestion to use a different subnet IP to see if that resolves the issue.

I've seen an issue like this before and it was because AWS nodes had /etc/resolv.conf that pointed to a 10.0.0.x IP address in the VPC subnet (common), but Docker DNS was getting confused because the subnets of the created overlay(s) would also be in that range.

I'd argue that maybe the default subnet for overlay networks should be changed as it overlaps with a very common internal IP subnet. e.g., the getting started with Amazon VPC guide uses 10.0.0.0/24.


At the very least this should probably be covered in the Docker documentation.

nathanleclaire commented 7 years ago

@sanimej @aboch I'm curious your thoughts on the above ^^

glorious-beard commented 7 years ago

Thanks @nathanleclaire for your suggestion. However, I am running different subnets for the AWS hosts and the overlay network.

The hosts are running in the subnet 11.0.0.0. Here's the output of /etc/resolv.conf for one of the hosts that can't resolve DNS for containers running on it.

# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 11.0.0.2
search us-west-2.compute.internal

The docker overlay network runs with the subnet 10.0.10.0/24. docker network inspect output...

[
    {
        "Name": "brain_net",
        "Id": "ip81kx5shqsenzsalo04oxpzk",
        "Created": "2017-02-17T01:34:40.16419404Z",
        "Scope": "swarm",
        "Driver": "overlay",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "10.0.10.0/24",
                    "Gateway": "10.0.10.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": true,
        "Containers": {
            "86473d760c0ab112adec455c8b65213734d35c8c26f1db0719d40a1f6fd6f61a": {
                "Name": "alpha_gpu_engineer.fk3f2buol2b6azvqap8pdhzup.xatrh8z4dmxncqztiz9i85ls5",
                "EndpointID": "5af3262a75990cda4f6354aa194b57b47f2f888f8e507f6bbfa5e609e2f7490c",
                "MacAddress": "02:42:0a:00:0a:0a",
                "IPv4Address": "10.0.10.10/24",
                "IPv6Address": ""
            }
        },
        "Options": {
            "com.docker.network.driver.overlay.vxlanid_list": "4097"
        },
        "Labels": {},
        "Peers": [
            {
                "Name": "ip-11-0-12-39-aab579cb2196",
                "IP": "11.0.12.39"
            }
        ]
    }
]

Here's the /etc/resolv.conf for one of the containers in the host:

search us-west-2.compute.internal
nameserver 127.0.0.11
options ndots:0

To rule out AWS SGs, I've also completely opened all ports for both UDP and TCP, incoming and outgoing, for the security group all of the nodes run in.

aboch commented 7 years ago

@vasily-kirichenko

So my question is: is my assumption valid that services should be discoverable across the entire Swarm? I mean, the name of a service should be DNS resolvable from any other container in this Swarm, no matter where the containers are running.

Yes, they will be discoverable by any container no matter where it is running as long as it is connected to the same network.

@zen-chetan

Here's the output of /etc/resolv.conf for one of the hosts that can't resolve DNS for containers running on it.

Not sure if you meant that, but if you were expecting to be able to resolve the service name from the host, that is not possible. The service name is only discoverable from inside the swarm networks the service is attached to.

For the rest, I only have some generic comments:

As @dperny suggested, in order for the network control plane info (like the internal dns records) to spread in the cluster, please make sure both tcp/7946 and udp/7946 are open on each and every node and security group rules allow them.

Your system will be subject to the overlay/host subnet conflict, as @nathanleclaire was mentioning, if you see vx-<ID> named interfaces on your hosts where a container is running on an overlay network. If no vx-<ID> named interfaces are there, then your overlay network subnet can safely overlap with the hosts' VPC subnet.

When things do not work with stack deploy, try creating the docker services manually to see whether or not the problem is specific to docker stack.

glorious-beard commented 7 years ago

Thank you @aboch. My intention in showing the host's /etc/resolv.conf was to demonstrate that the host and the name server it uses do not seem to overlap with the Docker overlay network's subnet.

Regarding vx-<ID> interfaces, I see a lot of veth* interfaces created when everything is running, with different numbers of interfaces on each host, ranging from as few as one to as many as 13. These interfaces are present both on the host and in a container started with the docker stack deploy command. How do I check for these vx-<ID> interfaces?

aboch commented 7 years ago

@zen-chetan

My intention in showing the host's /etc/resolv.conf was to demonstrate that the host and the name server it uses do not seem to overlap with the Docker overlay network's subnet.

Ah I see, thanks. But given that swarm networks are global-scope networks, the overlap check is not run for their subnets. This is why the issue can arise with kernels that do not support creating the vxlan interface in a separate netns. Libnetwork detects whether the kernel supports that feature; if it does not, it creates the vxlan interfaces (one per subnet per overlay network) in the host namespace with names vx-....

How do I check for these vx- interfaces?

If you do not see any of those in the ip link output, then it means you do not need to worry about which subnet was chosen for the overlay network. Just make sure this is true for all the hosts the overlay network spans.
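
(For example, something along these lines on each host; if the kernel supports creating the vxlan device in a separate netns, the grep should return nothing.)

ip -o link show | grep 'vx-'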

I see a lot of veth* interfaces created when everything is running,

Yes, those are the interfaces connecting each container on the overlay network with the docker_gwbridge network, to provide outside-world connectivity to the containers.

glorious-beard commented 7 years ago

So I'm still stumped by this...

Given three AWS nodes running in a private VPC subnet, with the security group set to allow all traffic in and out on all ports, both UDP and TCP, on the subnet 11.0.0.0/8, I still cannot obtain the IP addresses of services running on other nodes in the docker swarm. Any service running on a node in the swarm can get the IP addresses of services running on the same node.

How to reproduce:

1 - Create an attachable network (docker-compose version 3 files still don't support attachable overlay networks):

docker network create --driver overlay --attachable --subnet 192.168.1.0/24 alpha_net

2 - Start the following docker-compose file with docker stack deploy --compose-file=docker-compose.yml alpha. This is a stripped-down sample that creates a Consul cluster; I've left out some of the other services from the compose file.

version: "3.1"

services:

  # Consul server 1 of 3
  consul1:
    image: consul:0.7.5
    command: agent -bind=0.0.0.0 -client=0.0.0.0 -advertise='{{ GetAllInterfaces | include "network" "192.168.1.0/24" | attr "address" }}' -log-level=INFO -node=config-server-1 -server -bootstrap-expect=3 -rejoin -retry-join=consul1 -retry-join=consul2 -retry-join=consul3
    environment:
      SERVICE_8500_IGNORE: "true"
      SERVICE_8300_IGNORE: "true"
      SERVICE_8301_IGNORE: "true"
      SERVICE_8302_IGNORE: "true"
      SERVICE_8400_IGNORE: "true"
      SERVICE_8600_IGNORE: "true"
    networks:
      - alpha_net
    deploy:
      mode: replicated
      replicas: 1
      restart_policy:
        condition: on-failure
      placement:
        constraints:
          - 'node.labels.cpu == enabled'

  # Consul server 2 of 3
  consul2:
    image: consul:0.7.5
    command: agent -bind=0.0.0.0 -client=0.0.0.0 -advertise='{{ GetAllInterfaces | include "network" "192.168.1.0/24" | attr "address" }}' -log-level=INFO -node=config-server-2 -server -bootstrap-expect=3 -rejoin -retry-join=consul1 -retry-join=consul2 -retry-join=consul3
    environment:
      SERVICE_8500_IGNORE: "true"
      SERVICE_8300_IGNORE: "true"
      SERVICE_8301_IGNORE: "true"
      SERVICE_8302_IGNORE: "true"
      SERVICE_8400_IGNORE: "true"
      SERVICE_8600_IGNORE: "true"
    networks:
      - alpha_net
    deploy:
      mode: replicated
      replicas: 1
      restart_policy:
        condition: on-failure
      placement:
        constraints:
          - 'node.labels.cpu == enabled'

  # Consul server 3 of 3
  consul3:
    image: consul:0.7.5
    command: agent -bind=0.0.0.0 -client=0.0.0.0 -advertise='{{ GetAllInterfaces | include "network" "192.168.1.0/24" | attr "address" }}' -log-level=INFO -node=config-server-3 -server -bootstrap-expect=3 -rejoin -retry-join=consul1 -retry-join=consul2 -retry-join=consul3
    environment:
      SERVICE_8500_IGNORE: "true"
      SERVICE_8300_IGNORE: "true"
      SERVICE_8301_IGNORE: "true"
      SERVICE_8302_IGNORE: "true"
      SERVICE_8400_IGNORE: "true"
      SERVICE_8600_IGNORE: "true"
    networks:
      - alpha_net
    deploy:
      mode: replicated
      replicas: 1
      restart_policy:
        condition: on-failure
      placement:
        constraints:
          - 'node.labels.cpu == enabled'
networks:
  alpha_net:
    external: true

The above fails, since the container running consul1 cannot resolve consul2 and consul3 to IP addresses:

    2017/02/22 01:58:58 [INFO] agent: (LAN) joining: [consul1 consul2 consul3]
    2017/02/22 01:58:58 [WARN] memberlist: Failed to resolve consul2: lookup consul2 on 127.0.0.11:53: no such host
    2017/02/22 01:58:58 [WARN] memberlist: Failed to resolve consul3: lookup consul3 on 127.0.0.11:53: no such host
    2017/02/22 01:58:58 [INFO] agent: (LAN) joined: 1 Err: <nil>
    2017/02/22 01:58:58 [INFO] agent: Join completed. Synced with 1 initial agents
    2017/02/22 01:59:05 [ERR] agent: failed to sync remote state: No cluster leader

And, if I manually attach to the container for consul1 with docker exec -it <container_id> /bin/sh, I can nslookup services running on the same node, but not services running on a different node.

/ # nslookup consul1
Name:      consul1
Address 1: 192.168.1.31 ip-192-168-1-31.us-west-2.compute.internal
/ # nslookup consul2
nslookup: can't resolve 'consul2': Name does not resolve
/ # nslookup consul3
nslookup: can't resolve 'consul3': Name does not resolve
/ # nslookup docdb
nslookup: can't resolve 'docdb': Name does not resolve
/ # nslookup userdb
Name:      userdb
Address 1: 192.168.1.37 ip-192-168-1-37.us-west-2.compute.internal

(userdb in the list above is another service in the compose file... left out for brevity's sake)

I can reach the name server at 127.0.0.11 just fine inside the consul1 container, but it seems as if the IP addresses for services running on other nodes aren't getting synchronized across the swarm network.

glorious-beard commented 7 years ago

One more data point: creating the above services from the docker-compose file manually with docker service create --name XXX calls does permit cross-node DNS IP resolution.

If I manually create the services with docker service create --name X --network alpha_net, I sometimes (not consistently) see the same behavior.
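
(For comparison, a rough manual equivalent of the consul1 compose service above might look like the following; environment variables and the -advertise template are omitted for brevity.)

docker service create \
  --name consul1 \
  --network alpha_net \
  --replicas 1 \
  --constraint 'node.labels.cpu == enabled' \
  consul:0.7.5 \
  agent -bind=0.0.0.0 -client=0.0.0.0 -log-level=INFO -node=config-server-1 \
    -server -bootstrap-expect=3 -rejoin \
    -retry-join=consul1 -retry-join=consul2 -retry-join=consul3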

augmento commented 6 years ago

I see the same issue on AWS. Does anyone have a recommendation or a workaround?

vguna commented 6 years ago

I'm seeing similar results, also without the AWS stuff. I have only a master node (at a hoster) and one worker node (a home Linux box with a static IP), with 7 containers distributed between them. I've checked the swarm port 7946 TCP and it is reachable on the hoster and at my Linux box using the external host IPs. Distribution works as expected, but the containers on the Linux box can't look up the names of the containers at the hoster, but the other way around. If I inspect the nodes and try to ping the IPs instead of the names from within the containers, that doesn't work either. Funny thing is that the containers on each node can ping the other containers on the same node. I'm not using any special/additional network, just deploying the stack via: docker stack deploy -c docker-compose.yml

I've read not to use ping, but ping works on the same node. I also tried nslookup, without luck.

The two nodes are running Ubuntu 16.04 as the host OS with the latest (17.06.2-ce) Docker version. The nodes are running the 4.4.0-93-generic and 4.4.0-87-generic Ubuntu kernels.

Like the other guys, I'm a bit lost as to where to look further.

rdxmb commented 6 years ago

@zen-chetan What if you do not create the network before running docker stack deploy? In my production-stack.yml there is no definition of networks, neither within the services nor at the top level. When deploying the stack, an overlay network is created by Docker:

root@docker1:/data/monitoring# cat network.yml 
version: '3.3'

services:

  influxdb:
    image: influxdb
    hostname: monitoring-influxdb
    volumes:
      - /data/monitoring/data/influxdb/var-lib-influxdb:/var/lib/influxdb
      - /etc/localtime:/etc/localtime:ro

  telegraf:
    image: telegraf
root@docker1:/data/monitoring# docker stack deploy -c network.yml networking
Creating network networking_default
Creating service networking_telegraf
Creating service networking_influxdb
root@docker1:/data/monitoring# docker network ls | grep networking_default
k18skhvdiwgh        networking_default    overlay             swarm

//edited

# docker --version
Docker version 17.07.0-ce, build 8784753
glorious-beard commented 6 years ago

Our product requires an attachable overlay network, which isn't supported in the docker compose yml file, AFAICT.


augmento commented 6 years ago

After I made sure that all 3 hosts had docker-ce installed, joined them to the swarm, and used docker service create to launch containers, I was able to reach the containers across hosts. Ping using the container name also worked across hosts. I am not using docker stack deploy; I created the overlay network and used the same network name when launching the containers with service create. I still need to resolve a few issues related to making certain services talk to each other (which may be related to publishing ports etc.), but I think I have crossed the hurdle I faced with cross-container communication, which I think was due to a Docker version mismatch across hosts.

vguna commented 6 years ago

I got it working now. Here are some insights that may help others:

  • Don't try to use docker for Windows to get multi-node mesh network (swarm) running. It's simply not (yet) supported. If you google around, you find some Microsoft blogs telling about it. Also the docker documentation mentions it somewhere. It would be nice, if docker cmd itself would print an error/warning when trying to set something up under Windows - which simply doesn't work. It does work on a single node though.
  • Don't try to use a Linux in a Virtualbox under Windows and hoping to workaround with it. It, of course, doesn't work since it has the same limitations as the underlying Windows.
  • Make sure you open ports at least 7946 tcp/udp and 4789 udp for worker nodes. For master also 2377 tcp. Use e.g. netcat -vz -u for udp check. Without -u for tcp.
  • Make sure to pass --advertise-addr on the docker worker node (!) when executing the join swarm command. Here put the external IP address of the worker node which has the mentioned ports open. Doublecheck that the ports are really open!
  • Using ping to check the DNS resolution for container names works. If you forget the --advertise-addr or not opening port 7946 results in DNS resolution not working on worker nodes!

My main fault was using Windows and not specifying --advertise-addr - since I thought the IP address of the master was already specified correctly by the generated join token cmd. But it's important to specify the worker node IP as well on join!

I hope that helps someone. Most of the stuff is mentioned in the documentation and here in the comments. But only the combination of the mentioned points worked for me.

BTW: I've tested this with docker-compose v3.3 syntax and deployed it via docker stack deploy with the default overlay network. As a kernel I used the Ubuntu 16.04 LTS, 4.4.0-93-generic kernel.

rrtaylor commented 6 years ago

Having similar trouble connecting services between hosts. If I am in a container in a worker node and use netcat -vz to try to connect to the manager node host and port, I get the following error:

root@adc78cf2c38d:/# netcat -vz cordoba.<company>.com 8786 
DNS fwd/rev mismatch: cordoba.<company>.com != <ip-address>-static.hfc.comcastbusiness.net 
cordoba.company.com [<ip-address>] 8786 (?) open

Values with <> around them are to anonymize the output. cordoba.<company>.com is the manager node host. Are there some external network changes that I need to make to get swarm to work?

vguna commented 6 years ago

The netcat was meant for testing the open ports on the master and worker hosts, not the containers. I haven't tried whether they are also accessible from inside the containers. I didn't have to change or specify any network settings at all; the defaults worked fine for me (via docker-compose).

BTW: what is port 8786? What OSes are you using?

rubidot commented 6 years ago

I ran into this issue and got it working with @vguna's tips. In particular, I had to set --advertise-addr on my worker node to the external IP.

My concern is: while my manager node has a fixed IP, my worker nodes have dynamic IPs. According to the docs, this should be fine, and I've confirmed that the manager has no problem switching to the new node IP when it changes. So, when the worker IP changes, the manager will still see the node as healthy and assign tasks to it, but the advertise address will be the old address, so those containers will be unreachable from other nodes.

cima commented 6 years ago

--advertise-addr was the silver bullet for us. The documentation says you can use this switch with a NIC name, like --advertise-addr eth0:2377, where eth0 is address-independent and fits your requirement of nodes with dynamic IP addresses. Same as we have.

See --advertise-addr value
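
(A sketch of what that looks like at join time on the worker; the token and manager address are placeholders taken from docker swarm join-token worker.)

docker swarm join --advertise-addr eth0 --token <worker-token> <manager-ip>:2377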

rubidot commented 6 years ago

Thanks @cima, I think this is the right direction.

When I do ifconfig -a on my worker node, I don't see any interfaces using my external IP address, just the internal one. The worker is behind a router, which forwards the necessary ports to the worker server. In this case, would my server even have a network interface associated with the external IP?

cima commented 6 years ago

Externality in this case is not meant toward the Internet but toward Docker's virtual networks. The key is to have direct connectivity (no NAT on the way) from the worker node to the manager node and vice versa. The goal is that the master knows the IP address of each worker that joined it. Due to fun side effects of multiple NICs being present on the worker, and some lazy implementation on the master, the IP address must be told by the worker explicitly in the joining request. But the poor worker is unable to determine which of those many NICs is the one that provides direct connectivity to the manager node. By using --advertise-addr eth0 you are giving the worker a hint that the NIC eth0 is connected to the same network as the manager.

So look at the manager with ip addr show, as well as the worker, and you'll see that some ethX network controllers have the same network prefix.

RyanGough commented 5 years ago

Bumped into this problem today and was only able to get around it by adding --dns-option use-vc when creating services (which I found out about here: https://forums.docker.com/t/dns-resolution-not-working-in-containers/36246/2).
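
(A sketch of how that option is passed at service creation time, with hypothetical service and network names.)

docker service create --name web --network my_overlay --dns-option use-vc nginx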

melaurent commented 4 years ago

Thanks @cima, I think this is the right direction.

When I do ifconfig -a on my worker node, I don't see any interfaces using my external IP address, just the internal one. The worker is behind a router, which forwards the necessary ports to the worker server. In this case, would my server even have a network interface associated with the external IP?

@rubidot Hello, I have the same problem. When the IP address of my worker node changes, services stop being DNS resolvable. Did you find a way to advertise the new dynamic IP when it changes? I, too, am unable to get the public IP from an interface.

Docjones commented 4 years ago

I have the exact same problem as described by the OP: starting services in a swarm and trying to access them from a container on a different node via the overlay network using their name (set via docker service create ... --network name=...,alias="test-{{.Node.Hostname}}") does not work.

I found out (using docker run -d --name dns -v /var/run/docker.sock:/docker.sock phensley/docker-dns) that only the names of the service's tasks local to the manager are being added to Docker DNS:

2019-10-17T11:32:00.627802 [dockerdns] table.add test.3.f6k6q9pv5zrmq4974h4pt7jyo.docker -> 10.0.0.190
2019-10-17T11:32:00.627921 [dockerdns] table.add 190.0.0.10.in-addr.arpa -> test.3.f6k6q9pv5zrmq4974h4pt7jyo.docker
2019-10-17T11:32:00.627978 [dockerdns] table.add test-DFIDS020.docker -> 10.0.0.190
2019-10-17T11:32:00.628042 [dockerdns] table.add 190.0.0.10.in-addr.arpa -> test-DFIDS020.docker

(the service was created with --replicas 3 on 1 manager + 2 workers)

I recreated the swarm worker nodes (with --advertise-addr) and checked everything from above, but could not fix the issue. I'd say it's not working as expected... Any help would be appreciated.

Jack-Ji commented 3 years ago

I got it working now. Here are some insights that may help others:

  • Don't try to use docker for Windows to get multi-node mesh network (swarm) running. It's simply not (yet) supported. If you google around, you find some Microsoft blogs telling about it. Also the docker documentation mentions it somewhere. It would be nice, if docker cmd itself would print an error/warning when trying to set something up under Windows - which simply doesn't work. It does work on a single node though.
  • Don't try to use a Linux in a Virtualbox under Windows and hoping to workaround with it. It, of course, doesn't work since it has the same limitations as the underlying Windows.
  • Make sure you open ports at least 7946 tcp/udp and 4789 udp for worker nodes. For master also 2377 tcp. Use e.g. netcat -vz -u for udp check. Without -u for tcp.
  • Make sure to pass --advertise-addr on the docker worker node (!) when executing the join swarm command. Here put the external IP address of the worker node which has the mentioned ports open. Doublecheck that the ports are really open!
  • Using ping to check the DNS resolution for container names works. If you forget the --advertise-addr or not opening port 7946 results in DNS resolution not working on worker nodes!

My main fault was using Windows and not specifying --advertise-addr - since I thought the IP address of the master was already specified correctly by the generated join token cmd. But it's important to specify the worker node IP as well on join!

I hope that helps someone. Most of the stuff is mentioned in the documentation and here in the comments. But only the combination of the mentioned points worked for me.

BTW: I've tested this with docker-compose v3.3 syntax and deployed it via docker stack deploy with the default overlay network. As a kernel I used the Ubuntu 16.04 LTS, 4.4.0-93-generic kernel.

Almost googled my ass off to finally find this valuable suggestion. This windows platform issue really bugs me.

prawen commented 3 years ago

I'm facing the same issue on AWS EC2 [Ubuntu Server 16.04], with one master and 2 workers. This is my docker_gwbridge network info:

[
    {
        "Name": "docker_gwbridge",
        "Id": "c17f0cac35357440499541bb356b8be9339f171066c7de97a15260e8a7b3e001",
        "Created": "2020-09-22T13:37:44.944689723Z",
        "Scope": "local",
        "Driver": "bridge",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": {},
            "Config": [
                {
                    "Subnet": "10.11.0.0/16"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "450f1d8144baff2788b7e73149ab7fcb573fedb6e33d9be505af2bd062549f4a": {
                "Name": "gateway_642e67b0cea2",
                "EndpointID": "3dc0fabc85323d3d4e38571f6b5a121c69e566db9e054deae5571d1e811b622f",
                "MacAddress": "02:42:0a:0b:00:03",
                "IPv4Address": "10.11.0.3/16",
                "IPv6Address": ""
            },
            "ingress-sbox": {
                "Name": "gateway_ingress-sbox",
                "EndpointID": "b3ad96e1cea391476ff9a0d386d0276313e79513a3b5b436abfd2dea053d1434",
                "MacAddress": "02:42:0a:0b:00:02",
                "IPv4Address": "10.11.0.2/16",
                "IPv6Address": ""
            }
        },
        "Options": {
            "com.docker.network.bridge.enable_icc": "false",
            "com.docker.network.bridge.enable_ip_masquerade": "true",
            "com.docker.network.bridge.name": "docker_gwbridge"
        },
        "Labels": {}
    }
]

I deployed a 3-node zookeeper setup with a custom network:

networks:
  one:
    driver: overlay
    ipam:
      driver: default
      config:
        - subnet: 192.168.2.0/24

When I deploy, zookeeper gets deployed on the 3 nodes. The service names are zookeeper, zookeeper1 and zookeeper2. I logged into each container and executed the ip addr command to get the IP. Below is the info.

zookeeper@zookeeper:~$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
525: eth0@if526: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default 
    link/ether 02:42:c0:a8:02:03 brd ff:ff:ff:ff:ff:ff
    inet 192.168.2.3/24 brd 192.168.2.255 scope global eth0
       valid_lft forever preferred_lft forever
527: eth1@if528: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 02:42:0a:0b:00:03 brd ff:ff:ff:ff:ff:ff
    inet 10.11.0.3/16 brd 10.11.255.255 scope global eth1
       valid_lft forever preferred_lft forever
zookeeper@zookeeper1:~$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
138: eth0@if139: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default 
    link/ether 02:42:c0:a8:02:06 brd ff:ff:ff:ff:ff:ff
    inet 192.168.2.6/24 brd 192.168.2.255 scope global eth0
       valid_lft forever preferred_lft forever
140: eth1@if141: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 02:42:0a:0b:00:03 brd ff:ff:ff:ff:ff:ff
    inet 10.11.0.3/16 brd 10.11.255.255 scope global eth1
       valid_lft forever preferred_lft forever
zookeeper@zookeeper2:~$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
94: eth0@if95: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default 
    link/ether 02:42:c0:a8:02:09 brd ff:ff:ff:ff:ff:ff
    inet 192.168.2.9/24 brd 192.168.2.255 scope global eth0
       valid_lft forever preferred_lft forever
96: eth1@if97: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 02:42:0a:0b:00:03 brd ff:ff:ff:ff:ff:ff
    inet 10.11.0.3/16 brd 10.11.255.255 scope global eth1
       valid_lft forever preferred_lft forever

Now, when I ping zookeeper1 from the zookeeper container, it resolves to a different IP.

zookeeper@zookeeper:~$ ping zookeeper1
PING zookeeper1 (192.168.2.5) 56(84) bytes of data.
64 bytes from ip-192-168-2-5.us-west-2.compute.internal (192.168.2.5): icmp_seq=1 ttl=64 time=0.085 ms
64 bytes from ip-192-168-2-5.us-west-2.compute.internal (192.168.2.5): icmp_seq=2 ttl=64 time=0.088 ms
64 bytes from ip-192-168-2-5.us-west-2.compute.internal (192.168.2.5): icmp_seq=3 ttl=64 time=0.084 ms

But the ip of zookeeper1 is 192.168.2.6

/etc/resolv.conf inside container

search us-west-2.compute.internal
nameserver 127.0.0.11
options ndots:0
raspy commented 3 years ago

But the ip of zookeeper1 is 192.168.2.6

I believe that 192.168.2.5 is the service's virtual IP, while 192.168.2.6 is an actual container implementing the service. Should you have more than one replica of the service, DNS would still resolve to .5, while the network would distribute the load to .6 and .7 (or whatever IPs were assigned to those replicas).
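
(One way to see both addresses, assuming the default vip endpoint mode and the service name zookeeper1.)

# the service's virtual IP(s), one per attached network
docker service inspect --format '{{json .Endpoint.VirtualIPs}}' zookeeper1

# the individual task IPs behind the VIP (run from inside any container on the same network)
nslookup tasks.zookeeper1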

kavishgr commented 3 years ago

Found a temporary solution after hours of troubleshooting.

My setup: Ubuntu as a manager and CentOS as a worker.

On each node, allow the following ports (found here):

2377/tcp
7946/tcp
7946/udp
4789/udp
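
(One way to open these, as a sketch assuming ufw on the Ubuntu node and firewalld on the CentOS node.)

# Ubuntu (ufw)
ufw allow 2377/tcp && ufw allow 7946/tcp && ufw allow 7946/udp && ufw allow 4789/udp

# CentOS (firewalld)
firewall-cmd --permanent --add-port=2377/tcp --add-port=7946/tcp --add-port=7946/udp --add-port=4789/udp
firewall-cmd --reload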

Then add the following line in /etc/sysconfig/docker for CentOS and /etc/default/docker for Ubuntu (found here):

OPTIONS="--dns=10.10.0.1 --dns-search=example.com --dns-opt=use-vc"

Replace the IP with the gateway IP of your overlay network subnet. If you have multiple subnets, you can add more --dns=IP.

/etc/sysconfig/docker was not available by default on my machine. I just created one.

brandonwsims commented 3 years ago

Looking at the same issue in my setup, but I'm noticing something odd. I have 1 manager and 8 worker nodes. 5 of my 8 worker nodes fail to resolve the service name over the overlay network. The other 3 have no issue in doing so. No matter what service I launch or how I launch it, so long as it's connected to the correct overlay network, the same 3 have no issue resolving by service name.

I have absolutely no idea why the other 5 nodes in my swarm continue to have problems. I've tried the quick fixes listed in this thread to no avail. Each of my worker nodes is an identical copy of the others.

prashant-shahi commented 2 years ago

In my case, it turned out to be related to ports. It worked after the ports listed in the reference below were accessible between the nodes.

Reference: https://docs.docker.com/engine/swarm/swarm-tutorial/#open-protocols-and-ports-between-the-hosts

bdoublet91 commented 2 years ago

Hi, Could you confirm that:

Externality in this case is not meant to the Internet but to dockers virtual networks. The key is to have direct connectivity (no NAT on the way) from worker node to manager node and vice versa

I have two cloud providers with swarm workers on both. I got the same problem of communication between containers in the same overlay network.

I have set up a VPN between the two LANs (WireGuard) with static routes and iptables POSTROUTING.

First LAN - 10.70.0.0/16 -> 10.90.0.1/16 VPN -> 10.90.0.2/16 -> 10.80.0.0/16 - Second LAN

I can ping workers and the manager from both LANs and join them all to the same cluster. Orchestration works too (tasks populate). Usually I use POSTROUTING iptables rules to replace the source IP of a packet with the IP of the interface (to avoid configuring static routes on all servers). But when I join the worker, the node IP is not the swarm server's IP but the NAT IP after the VPN on the manager's side, so I think NAT causes problems here. I tried --advertise-addr to enforce the IP but that didn't work either.

Do you have any way to work with NAT, or do we have to use static routes only?

Thanks

mamirpanah commented 2 years ago

I have experienced the same issue after upgrading Docker from 19 to 20: service name resolution got conflicts on our network and our containers on the same network could not talk to each other. The problem was the endpoint_mode and not using the default ingress network. Here are the few changes that worked in our compose file:

1 - Changing port publishing to the long syntax so the ports use host mode:

ports:
  - target:
    published:
    protocol:
    mode: host

2 - Using:

deploy:
  endpoint_mode: dnsrr

3 - Using a hostname identical to the service name.

zachsa commented 4 months ago

On Docker Engine v26, I'm finding that from within a container, ping <servicename> fails even though nslookup tasks.servicename succeeds (and running ping <ip address from tasks.servicename> succeeds).

What could cause this? (I should mention I've set up Docker in LXC containers.)

In summary, from within a service container defined on the same stack as the service nginx:

nslookup nginx          # gives the virtual IP of the nginx service
nslookup tasks.nginx    # gives the correct IP of an nginx container (10.0.17.20)
ping 10.0.17.20         # this works
ping nginx              # doesn't work
curl http://10.0.17.20  # this works
curl http://nginx       # doesn't work

However, curl http://tasks.nginx does work.

zbalogh commented 2 months ago

Disabling checksum offloading appears to have resolved this issue on my swarm cluster.

See details here:

https://portal.portainer.io/knowledge/known-issues-with-vmware

https://forums.rockylinux.org/t/docker-swarm-cluster-network-issue/5335
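
(For reference, the workaround described in those links is typically applied with something like the following; the interface name and which offload feature to disable depend on the NIC and hypervisor.)

# disable TX checksum offload on the interface carrying the VXLAN traffic
ethtool -K ens192 tx-checksum-ip-generic off
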

Regards, Zoltan