moby / swarmkit

A toolkit for orchestrating distributed systems at any scale. It includes primitives for node discovery, raft-based consensus, task scheduling and more.

Service is not DNS resolvable from another one if containers run on different nodes #1429

Open vasily-kirichenko opened 7 years ago

vasily-kirichenko commented 7 years ago

I have two services, each running a single container, on different nodes, using the same "overlay" network. When I try to ping one container from inside the other via the service name, it fails:

ping akka-test
ping: bad address 'akka-test'

After I scaled the akka-test service so that a container runs on the node where the other container is running, everything suddenly started to work.

So my question is: is my assumption valid that services should be discoverable across the entire Swarm? I mean, the name of a service should be DNS resolvable from any other container in this Swarm, no matter where the containers are running.

$ docker network ls
NETWORK ID          NAME                DRIVER              SCOPE
255fedab2fc4        bridge              bridge              local
9a450f033c48        docker_gwbridge     bridge              local
6e76844033f8        host                host                local
dzwgdein8cxa        ingress             overlay             swarm
54uqc60vx1i5        net2                overlay             swarm
d632a42ef140        none                null                local
$ docker service ls
ID            NAME         REPLICAS  IMAGE                             COMMAND
0wyv4gq14mnu  akka-test    8/8       xxxx:5000/akkahttp1:1.20
cg7r4ius7xfm  akka-test-2  1/1       xxxx:5000/akkahttp1:1.20
$ docker service inspect --pretty akka-test
ID:             0wyv4gq14mnuj8kfolizh1h23
Name:           akka-test
Mode:           Replicated
 Replicas:      8
Placement:
UpdateConfig:
 Parallelism:   1
 On failure:    pause
ContainerSpec:
 Image:         xxxx:5000/akkahttp1:1.20
Resources:
Networks: 54uqc60vx1i57d3qnmhza82c4
$ docker service inspect --pretty akka-test-2
ID:             cg7r4ius7xfmgvazmptvarn2k
Name:           akka-test-2
Mode:           Replicated
 Replicas:      1
Placement:
UpdateConfig:
 Parallelism:   1
 On failure:    pause
ContainerSpec:
 Image:         xxxx:5000/akkahttp1:1.20
Resources:
Networks: 54uqc60vx1i57d3qnmhza82c4
$ docker info
Containers: 75
 Running: 11
 Paused: 0
 Stopped: 64
Images: 42
Server Version: 1.12.1-rc1
Storage Driver: devicemapper
 Pool Name: docker-253:0-135409124-pool
 Pool Blocksize: 65.54 kB
 Base Device Size: 10.74 GB
 Backing Filesystem: xfs
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data Space Used: 8.291 GB
 Data Space Total: 107.4 GB
 Data Space Available: 40.86 GB
 Metadata Space Used: 19.61 MB
 Metadata Space Total: 2.147 GB
 Metadata Space Available: 2.128 GB
 Thin Pool Minimum Free Space: 10.74 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Deferred Deletion Enabled: false
 Deferred Deleted Device Count: 0
 Data loop file: /var/lib/docker/devicemapper/devicemapper/data
 WARNING: Usage of loopback devices is strongly discouraged for production use. Use `--storage-opt dm.thinpooldev` to specify a custom block storage device.
 Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
 Library Version: 1.02.107-RHEL7 (2016-06-09)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: null overlay host bridge
Swarm: active
 NodeID: ao1wz862t6n4yog4hpi4yqm20
 Is Manager: true
 ClusterID: 3hpbbe2jtdoqe1zvxs41cycoq
 Managers: 3
 Nodes: 4
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
 Node Address: xxxx
Runtimes: runc
Default Runtime: runc
Security Options: seccomp
Kernel Version: 3.10.0-327.28.2.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 56
Total Memory: 188.6 GiB
Name: xxxx
ID: OWEH:OIIR:7NZ6:IKZV:RFJ4:NXAZ:NH7H:WPLC:D457:DKGN:CH2C:E2UE
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: bridge-nf-call-ip6tables is disabled
Insecure Registries:
 127.0.0.0/8
kaii-zen commented 7 years ago

I'm seeing this too. I'm using Docker for AWS and this has happened both on beta4 and now on beta5. Service names are sometimes unresolvable, sometimes resolvable but with no route to host, and sometimes everything works. So far I've been unable to reliably reproduce it from scratch.

dperny commented 7 years ago

Because of some networking limitations (I think related to virtual IPs), the ping tool will not work with overlay networking. Are your service names resolvable with other tools like dig?

Take a look at this guide, if you haven't already: https://docs.docker.com/engine/swarm/networking/

vasily-kirichenko commented 7 years ago

@dperny Thanks, will check with dig.

dperny commented 7 years ago

Sure. Let me know whether or not that fixes the issue, so I know whether to close it or take a deeper look.

vasily-kirichenko commented 7 years ago

I could not find a docker image with dig installed, so I tested with nslookup. It could not resolve the service if the container was running on a different node.
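
(For what it's worth, dig can be pulled into a throwaway Alpine container on the fly. This is only a sketch: it assumes the overlay network is attachable so a standalone container can join it, and reuses the net2 network name from the output above.)

docker run --rm -it --network net2 alpine sh -c 'apk add --no-cache bind-tools && dig akka-test'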

dperny commented 7 years ago

Can you give some more information for reproducing? I tried to reproduce by creating a 3 node cluster with 1 manager.

# create new network
$ docker network create --driver overlay net
# create web service
$ docker service create --network net --name web nginx
# web landed on node-2
# create busybox service for lookups
$ docker service create --network net --name probe busybox sleep 3000
# probe landed on node-3
# now, from node 3
$ docker exec -it <busybox container id> /bin/sh

/ # nslookup web
Server:    127.0.0.11
Address 1: 127.0.0.11

Name:      web
Address 1: 10.0.0.2
/ # nslookup probe
Server:    127.0.0.11
Address 1: 127.0.0.11

Name:      probe
Address 1: 10.0.0.4
/ # nslookup butterpecans
Server: 127.0.0.11
Address 1: 127.0.0.11

nslookup: can't resolve 'butterpecans'

So this appears to work for me.

dperny commented 7 years ago

Do you have TCP port 7946 open on your hosts? Gossip needs that port open for networking to work correctly.
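
(For reference, a quick way to check those ports from another node, assuming netcat is available; <node-ip> is a placeholder.)

nc -vz <node-ip> 7946     # gossip control plane, TCP
nc -vzu <node-ip> 7946    # gossip control plane, UDP
nc -vzu <node-ip> 4789    # VXLAN data plane, UDP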

Ayiga commented 7 years ago

@dperny create your services without vip endpoint mode. It occurs specifically with dnsrr for certain. However, it may work with any mode that doesn't generate a proxy address.

dperny commented 7 years ago

@ayiga Just tried the above steps but added --endpoint-mode dnsrr and it resolves properly. Is your failure intermittent, or consistent?

Ayiga commented 7 years ago

It's consistent. In my experience, DNS resolution is only capable of resolving containers that exist on the same node. The manager is capable of resolving containers throughout the swarm (sometimes it doesn't; I'm not sure of the cause). But this issue is primarily with worker nodes.

I did a full write up of my steps in the post: https://forums.docker.com/t/container-dns-resolution-in-ingress-overlay-network/21399 for Docker for AWS. However, the issue is easily reproducible from my personal setup, between a Linux Box (Ubuntu variant), and my Mac using Docker for Mac.

c4wrd commented 7 years ago

I am also experiencing this issue; my setup is as follows. I have three AWS EC2 nodes, all on a private shared network where they can communicate on all ports (I have verified all nodes can reach all other nodes on the ports specified in the Swarm 1.12 documentation). I create containers on a shared overlay network (verified the overlay interface exists and is correctly routed through the specified subnet), and only when two containers are on the same node can they communicate via their VIP or hostname. When containers are on different nodes, I receive a "no route to host" message when they attempt to connect to each other.

c4wrd commented 7 years ago

@Ayiga @vasily-kirichenko I actually just resolved this by changing the subnet of my overlay network. Previously it had been on 172.0.0.0/24, and for some reason I believe this was conflicting with the Docker networking interfaces (even though it doesn't appear to overlap). Now I can resolve containers on other nodes by hostname and VIP without issue. Here's how I created the network, for reference:

docker network create \
    --driver overlay \
    --subnet 10.10.9.0/24 \
    selenium-grid

glorious-beard commented 7 years ago

Is there any further resolution on this?

I'm running an AWS 4-node (3 manager, 1 worker) swarm, all under Docker 1.13.1. I'm using Docker Compose with an external overlay network created in attachable mode, using a subnet different from the hosts', for all the services in the compose file, and deploying it with docker stack deploy --compose-file.

Even if I add another 3 nodes as dedicated Docker managers with availability set to drain, and everything else set to worker mode, I still encounter services that cannot access other services over the overlay network. All the services are defined in the compose file.

Attempting to resolve a service name to an IP address via dig or nslookup (using 127.0.0.11 as the DNS server) results in no records for other tasks running on that overlay network.
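
(For reference, the kind of lookups being attempted against the embedded DNS server look roughly like this; web is a placeholder service name.)

nslookup web 127.0.0.11          # should return the service VIP
nslookup tasks.web 127.0.0.11    # should return the individual task IPs
dig @127.0.0.11 web +short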

Docker Info

Containers: 5
 Running: 1
 Paused: 0
 Stopped: 4
Images: 10
Server Version: 1.13.1
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 89
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: bridge host macvlan null overlay
Swarm: active
 NodeID: fk3f2buol2b6azvqap8pdhzup
 Is Manager: false
 Node Address: 11.0.12.39
 Manager Addresses:
  11.0.10.7:2377
  11.0.11.18:2377
  11.0.12.45:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: aa8187dbd3b7ad67d8e5e3a15115d3eef43a7ed1
runc version: 9df8b306d01f59d3a8029be411de015b7304dd8f
init version: 949e6fa
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.4.0-62-generic
Operating System: Ubuntu 16.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 59.97 GiB
Name: ip-11-0-12-39
ID: 3XUZ:DCGO:F474:GNKB:2VN6:ZJYE:LPWJ:SPOS:HGR3:UJVX:RATM:TMRT
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
nathanleclaire commented 7 years ago

@zen-chetan Try @c4wrd's suggestion to use a different subnet IP to see if that resolves the issue.

I've seen an issue like this before and it was because AWS nodes had /etc/resolv.conf that pointed to a 10.0.0.x IP address in the VPC subnet (common), but Docker DNS was getting confused because the subnets of the created overlay(s) would also be in that range.

I'd argue that maybe the default subnet for overlay networks should be changed as it overlaps with a very common internal IP subnet. e.g., the getting started with Amazon VPC guide uses 10.0.0.0/24.


At the very least this should probably be covered in the Docker documentation.

nathanleclaire commented 7 years ago

@sanimej @aboch I'm curious your thoughts on the above ^^

glorious-beard commented 7 years ago

Thanks @nathanleclaire for your suggestion. However, I am running different subnets for the AWS hosts and the overlay network.

The hosts are running in the subnet 11.0.0.0. Here's the output of /etc/resolv.conf for one of the hosts that can't resolve DNS for containers running on it.

# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 11.0.0.2
search us-west-2.compute.internal

The docker overlay network runs with the subnet 10.0.10.0/24. docker network inspect output...

[
    {
        "Name": "brain_net",
        "Id": "ip81kx5shqsenzsalo04oxpzk",
        "Created": "2017-02-17T01:34:40.16419404Z",
        "Scope": "swarm",
        "Driver": "overlay",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "10.0.10.0/24",
                    "Gateway": "10.0.10.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": true,
        "Containers": {
            "86473d760c0ab112adec455c8b65213734d35c8c26f1db0719d40a1f6fd6f61a": {
                "Name": "alpha_gpu_engineer.fk3f2buol2b6azvqap8pdhzup.xatrh8z4dmxncqztiz9i85ls5",
                "EndpointID": "5af3262a75990cda4f6354aa194b57b47f2f888f8e507f6bbfa5e609e2f7490c",
                "MacAddress": "02:42:0a:00:0a:0a",
                "IPv4Address": "10.0.10.10/24",
                "IPv6Address": ""
            }
        },
        "Options": {
            "com.docker.network.driver.overlay.vxlanid_list": "4097"
        },
        "Labels": {},
        "Peers": [
            {
                "Name": "ip-11-0-12-39-aab579cb2196",
                "IP": "11.0.12.39"
            }
        ]
    }
]

Here's the /etc/resolv.conf for one of the containers in the host:

search us-west-2.compute.internal
nameserver 127.0.0.11
options ndots:0

To rule out AWS SGs, I've also completely opened all ports for both UDP and TCP, incoming and outgoing, for the security group all of the nodes run in.

aboch commented 7 years ago

@vasily-kirichenko

So my question is: is my assumption valid that services should be discoverable across the entire Swarm? I mean, the name of a service should be DNS resolvable from any other container in this Swarm, no matter where the containers are running.

Yes, they will be discoverable by any container no matter where it is running as long as it is connected to the same network.

@zen-chetan

Here's the output of /etc/resolv.conf for one of the hosts that can't resolve DNS for containers running on it.

Not sure if you meant that, but if you were expecting to be able to resolve the service name from the host, that is not possible. The service name is only discoverable from inside the swarm networks the service is attached to.

For the rest, I only have some generic comments:

As @dperny suggested, in order for the network control plane info (like the internal dns records) to spread in the cluster, please make sure both tcp/7946 and udp/7946 are open on each and every node and security group rules allow them.

Your system will be subject to the overlay/host subnet conflict, as @nathanleclaire was mentioning, if you see vx-<ID> named interfaces on your hosts where a container is running on an overlay network. If no vx-<ID> named interfaces are there, then your overlay network subnet can safely overlap with the hosts' VPC subnet.

When things do not work with stack deploy, try creating the docker services manually to see whether or not the problem is specific to docker stack.

glorious-beard commented 7 years ago

Thank you @aboch. My intention in showing the host's /etc/resolv.conf was to demonstrate that the host and the name server it uses do not seem to overlap with the Docker overlay network's subnet.

Regarding vx-<ID> interfaces, I see a lot of veth* interfaces created when everything is running, with different numbers of interfaces on each host, ranging from as few as one to as many as 13. These interfaces are present both on the host and in a container started with the docker stack deploy command. How do I check for these vx-<ID> interfaces?

aboch commented 7 years ago

@zen-chetan

My intention in showing the host's /etc/resolv.conf was to demonstrate that the host and the name server it uses do not seem to overlap with the Docker overlay network's subnet.

Ah I see, thanks. But given that swarm networks are global-scope networks, the overlap check is not run for their subnets. This is why the issue can arise with kernels that do not support creating the vxlan interface in a separate netns. Libnetwork detects whether the kernel supports that feature; if it does not, it creates the vxlan interfaces (one per subnet per overlay network) in the host namespace with names vx-....

How do I check for these vx- interfaces?

If you do not see any of those in the ip link output, then it means you do not need to worry about which subnet was chosen for the overlay network. Just make sure this is true for all the hosts the overlay network spans.
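
(For example, something along these lines on each host; if the kernel supports creating the vxlan device in a separate netns, the grep should return nothing.)

ip -o link show | grep 'vx-'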

I see a lot of veth* interfaces created when everything is running,

Yes, those are the interfaces connecting each container on the overlay network with the docker_gwbridge network, to provide outside-world connectivity to the containers.

glorious-beard commented 7 years ago

So I'm still stumped by this...

Given three AWS nodes running in a private VPC subnet, with the security group set to allow all traffic in and out on all ports, both UDP and TCP, on the subnet 11.0.0.0/8, I still cannot obtain the IP addresses of services running on other nodes in the docker swarm. Any service running on a node in the swarm can get the IP addresses of services running on the same node.

How to reproduce:

1 - Create an attachable network (docker-compose version 3 files still don't support attachable overlay networks):

docker network create --driver overlay --attachable --subnet 192.168.1.0/24 alpha_net

2 - Start the following docker-compose file with docker stack deploy --compose-file=docker-compose.yml alpha. This is a stripped-down sample that creates a Consul cluster; I've left out some of the other services from the compose file.

version: "3.1"

services:

  # Consul server 1 of 3
  consul1:
    image: consul:0.7.5
    command: agent -bind=0.0.0.0 -client=0.0.0.0 -advertise='{{ GetAllInterfaces | include "network" "192.168.1.0/24" | attr "address" }}' -log-level=INFO -node=config-server-1 -server -bootstrap-expect=3 -rejoin -retry-join=consul1 -retry-join=consul2 -retry-join=consul3
    environment:
      SERVICE_8500_IGNORE: "true"
      SERVICE_8300_IGNORE: "true"
      SERVICE_8301_IGNORE: "true"
      SERVICE_8302_IGNORE: "true"
      SERVICE_8400_IGNORE: "true"
      SERVICE_8600_IGNORE: "true"
    networks:
      - alpha_net
    deploy:
      mode: replicated
      replicas: 1
      restart_policy:
        condition: on-failure
      placement:
        constraints:
          - 'node.labels.cpu == enabled'

  # Consul server 2 of 3
  consul2:
    image: consul:0.7.5
    command: agent -bind=0.0.0.0 -client=0.0.0.0 -advertise='{{ GetAllInterfaces | include "network" "192.168.1.0/24" | attr "address" }}' -log-level=INFO -node=config-server-2 -server -bootstrap-expect=3 -rejoin -retry-join=consul1 -retry-join=consul2 -retry-join=consul3
    environment:
      SERVICE_8500_IGNORE: "true"
      SERVICE_8300_IGNORE: "true"
      SERVICE_8301_IGNORE: "true"
      SERVICE_8302_IGNORE: "true"
      SERVICE_8400_IGNORE: "true"
      SERVICE_8600_IGNORE: "true"
    networks:
      - alpha_net
    deploy:
      mode: replicated
      replicas: 1
      restart_policy:
        condition: on-failure
      placement:
        constraints:
          - 'node.labels.cpu == enabled'

  # Consul server 3 of 3
  consul3:
    image: consul:0.7.5
    command: agent -bind=0.0.0.0 -client=0.0.0.0 -advertise='{{ GetAllInterfaces | include "network" "192.168.1.0/24" | attr "address" }}' -log-level=INFO -node=config-server-3 -server -bootstrap-expect=3 -rejoin -retry-join=consul1 -retry-join=consul2 -retry-join=consul3
    environment:
      SERVICE_8500_IGNORE: "true"
      SERVICE_8300_IGNORE: "true"
      SERVICE_8301_IGNORE: "true"
      SERVICE_8302_IGNORE: "true"
      SERVICE_8400_IGNORE: "true"
      SERVICE_8600_IGNORE: "true"
    networks:
      - alpha_net
    deploy:
      mode: replicated
      replicas: 1
      restart_policy:
        condition: on-failure
      placement:
        constraints:
          - 'node.labels.cpu == enabled'
networks:
  alpha_net:
    external: true

The above fails, since the container running consul1 cannot resolve consul2 and consul3 to IP addresses:

    2017/02/22 01:58:58 [INFO] agent: (LAN) joining: [consul1 consul2 consul3]
    2017/02/22 01:58:58 [WARN] memberlist: Failed to resolve consul2: lookup consul2 on 127.0.0.11:53: no such host
    2017/02/22 01:58:58 [WARN] memberlist: Failed to resolve consul3: lookup consul3 on 127.0.0.11:53: no such host
    2017/02/22 01:58:58 [INFO] agent: (LAN) joined: 1 Err: <nil>
    2017/02/22 01:58:58 [INFO] agent: Join completed. Synced with 1 initial agents
    2017/02/22 01:59:05 [ERR] agent: failed to sync remote state: No cluster leader

And, if I manually attach to the container for consul1 with docker exec -it <container_id> /bin/sh, I can nslookup services running on the same node, but not services running on a different node.

/ # nslookup consul1
Name:      consul1
Address 1: 192.168.1.31 ip-192-168-1-31.us-west-2.compute.internal
/ # nslookup consul2
nslookup: can't resolve 'consul2': Name does not resolve
/ # nslookup consul3
nslookup: can't resolve 'consul3': Name does not resolve
/ # nslookup docdb
nslookup: can't resolve 'docdb': Name does not resolve
/ # nslookup userdb
Name:      userdb
Address 1: 192.168.1.37 ip-192-168-1-37.us-west-2.compute.internal

(userdb in the list above is another service in the compose file... left out for brevity's sake)

I can reach the name server at 127.0.0.11 just fine inside the consul1 container, but it seems as if the IP addresses for services running on other nodes aren't getting synchronized across the swarm network.

glorious-beard commented 7 years ago

One more data point: creating the above services from the docker-compose file manually with docker service create --name XXX calls does permit cross-node DNS IP resolution.

If I manually create the services with docker service create --name X --network alpha_net, I sometimes (not consistently) see the same behavior.
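
(For comparison, a rough manual equivalent of the consul1 compose service above might look like the following; environment variables and the -advertise template are omitted for brevity.)

docker service create \
  --name consul1 \
  --network alpha_net \
  --replicas 1 \
  --constraint 'node.labels.cpu == enabled' \
  consul:0.7.5 \
  agent -bind=0.0.0.0 -client=0.0.0.0 -log-level=INFO -node=config-server-1 \
    -server -bootstrap-expect=3 -rejoin \
    -retry-join=consul1 -retry-join=consul2 -retry-join=consul3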

augmento commented 6 years ago

I see the same issue on AWS. Does anyone have a recommendation or a workaround?

vguna commented 6 years ago

I'm seeing similar results, also without the AWS stuff. I have only a master node (at a hoster) and one worker node (a home Linux box with a static IP), with 7 containers distributed between them. I've checked the swarm port 7946 TCP and it is reachable on the hoster and at my Linux box using the external host IPs. Distribution works as expected, but the containers on the Linux box can't look up the names of the containers at the hoster, but the other way around. If I inspect the nodes and try to ping the IPs instead of the names from within the containers, that doesn't work either. Funny thing is that the containers on each node can ping the other containers on the same node. I'm not using any special/additional network, just deploying the stack via: docker stack deploy -c docker-compose.yml

I've read not to use ping, but ping works on the same node. I also tried nslookup, without luck.

The two nodes are running Ubuntu 16.04 as the host OS with the latest (17.06.2-ce) Docker version. The nodes are running the 4.4.0-93-generic and 4.4.0-87-generic Ubuntu kernels.

Like the other guys, I'm a bit lost as to where to look further.

rdxmb commented 6 years ago

@zen-chetan What if you do not create the network before running docker stack deploy? In my production-stack.yml there is no definition of networks, neither within the services nor at the top level. When deploying the stack, an overlay network is created by Docker:

root@docker1:/data/monitoring# cat network.yml 
version: '3.3'

services:

  influxdb:
    image: influxdb
    hostname: monitoring-influxdb
    volumes:
      - /data/monitoring/data/influxdb/var-lib-influxdb:/var/lib/influxdb
      - /etc/localtime:/etc/localtime:ro

  telegraf:
    image: telegraf
root@docker1:/data/monitoring# docker stack deploy -c network.yml networking
Creating network networking_default
Creating service networking_telegraf
Creating service networking_influxdb
root@docker1:/data/monitoring# docker network ls | grep networking_default
k18skhvdiwgh        networking_default    overlay             swarm

//edited

# docker --version
Docker version 17.07.0-ce, build 8784753
glorious-beard commented 6 years ago

Our product requires an attachable overlay network, which isn't supported in the docker compose yml file, AFAICT.


augmento commented 6 years ago

After I made sure that all 3 hosts had docker-ce installed, joined them to the swarm, and used docker service create to launch containers, I was able to reach the containers across hosts. Ping using the container name also worked across hosts. I am not using docker stack deploy; I created the overlay network and used the same network name when launching the containers with service create. I still need to resolve a few issues related to making certain services talk to each other (which may be related to publishing ports etc.), but I think I have crossed the hurdle I faced with cross-container communication, which I think was due to a Docker version mismatch across hosts.

vguna commented 6 years ago

I got it working now. Here are some insights that may help others:

  • Don't try to use docker for Windows to get multi-node mesh network (swarm) running. It's simply not (yet) supported. If you google around, you find some Microsoft blogs telling about it. Also the docker documentation mentions it somewhere. It would be nice, if docker cmd itself would print an error/warning when trying to set something up under Windows - which simply doesn't work. It does work on a single node though.
  • Don't try to use a Linux in a Virtualbox under Windows and hoping to workaround with it. It, of course, doesn't work since it has the same limitations as the underlying Windows.
  • Make sure you open ports at least 7946 tcp/udp and 4789 udp for worker nodes. For master also 2377 tcp. Use e.g. netcat -vz -u for udp check. Without -u for tcp.
  • Make sure to pass --advertise-addr on the docker worker node (!) when executing the join swarm command. Here put the external IP address of the worker node which has the mentioned ports open. Doublecheck that the ports are really open!
  • Using ping to check the DNS resolution for container names works. If you forget the --advertise-addr or not opening port 7946 results in DNS resolution not working on worker nodes!

My main fault was using Windows and not specifying --advertise-addr - since I thought the IP address of the master was already specified correctly by the generated join token cmd. But it's important to specify the worker node IP as well on join!

I hope that helps someone. Most of the stuff is mentioned in the documentation and here in the comments. But only the combination of the mentioned points worked for me.

BTW: I've tested this with docker-compose v3.3 syntax and deployed it via docker stack deploy with the default overlay network. As a kernel I used the Ubuntu 16.04 LTS, 4.4.0-93-generic kernel.

rrtaylor commented 6 years ago

Having similar trouble connecting services between hosts. If I am in a container in a worker node and use netcat -vz to try to connect to the manager node host and port, I get the following error:

root@adc78cf2c38d:/# netcat -vz cordoba.<company>.com 8786 
DNS fwd/rev mismatch: cordoba.<company>.com != <ip-address>-static.hfc.comcastbusiness.net 
cordoba.company.com [<ip-address>] 8786 (?) open

Values with <> around them are to anonymize the output. cordoba.<company>.com is the manager node host. Are there some external network changes that I need to make to get swarm to work?

vguna commented 6 years ago

The netcat was meant for testing the open ports on the master and worker hosts, not the containers. I haven't tried whether they are also accessible from inside the containers. I didn't have to change or specify any network settings at all; the defaults worked fine for me (via docker-compose).

BTW: what is port 8786? What OSes are you using?

rubidot commented 6 years ago

I ran into this issue and got it working with @vguna's tips. In particular, I had to set --advertise-addr on my worker node to the external IP.

My concern is: while my manager node has a fixed IP, my worker nodes have dynamic IPs. According to the docs, this should be fine, and I've confirmed that the manager has no problem switching to the new node IP when it changes. So, when the worker IP changes, the manager will still see the node as healthy and assign tasks to it, but the advertise address will be the old address, so those containers will be unreachable from other nodes.

cima commented 6 years ago

--advertise-addr was the silver bullet for us. The documentation says you can use this switch with a NIC name, like --advertise-addr eth0:2377, where eth0 is address-independent and fits your requirement of nodes with dynamic IP addresses. Same as we have.

See --advertise-addr value
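
(A sketch of what that looks like at join time on the worker; the token and manager address are placeholders taken from docker swarm join-token worker.)

docker swarm join --advertise-addr eth0 --token <worker-token> <manager-ip>:2377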

rubidot commented 6 years ago

Thanks @cima, I think this is the right direction.

When I do ifconfig -a on my worker node, I don't see any interfaces using my external IP address, just the internal one. The worker is behind a router, which forwards the necessary ports to the worker server. In this case, would my server even have a network interface associated with the external IP?

cima commented 6 years ago

Externality in this case is not meant toward the Internet but toward Docker's virtual networks. The key is to have direct connectivity (no NAT on the way) from the worker node to the manager node and vice versa. The goal is that the master knows the IP address of each worker that joined it. Due to fun side effects of multiple NICs being present on the worker, and some lazy implementation on the master, the IP address must be told by the worker explicitly in the joining request. But the poor worker is unable to determine which of those many NICs is the one that provides direct connectivity to the manager node. By using --advertise-addr eth0 you are giving the worker a hint that the NIC eth0 is connected to the same network as the manager.

So look at the manager with ip addr show, as well as the worker, and you'll see that some ethX network controllers have the same network prefix.

RyanGough commented 5 years ago

Bumped into this problem today and was only able to get around it by adding --dns-option use-vc when creating services (which I found out about here: https://forums.docker.com/t/dns-resolution-not-working-in-containers/36246/2).
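
(A sketch of how that option is passed at service creation time, with hypothetical service and network names.)

docker service create --name web --network my_overlay --dns-option use-vc nginx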

melaurent commented 4 years ago

Thanks @cima, I think this is the right direction.

When I do ifconfig -a on my worker node, I don't see any interfaces using my external IP address, just the internal one. The worker is behind a router, which forwards the necessary ports to the worker server. In this case, would my server even have a network interface associated with the external IP?

@rubidot Hello, I have the same problem. When the IP address of my worker node changes, services stop being DNS resolvable. Did you find a way to advertise the new dynamic IP when it changes? I, too, am unable to get the public IP from an interface.

Docjones commented 4 years ago

I have the exact same problem as described by the OP: starting services in a swarm and trying to access them from a container on a different node via the overlay network using their name (set via docker service create ... --network name=...,alias="test-{{.Node.Hostname}}") does not work.

I found out (using docker run -d --name dns -v /var/run/docker.sock:/docker.sock phensley/docker-dns) that only the names of the service's tasks local to the manager are being added to Docker DNS:

2019-10-17T11:32:00.627802 [dockerdns] table.add test.3.f6k6q9pv5zrmq4974h4pt7jyo.docker -> 10.0.0.190
2019-10-17T11:32:00.627921 [dockerdns] table.add 190.0.0.10.in-addr.arpa -> test.3.f6k6q9pv5zrmq4974h4pt7jyo.docker
2019-10-17T11:32:00.627978 [dockerdns] table.add test-DFIDS020.docker -> 10.0.0.190
2019-10-17T11:32:00.628042 [dockerdns] table.add 190.0.0.10.in-addr.arpa -> test-DFIDS020.docker

(the service was created with --replicas 3 on 1 manager + 2 workers)

I recreated the swarm worker nodes (with --advertise-addr) and checked everything from above, but could not fix the issue. I'd say it's not working as expected... Any help would be appreciated.

Jack-Ji commented 3 years ago

I got it working now. Here are some insights that may help others:

  • Don't try to use docker for Windows to get multi-node mesh network (swarm) running. It's simply not (yet) supported. If you google around, you find some Microsoft blogs telling about it. Also the docker documentation mentions it somewhere. It would be nice, if docker cmd itself would print an error/warning when trying to set something up under Windows - which simply doesn't work. It does work on a single node though.
  • Don't try to use a Linux in a Virtualbox under Windows and hoping to workaround with it. It, of course, doesn't work since it has the same limitations as the underlying Windows.
  • Make sure you open ports at least 7946 tcp/udp and 4789 udp for worker nodes. For master also 2377 tcp. Use e.g. netcat -vz -u for udp check. Without -u for tcp.
  • Make sure to pass --advertise-addr on the docker worker node (!) when executing the join swarm command. Here put the external IP address of the worker node which has the mentioned ports open. Doublecheck that the ports are really open!
  • Using ping to check the DNS resolution for container names works. If you forget the --advertise-addr or not opening port 7946 results in DNS resolution not working on worker nodes!

My main fault was using Windows and not specifying --advertise-addr - since I thought the IP address of the master was already specified correctly by the generated join token cmd. But it's important to specify the worker node IP as well on join!

I hope that helps someone. Most of the stuff is mentioned in the documentation and here in the comments. But only the combination of the mentioned points worked for me.

BTW: I've tested this with docker-compose v3.3 syntax and deployed it via docker stack deploy with the default overlay network. As a kernel I used the Ubuntu 16.04 LTS, 4.4.0-93-generic kernel.

Almost googled my ass off to finally find this valuable suggestion. This windows platform issue really bugs me.

prawen commented 3 years ago

I'm facing the same issue on AWS EC2 [Ubuntu Server 16.04], with one master and 2 workers. This is my docker_gwbridge network info:

[
    {
        "Name": "docker_gwbridge",
        "Id": "c17f0cac35357440499541bb356b8be9339f171066c7de97a15260e8a7b3e001",
        "Created": "2020-09-22T13:37:44.944689723Z",
        "Scope": "local",
        "Driver": "bridge",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": {},
            "Config": [
                {
                    "Subnet": "10.11.0.0/16"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "450f1d8144baff2788b7e73149ab7fcb573fedb6e33d9be505af2bd062549f4a": {
                "Name": "gateway_642e67b0cea2",
                "EndpointID": "3dc0fabc85323d3d4e38571f6b5a121c69e566db9e054deae5571d1e811b622f",
                "MacAddress": "02:42:0a:0b:00:03",
                "IPv4Address": "10.11.0.3/16",
                "IPv6Address": ""
            },
            "ingress-sbox": {
                "Name": "gateway_ingress-sbox",
                "EndpointID": "b3ad96e1cea391476ff9a0d386d0276313e79513a3b5b436abfd2dea053d1434",
                "MacAddress": "02:42:0a:0b:00:02",
                "IPv4Address": "10.11.0.2/16",
                "IPv6Address": ""
            }
        },
        "Options": {
            "com.docker.network.bridge.enable_icc": "false",
            "com.docker.network.bridge.enable_ip_masquerade": "true",
            "com.docker.network.bridge.name": "docker_gwbridge"
        },
        "Labels": {}
    }
]

I deployed a 3-node zookeeper setup with a custom network:

networks:
  one:
    driver: overlay
    ipam:
      driver: default
      config:
        - subnet: 192.168.2.0/24

When I deploy, zookeeper gets deployed on the 3 nodes. The service names are zookeeper, zookeeper1 and zookeeper2. I logged into each container and executed the ip addr command to get the IP. Below is the info.

zookeeper@zookeeper:~$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
525: eth0@if526: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default 
    link/ether 02:42:c0:a8:02:03 brd ff:ff:ff:ff:ff:ff
    inet 192.168.2.3/24 brd 192.168.2.255 scope global eth0
       valid_lft forever preferred_lft forever
527: eth1@if528: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 02:42:0a:0b:00:03 brd ff:ff:ff:ff:ff:ff
    inet 10.11.0.3/16 brd 10.11.255.255 scope global eth1
       valid_lft forever preferred_lft forever
zookeeper@zookeeper1:~$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
138: eth0@if139: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default 
    link/ether 02:42:c0:a8:02:06 brd ff:ff:ff:ff:ff:ff
    inet 192.168.2.6/24 brd 192.168.2.255 scope global eth0
       valid_lft forever preferred_lft forever
140: eth1@if141: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 02:42:0a:0b:00:03 brd ff:ff:ff:ff:ff:ff
    inet 10.11.0.3/16 brd 10.11.255.255 scope global eth1
       valid_lft forever preferred_lft forever
zookeeper@zookeeper2:~$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
94: eth0@if95: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default 
    link/ether 02:42:c0:a8:02:09 brd ff:ff:ff:ff:ff:ff
    inet 192.168.2.9/24 brd 192.168.2.255 scope global eth0
       valid_lft forever preferred_lft forever
96: eth1@if97: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 02:42:0a:0b:00:03 brd ff:ff:ff:ff:ff:ff
    inet 10.11.0.3/16 brd 10.11.255.255 scope global eth1
       valid_lft forever preferred_lft forever

Now, when I ping zookeeper1 from the zookeeper container, it resolves to a different IP.

zookeeper@zookeeper:~$ ping zookeeper1
PING zookeeper1 (192.168.2.5) 56(84) bytes of data.
64 bytes from ip-192-168-2-5.us-west-2.compute.internal (192.168.2.5): icmp_seq=1 ttl=64 time=0.085 ms
64 bytes from ip-192-168-2-5.us-west-2.compute.internal (192.168.2.5): icmp_seq=2 ttl=64 time=0.088 ms
64 bytes from ip-192-168-2-5.us-west-2.compute.internal (192.168.2.5): icmp_seq=3 ttl=64 time=0.084 ms

But the ip of zookeeper1 is 192.168.2.6

/etc/resolv.conf inside container

search us-west-2.compute.internal
nameserver 127.0.0.11
options ndots:0
raspy commented 3 years ago

But the ip of zookeeper1 is 192.168.2.6

I believe that 192.168.2.5 is the service's virtual IP, while 192.168.2.6 is an actual container implementing the service. Should you have more than one replica of the service, DNS would still resolve to .5, while the network would distribute the load to .6 and .7 (or whatever IPs were assigned to those replicas).
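
(One way to see both addresses, assuming the default vip endpoint mode and the service name zookeeper1.)

# the service's virtual IP(s), one per attached network
docker service inspect --format '{{json .Endpoint.VirtualIPs}}' zookeeper1

# the individual task IPs behind the VIP (run from inside any container on the same network)
nslookup tasks.zookeeper1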

kavishgr commented 3 years ago

Found a temporary solution after hours of troubleshooting.

My setup: Ubuntu as a manager and CentOS as a worker.

On each node, allow the following ports (found here):

2377/tcp
7946/tcp
7946/udp
4789/udp
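
(One way to open these, as a sketch assuming ufw on the Ubuntu node and firewalld on the CentOS node.)

# Ubuntu (ufw)
ufw allow 2377/tcp && ufw allow 7946/tcp && ufw allow 7946/udp && ufw allow 4789/udp

# CentOS (firewalld)
firewall-cmd --permanent --add-port=2377/tcp --add-port=7946/tcp --add-port=7946/udp --add-port=4789/udp
firewall-cmd --reload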

Then add the following line in /etc/sysconfig/docker for CentOS and /etc/default/docker for Ubuntu (found here):

OPTIONS="--dns=10.10.0.1 --dns-search=example.com --dns-opt=use-vc"

Replace the IP with the gateway IP of your overlay network subnet. If you have multiple subnets, you can add more --dns=IP.

/etc/sysconfig/docker was not available by default on my machine. I just created one.

brandonwsims commented 3 years ago

Looking at the same issue in my setup, but I'm noticing something odd. I have 1 manager and 8 worker nodes. 5 of my 8 worker nodes fail to resolve the service name over the overlay network. The other 3 have no issue in doing so. No matter what service I launch or how I launch it, so long as it's connected to the correct overlay network, the same 3 have no issue resolving by service name.

I have absolutely no idea why the other 5 nodes in my swarm continue to have problems. I've tried the quick fixes listed in this thread to no avail. Each of my worker nodes is an identical copy of the others.

prashant-shahi commented 2 years ago

In my case, it turned out to be related to ports. It worked after the ports listed in the reference below were accessible between the nodes.

Reference: https://docs.docker.com/engine/swarm/swarm-tutorial/#open-protocols-and-ports-between-the-hosts

bdoublet91 commented 2 years ago

Hi, Could you confirm that:

Externality in this case is not meant to the Internet but to dockers virtual networks. The key is to have direct connectivity (no NAT on the way) from worker node to manager node and vice versa

I have two cloud providers with swarm workers on both. I got the same problem of communication between containers in the same overlay network.

I have set up a VPN between the two LANs (WireGuard) with static routes and iptables POSTROUTING.

First LAN - 10.70.0.0/16 -> 10.90.0.1/16 VPN -> 10.90.0.2/16 -> 10.80.0.0/16 - Second LAN

I can ping workers and the manager from both LANs and join them all to the same cluster. Orchestration works too (tasks populate). Usually I use POSTROUTING iptables rules to replace the source IP of a packet with the IP of the interface (to avoid configuring static routes on all servers). But when I join the worker, the node IP is not the swarm server's IP but the NAT IP after the VPN on the manager's side, so I think NAT causes problems here. I tried --advertise-addr to enforce the IP but that didn't work either.

Do you have any way to work with NAT, or do we have to use static routes only?

Thanks

mamirpanah commented 2 years ago

I have experienced the same issue after upgrading Docker from 19 to 20: service name resolution got conflicts on our network and our containers on the same network could not talk to each other. The problem was the endpoint_mode and not using the default ingress network. Here are the few changes that worked in our compose file:

1 - Changing port publishing to the long syntax so the ports use host mode:

ports:
  - target:
    published:
    protocol:
    mode: host

2 - Using:

deploy:
  endpoint_mode: dnsrr

3 - Using a hostname identical to the service name.

zachsa commented 4 months ago

On Docker Engine v26, I'm finding that from within a container, ping <servicename> fails even though nslookup tasks.servicename succeeds (and running ping <ip address from tasks.servicename> succeeds).

What could cause this? (I should mention I've set up Docker in LXC containers.)

In summary, from within a service container defined on the same stack as the service nginx:

nslookup nginx          # gives the virtual IP of the nginx service
nslookup tasks.nginx    # gives the correct IP of an nginx container (10.0.17.20)
ping 10.0.17.20         # this works
ping nginx              # doesn't work
curl http://10.0.17.20  # this works
curl http://nginx       # doesn't work

However, curl http://tasks.nginx does work.

zbalogh commented 2 months ago

Disabling checksum offloading appears to have resolved this issue on my swarm cluster.

See details here:

https://portal.portainer.io/knowledge/known-issues-with-vmware

https://forums.rockylinux.org/t/docker-swarm-cluster-network-issue/5335
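
(For reference, the workaround described in those links is typically applied with something like the following; the interface name and which offload feature to disable depend on the NIC and hypervisor.)

# disable TX checksum offload on the interface carrying the VXLAN traffic
ethtool -K ens192 tx-checksum-ip-generic off
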

Regards, Zoltan