moby / moby

The Moby Project - a collaborative project for the container ecosystem to assemble container-based systems
https://mobyproject.org/

Routing mesh does not work in swarm mode if node in a swarm cluster is in different subnet #35249

Open lipingxue opened 7 years ago

lipingxue commented 7 years ago

Description

Steps to reproduce the issue:

  1. Create a three-node swarm cluster with three VMs (VM1, VM2 and VM3). VM1 and VM2 are in the same subnet, but VM3 is in a different subnet: VM1 IP: 10.161.42.200, VM2 IP: 10.161.60.160, VM3 IP: 10.192.176.97.
  2. Create an nginx service on the swarm manager (VM1):
    
    docker service create --name nginx_service --publish 8080:80 nginx

    docker service ls
    ID            NAME           MODE        REPLICAS  IMAGE         PORTS
    ixjuubqpovud  nginx_service  replicated  1/1       nginx:latest  *:8080->80/tcp

3. Run "curl 127.0.0.1:8080" to access the service from VM1, VM2, and VM3

4. I have opened all the required ports on those three VMs (a sketch of one way to open them follows this list):
2376/tcp
2377/tcp
7946/tcp
7946/udp
4789/tcp
4789/udp
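
For reference, a minimal sketch of one way to open these ports, assuming Ubuntu hosts with ufw (the report does not say which firewall tool was actually used):

    # Run on every node; adapt to your firewall tooling
    sudo ufw allow 2376/tcp   # Docker daemon TLS socket (only needed for remote API access)
    sudo ufw allow 2377/tcp   # swarm cluster management
    sudo ufw allow 7946/tcp   # container network discovery (gossip)
    sudo ufw allow 7946/udp
    sudo ufw allow 4789/tcp   # mirrors the list above; VXLAN itself uses only UDP
    sudo ufw allow 4789/udp   # VXLAN overlay data path
    sudo ufw reload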

**Describe the results you received:**
On VM1 and VM2, the nginx service can be accessed correctly.

VM1:

    root@sc-rdops-vm18-dhcp-57-89:~# curl 127.0.0.1:8080
    <!DOCTYPE html>

    Welcome to nginx!

    If you see this page, the nginx web server is successfully installed and working. Further configuration is required.

    For online documentation and support please refer to nginx.org.
    Commercial support is available at nginx.com.

    Thank you for using nginx.


On VM3, the nginx service cannot be accessed; the curl command timed out:

    root@sc-rdops-vm18-dhcp-57-89:~# curl 127.0.0.1:8080
    curl: (7) Failed to connect to 127.0.0.1 port 8080: Connection timed out


**Describe the results you expected:**
On VM3, the nginx service should be accessible.

**Additional information you deem important (e.g. issue happens only occasionally):**
We don't see this issue if all VMs in the swarm cluster are in the same subnet.

**Output of `docker version`:**

    root@sc-rdops-vm18-dhcp-57-89:~# docker version
    Client:
     Version:      17.09.0-ce
     API version:  1.32
     Go version:   go1.8.3
     Git commit:   afdb6d4
     Built:        Tue Sep 26 22:42:18 2017
     OS/Arch:      linux/amd64

    Server:
     Version:      17.09.0-ce
     API version:  1.32 (minimum version 1.12)
     Go version:   go1.8.3
     Git commit:   afdb6d4
     Built:        Tue Sep 26 22:40:56 2017
     OS/Arch:      linux/amd64
     Experimental: false


**Output of `docker info`:**

    root@sc-rdops-vm18-dhcp-57-89:~# docker info
    Containers: 2
     Running: 1
     Paused: 0
     Stopped: 1
    Images: 2
    Server Version: 17.09.0-ce
    Storage Driver: overlay2
     Backing Filesystem: extfs
     Supports d_type: true
     Native Overlay Diff: true
    Logging Driver: json-file
    Cgroup Driver: cgroupfs
    Plugins:
     Volume: local
     Network: bridge host macvlan null overlay
     Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
    Swarm: active
     NodeID: r3u8r9wwbog8otva9lmijgd7y
     Is Manager: true
     ClusterID: bidsno23gl3b3dbsh740822xg
     Managers: 1
     Nodes: 3
     Orchestration:
      Task History Retention Limit: 5
     Raft:
      Snapshot Interval: 10000
      Number of Old Snapshots to Retain: 0
      Heartbeat Tick: 1
      Election Tick: 3
     Dispatcher:
      Heartbeat Period: 5 seconds
     CA Configuration:
      Expiry Duration: 3 months
      Force Rotate: 0
     Autolock Managers: false
     Root Rotation In Progress: false
     Node Address: 10.161.42.200
     Manager Addresses:
      10.161.42.200:2377
    Runtimes: runc
    Default Runtime: runc
    Init Binary: docker-init
    containerd version: 06b9cb35161009dcb7123345749fef02f7cea8e0
    runc version: 3f2f8b84a77f73d38244dd690525642a72156c64
    init version: 949e6fa
    Security Options:
     apparmor
     seccomp
      Profile: default
    Kernel Version: 4.4.0-42-generic
    Operating System: Ubuntu 16.04.1 LTS
    OSType: linux
    Architecture: x86_64
    CPUs: 1
    Total Memory: 992.5MiB
    Name: sc-rdops-vm18-dhcp-57-89
    ID: 4AZ6:SPTC:QECB:MWW6:TAJG:TEUN:EI3O:U6PM:FQCB:2HKM:UVKB:C7LF
    Docker Root Dir: /var/lib/docker
    Debug Mode (client): false
    Debug Mode (server): false
    Registry: https://index.docker.io/v1/
    Experimental: false
    Insecure Registries:
     127.0.0.0/8
    Live Restore Enabled: false

    WARNING: No swap limit support



**Additional environment details (AWS, VirtualBox, physical, etc.):**
VMs running on VMware ESX.

sylvainmouquet commented 7 years ago

Same issue as #35230. My VMs are running on VMware ESX too, so perhaps there is a connection.

lipingxue commented 7 years ago

@GordonTheTurtle @fntlnz Could you, or someone familiar with this issue, take a look? Is it a known issue? I have tried the newest Docker CE release, 17.10, but can still reproduce this issue.

lipingxue commented 7 years ago

@fcrisciani Could you help take a look, since you commented on the similar issue #32195? Thanks!

lipingxue commented 7 years ago

I tried the same test with Docker 17.10-ce and can still reproduce this issue. @fcrisciani Could you help take a look?

fcrisciani commented 7 years ago

@lipingxue right now the statement is very vague. Things to check (commands are sketched after this list):

0) Put the daemon in debug mode on VM3 and check if there is any clear error message.
1) Check that the node is properly joined to the swarm (check that it is marked as healthy and receives tasks).
2) Check that in the data path between the nodes the ports and the MTU are properly configured, and that you have full L3 connectivity.
3) Try to run a service with 3 replicas on the same network; you should have 1 task per node. Enter each one and issue a ping towards the other containers. If that fails, you can tcpdump the VXLAN packets and make sure that the destination IP is correct and reachable.

In cases where you have nodes in different subnets you need L3 connectivity between the nodes, and in some cases you have to specify the advertise address explicitly to be sure that the exposed IP is the correct one.
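
A minimal shell sketch of those checks, assuming a Linux host with an eth0 interface (the service name, interface, and address below are illustrative):

    # 0) Run the daemon in debug mode (or set "debug": true in /etc/docker/daemon.json and restart)
    sudo dockerd --debug

    # 1) On a manager: all nodes should be Ready/Active, and the service should have a task per node
    docker node ls
    docker service ps nginx_service

    # 2)/3) Capture VXLAN traffic on a node while curling the published port from another node
    sudo tcpdump -ni eth0 udp port 4789

    # If nodes sit in different subnets, pin the advertised address explicitly when initializing
    docker swarm init --advertise-addr 10.161.42.200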

DarthRevan00 commented 6 years ago

I have the same issue: with a different subnet, nodes lose communication on overlay networks. I tested with 3 nodes, 2 in the same subnet and the other in a different one. The routing mesh works well when you use the 2 on the same network; the third one doesn't work. Ping doesn't work either, because the reply is lost: the ping arrives from the manager at the worker outside, the worker replies, but the response never arrives back at the sender.

DarthRevan00 commented 6 years ago

Docker version 18.03.0-ce, build 0520e24

xulis commented 6 years ago

    [root@localhost ~]# docker service ls | column -t | grep 808
    0ign5fe41zi5  my-web1  replicated  3/3  nginx:latest  *:8081->80/tcp
    kapyf63pzf1m  my-web2  replicated  3/3  nginx:latest  *:8080->80/tcp
    5uk9a0s69ua6  my-web3  replicated  3/3  nginx:latest  *:8082->80/tcp
    [root@localhost ~]# for node in 192.168.122.181 192.168.122.79 192.168.122.222; do
    >   echo " ------ $node ------ "
    >   ssh $node "ss -tunlp" | column -t | grep 808
    > done
     ------ 192.168.122.181 ------
    tcp  LISTEN  0  128  :::8081  :::*  users:(("dockerd",pid=881,fd=31))
    tcp  LISTEN  0  128  :::8082  :::*  users:(("dockerd",pid=881,fd=57))
     ------ 192.168.122.79 ------
    tcp  LISTEN  1  128  :::8080  :::*  users:(("dockerd",pid=878,fd=58))
    tcp  LISTEN  1  128  :::8081  :::*  users:(("dockerd",pid=878,fd=49))
    tcp  LISTEN  0  128  :::8082  :::*  users:(("dockerd",pid=878,fd=68))
     ------ 192.168.122.222 ------
    tcp  LISTEN  0  128  :::8080  :::*  users:(("dockerd",pid=884,fd=47))
    tcp  LISTEN  0  128  :::8082  :::*  users:(("dockerd",pid=884,fd=53))
    [root@localhost ~]# docker version | grep -i version
    Version:       18.03.0-ce
    API version:   1.37
    Go version:    go1.9.4
    Version:       18.03.0-ce
    API version:   1.37 (minimum version 1.12)
    Go version:    go1.9.4

VHOST: CentOS Linux release 7.4.1708 (Core)

    virsh version
    Compiled against library: libvirt 2.5.0
    Using library: libvirt 2.5.0
    Using API: QEMU 2.5.0
    Running hypervisor: QEMU 2.8.0

valenbb commented 6 years ago

Just adding to the above: my group is encountering the same issue on AWS, for a swarm cluster with nodes split across 2 AZs.

If we remove one AZ from the cluster, communication to the services works as expected. Our current workaround is to use one subnet.

Docker CE v18.03. The Security Group has a rule opening TCP/UDP port range 1024-65535 inbound, and All Traffic is allowed for egress. The same rule also exists in the NACL controlling these subnets.

fcrisciani commented 6 years ago

@valenbb can you describe the AWS configuration you are using in more detail? I would suggest checking whether you actually have connectivity across the AZs (a quick check is sketched below). In theory, with overlay networking, having 2 subnets would not be a problem as long as there is connectivity across them.
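
A minimal sketch of such a check with netcat, run from a node in one AZ against a node in the other (IPs are illustrative; note that UDP probes only fail loudly if an ICMP port-unreachable comes back):

    nc -zv  10.0.1.10 2377   # cluster management (TCP)
    nc -zvu 10.0.1.10 7946   # gossip (UDP)
    nc -zvu 10.0.1.10 4789   # VXLAN data path (UDP)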

ndanyluk commented 5 years ago

Want to add that I'm experiencing this as well. Running a swarm consisting of 5 nodes across 2 subnets:

The Raspberry Pi 3B+ nodes and Beaglebone Black node are all on the same subnet and all is working fine. The Pi Zero nodes are each on their own subnet.

Since I don't have an Ethernet adapter for the Pi Zeros, they are each plugged into one of the Pi 3B+ nodes and configured as USB Ethernet gadgets.

I have full internet access on all nodes, and nslookup run from any container can see all of the others on the same network (see the sketch below). Otherwise, no containers on the Pi 3B+ nodes or the Beaglebone node can connect to the Pi Zero containers, and the Pi Zero containers are unable to ping any of the other containers either.
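
For concreteness, a sketch of that kind of per-container check; the container name and target address are hypothetical, and the image must contain nslookup/ping:

    # DNS lookup of all tasks of a service, from inside a task container on the same overlay network
    docker exec -it <container> nslookup tasks.nginx_service
    # Direct ping between task containers
    docker exec -it <container> ping -c 3 10.0.0.5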

Is there any way to check if the Docker daemon is routing packets properly?

abellion commented 5 years ago

Same issue here! Running on Docker 18.09. Is there any news?

belfo commented 5 years ago

I have the same issue using Docker 19.03. Really looking for a solution.

abellion commented 5 years ago

@belfo in case it helps: for me it turned out not to be related to subnets after all. See this answer: https://serverfault.com/a/986275

belfo commented 5 years ago

I don't have NAT; it's all an internal network, just different subnets because the nodes are located on different sites.

lddias commented 4 years ago

I had the same issue, and it turned out that something (I'm guessing an external network component, maybe a NIC, switch, or router) was filtering out certain VXLAN traffic on port 4789.

You can confirm this by running a background tcpdump for UDP port 4789 on the host where you run the client (e.g. curl) that never hears back from the swarm service on the host in the different subnet (e.g. nginx); see the sketch below. You will see that tcpdump captures some traffic when you curl the nginx service in the same subnet, but captures nothing when you curl the nginx service in the different subnet. If you are using a single nginx service with replicas that you know are distributed to hosts in the different subnets, just curl multiple times.
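
A sketch of that confirmation, assuming interface eth0 and the published port from the original report (both illustrative):

    # On the host whose curl never gets a reply, watch the VXLAN port in the background
    sudo tcpdump -ni eth0 udp port 4789 &
    # Hit the published port a few times; with replicas spread across subnets,
    # repeated requests will be balanced to tasks in both subnets
    curl 127.0.0.1:8080
    # Same-subnet attempts show captured packets; cross-subnet attempts that
    # capture nothing mean the VXLAN traffic is being dropped upstream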

My guess is that whatever is filtering the traffic does so when the traffic doesn't originate from the destination subnet. Just a guess, though.

To work around it, create a new swarm and specify a different port for the VXLAN traffic (I picked 4790 and it finally worked): `docker swarm init --data-path-port 4790` (the flag is `--data-path-port`, available since Docker 19.03). Use netcat to verify that hosts in different subnets can reach each other over UDP on the new port you pick: `nc -ul 4790` on one host and `nc -u <host> 4790` from the other. A consolidated sketch follows.
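
Putting the workaround together, a sketch assuming Docker 19.03+ (the manager address is illustrative):

    # On every node: leave the old swarm (this discards the node's swarm state)
    docker swarm leave --force
    # On the new manager: re-initialize with a non-default VXLAN port
    docker swarm init --data-path-port 4790 --advertise-addr 10.161.42.200
    # Verify UDP reachability on the new port across subnets:
    nc -ul 4790                            # listener, on the manager
    echo ping | nc -u 10.161.42.200 4790   # sender, from a node in the other subnet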