Open lipingxue opened 7 years ago
Same issue as #35230. My VMs are running on VMware ESX too, so perhaps there is a connection.
@GordonTheTurtle @fntlnz Could you, or someone familiar with this issue, take a look? Is it a known issue? I have tried the newest Docker CE release, 17.10, and can still reproduce this issue.
@fcrisciani Could you help to take a look since you commented in a similar issue #32195 , Thanks!
I tried the same test with docker 17.10.ce and can still reproduce this issue. @fcrisciani Could you help to take a look?
@lipingxue right now the statement is very vague.

0) Put the daemon in debug mode on vm3 and check if there is any clear error message.
1) Check that the node is properly joined to the swarm (check that it is marked as healthy and receives tasks).
2) Check that in the data path between the nodes the ports and the MTU are properly configured and you have full L3 connectivity.
3) Try to run a service with 3 replicas on the same network; you should have 1 task per node. Enter each one and issue a ping towards the other containers. If that fails, you can tcpdump the VXLAN packets and make sure that the destination IP is correct and reachable.

In cases where you have nodes in different subnets you need L3 connectivity between the nodes, and in some cases you have to specify the advertise address explicitly to be sure that the IP exposed is the correct one.
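For anyone following along, the checklist above can be sketched as shell commands. This is a rough sketch, not official guidance: the network/service names (`testnet`, `pingtest`) are examples, and step 0 assumes systemd and a default `daemon.json` location.

```shell
# 0) Put the daemon in debug mode, then check the daemon logs for errors
#    (e.g. journalctl -u docker). Assumes no existing /etc/docker/daemon.json.
echo '{ "debug": true }' | sudo tee /etc/docker/daemon.json
sudo systemctl restart docker

# 1) Verify the node joined the swarm and is healthy (run on a manager):
docker node ls

# 2) Swarm needs these ports open between all nodes, plus a consistent MTU:
#    2377/tcp (cluster management), 7946/tcp+udp (gossip), 4789/udp (VXLAN)

# 3) Run a 3-replica service on an overlay network, one task per node:
docker network create -d overlay --attachable testnet
docker service create --name pingtest --replicas 3 --network testnet alpine:latest sleep 1d
# On each node, exec into the local task; tasks.<service> resolves to task IPs:
docker exec -it "$(docker ps -q -f name=pingtest)" ping -c 3 tasks.pingtest

# If the ping fails, capture VXLAN traffic on the host and check destination IPs:
sudo tcpdump -ni any udp port 4789
```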
I have the same issue: across subnets, nodes lose communication on overlay networks. I tested with 3 nodes, 2 in the same subnet and the third in a different one. The routing mesh works well when you use the 2 in the same network; the third one doesn't work. Ping doesn't work either, because the reply is lost (the ping arrives from the manager to the worker outside, the worker replies, but the response never arrives back at the sender).
Docker version 18.03.0-ce, build 0520e24
```
[root@localhost ~]# docker service ls | column -t | grep 808
0ign5fe41zi5  my-web1  replicated  3/3  nginx:latest  *:8081->80/tcp
kapyf63pzf1m  my-web2  replicated  3/3  nginx:latest  *:8080->80/tcp
5uk9a0s69ua6  my-web3  replicated  3/3  nginx:latest  *:8082->80/tcp
[root@localhost ~]# for node in 192.168.122.181 192.168.122.79 192.168.122.222; do
>   echo " ------ $node ------ "
>   ssh $node "ss -tunlp" | column -t | grep 808
> done
 ------ 192.168.122.181 ------
tcp  LISTEN  0  128  :::8081  :::*  users:(("dockerd",pid=881,fd=31))
tcp  LISTEN  0  128  :::8082  :::*  users:(("dockerd",pid=881,fd=57))
 ------ 192.168.122.79 ------
tcp  LISTEN  1  128  :::8080  :::*  users:(("dockerd",pid=878,fd=58))
tcp  LISTEN  1  128  :::8081  :::*  users:(("dockerd",pid=878,fd=49))
tcp  LISTEN  0  128  :::8082  :::*  users:(("dockerd",pid=878,fd=68))
 ------ 192.168.122.222 ------
tcp  LISTEN  0  128  :::8080  :::*  users:(("dockerd",pid=884,fd=47))
tcp  LISTEN  0  128  :::8082  :::*  users:(("dockerd",pid=884,fd=53))
[root@localhost ~]# docker version | grep -i version
Version:       18.03.0-ce
API version:   1.37
Go version:    go1.9.4
Version:       18.03.0-ce
API version:   1.37 (minimum version 1.12)
Go version:    go1.9.4
```
VHOST: CentOS Linux release 7.4.1708 (Core)
```
virsh version
Compiled against library: libvirt 2.5.0
Using library: libvirt 2.5.0
Using API: QEMU 2.5.0
Running hypervisor: QEMU 2.8.0
```
Just adding to the above subject. My group is encountering the same issue on AWS for a swarm cluster with nodes split amongst 2 AZs:
If we remove one AZ from the cluster, communication to the services works as expected. Our current workaround is to use one subnet.
Docker CE v18.03. The Security Group has a rule open for TCP/UDP with port range 1024-65535 inbound, and All Traffic is allowed for egress. The same rule also exists in the NACL controlling these subnets.
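For reference, swarm only needs a handful of ports between nodes, all of which fall inside that 1024-65535 range; opening them explicitly makes the setup easier to audit. A sketch with the AWS CLI, assuming a hypothetical security group ID and that all nodes share that group:

```shell
SG=sg-0123456789abcdef0   # hypothetical security group ID, replace with yours

# 2377/tcp: cluster management; 7946/tcp+udp: node gossip; 4789/udp: VXLAN data path.
# --source-group restricts each rule to traffic from members of the same group.
aws ec2 authorize-security-group-ingress --group-id "$SG" --protocol tcp --port 2377 --source-group "$SG"
aws ec2 authorize-security-group-ingress --group-id "$SG" --protocol tcp --port 7946 --source-group "$SG"
aws ec2 authorize-security-group-ingress --group-id "$SG" --protocol udp --port 7946 --source-group "$SG"
aws ec2 authorize-security-group-ingress --group-id "$SG" --protocol udp --port 4789 --source-group "$SG"
```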
@valenbb can you describe the AWS configuration that you are using in more detail? I would suggest checking whether you actually have connectivity across AZs. In theory, with overlay, having 2 subnets would not be a problem as long as there is connectivity between them.
Want to add that I'm experiencing this as well. Running a swarm consisting of 5 nodes across 2 subnets:
The Raspberry Pi 3B+ nodes and Beaglebone Black node are all on the same subnet and all is working fine. The Pi Zero nodes are each on their own subnet.
Since I don't have an ethernet adapter for the Pi Zeros, they are each plugged into one of the Pi 3B+ nodes and configured as a USB Ethernet Gadgets.
I have full internet connectivity on all nodes, and nslookup run from any container can resolve all of the others on the same network. Beyond that, no containers on the Pi 3B+ nodes or the Beaglebone node can connect to the Pi Zero containers, and the Pi Zero containers are unable to ping any of the other containers either.
Is there any way to check if the Docker daemon is routing packets properly?
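Not an official answer, but one way to inspect the overlay data path on a host is to look inside the overlay network's namespace and confirm the VXLAN device and its forwarding entries exist. A sketch (the namespace name `1-abcdef` and device name `vxlan0` are the usual defaults, but check your own host):

```shell
# Overlay network namespaces live under /var/run/docker/netns; the one named
# "1-<short network id>" belongs to the overlay network.
ls /var/run/docker/netns

# Show the vxlan interface inside that namespace (replace 1-abcdef with yours):
sudo nsenter --net=/var/run/docker/netns/1-abcdef ip -d link show type vxlan

# Show the learned MAC -> remote-host (VTEP) mappings for the vxlan device:
sudo nsenter --net=/var/run/docker/netns/1-abcdef bridge fdb show dev vxlan0

# On the host itself, watch whether VXLAN packets actually leave and arrive:
sudo tcpdump -ni any udp port 4789
```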
Same issue here! Running on Docker 18.09. Is there any news?
I have the same issue using Docker 19.03. Really looking for a solution.
@belfo if it helps, for me it turned out not to be related to subnets after all; see this answer: https://serverfault.com/a/986275
I don't have NAT. It's all an internal network, just different subnets because the nodes are located at different sites.
I had the same issue, and it turned out that something (I'm guessing an external network component, maybe a NIC, switch, or router) was filtering out certain VXLAN traffic on port 4789.
You can confirm by running a background `tcpdump` for UDP port 4789 on the host where you run the client (e.g. `curl`) that never hears back from the swarm service on the host in the different subnet (e.g. `nginx`). You will see that `tcpdump` captures some traffic when you `curl` the `nginx` service in the same subnet, but captures nothing when you `curl` the `nginx` service in the different subnet. If you are using a single `nginx` service with replicas that you know are distributed to hosts in the different subnets, just `curl` multiple times.
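A sketch of that confirmation, assuming a hypothetical service published on port 8080:

```shell
# On the host whose curl never gets a reply, capture VXLAN traffic in the background:
sudo tcpdump -ni any udp port 4789 -w /tmp/vxlan.pcap &

# Hit the published port several times; with multiple replicas the routing mesh
# will spread requests across hosts in both subnets:
for i in 1 2 3 4 5 6; do
  curl -m 5 -s -o /dev/null -w "%{http_code}\n" http://127.0.0.1:8080
done

# Stop the capture and inspect it: packets toward same-subnet hosts should appear,
# while packets toward the other subnet will be absent if something filters 4789.
kill %1
tcpdump -nr /tmp/vxlan.pcap | head
```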
My guess is that whatever is filtering the traffic does so if the traffic doesn't originate from the destined subnet? Just a guess though.
To work around it, create a new swarm and specify a different port for VXLAN traffic (I picked 4790 and it finally worked): `docker swarm init --data-path-port 4790`. (Note the flag is `--data-path-port`, available since Docker 19.03; `--data-path-addr` takes an address, not a port.) Use `netcat` to verify that hosts in different subnets can reach each other over UDP on the new port you pick: `nc -ul 4790` on one host and `nc -u <host> 4790` on the other.
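Expanding that `netcat` check into a two-host test (hypothetical host names `host-a`/`host-b`; exact `nc` flags vary slightly between BSD and GNU variants):

```shell
# On host-a (subnet 1): listen for UDP datagrams on the candidate data-path port.
nc -ul 4790

# On host-b (subnet 2): send a test datagram to host-a on that port.
echo "vxlan-port-test" | nc -u host-a 4790

# If "vxlan-port-test" appears in host-a's terminal, UDP/4790 passes between the
# subnets, and initializing the swarm with --data-path-port 4790 should work.
```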
Description
Steps to reproduce the issue:
```
docker service ls
ID            NAME           MODE        REPLICAS  IMAGE         PORTS
ixjuubqpovud  nginx_service  replicated  1/1       nginx:latest  *:8080->80/tcp
```
```
root@sc-rdops-vm18-dhcp-57-89:~# curl 127.0.0.1:8080
<!DOCTYPE html>
Welcome to nginx!
If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.
For online documentation and support please refer to nginx.org.
Commercial support is available at nginx.com.
Thank you for using nginx.
```
```
root@sc-rdops-vm18-dhcp-57-89:~# curl 127.0.0.1:8080
curl: (7) Failed to connect to 127.0.0.1 port 8080: Connection timed out
```
```
root@sc-rdops-vm18-dhcp-57-89:~# docker version
Client:
 Version:      17.09.0-ce
 API version:  1.32
 Go version:   go1.8.3
 Git commit:   afdb6d4
 Built:        Tue Sep 26 22:42:18 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.09.0-ce
 API version:  1.32 (minimum version 1.12)
 Go version:   go1.8.3
 Git commit:   afdb6d4
 Built:        Tue Sep 26 22:40:56 2017
 OS/Arch:      linux/amd64
 Experimental: false
```
```
root@sc-rdops-vm18-dhcp-57-89:~# docker info
Containers: 2
 Running: 1
 Paused: 0
 Stopped: 1
Images: 2
Server Version: 17.09.0-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
 NodeID: r3u8r9wwbog8otva9lmijgd7y
 Is Manager: true
 ClusterID: bidsno23gl3b3dbsh740822xg
 Managers: 1
 Nodes: 3
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
  Force Rotate: 0
 Autolock Managers: false
 Root Rotation In Progress: false
 Node Address: 10.161.42.200
 Manager Addresses:
  10.161.42.200:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 06b9cb35161009dcb7123345749fef02f7cea8e0
runc version: 3f2f8b84a77f73d38244dd690525642a72156c64
init version: 949e6fa
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.4.0-42-generic
Operating System: Ubuntu 16.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 992.5MiB
Name: sc-rdops-vm18-dhcp-57-89
ID: 4AZ6:SPTC:QECB:MWW6:TAJG:TEUN:EI3O:U6PM:FQCB:2HKM:UVKB:C7LF
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support
```