Open ghostplant opened 7 years ago
The ingress network is a special network, and not meant for services to communicate with each other. Services cannot communicate with each other on the "container-container" network, unless they are part of the same custom network. This allows you to separate services, and prevent them from communicating with each other.
The ingress network is only used to route incoming network traffic from published ports between nodes (the "routing mesh")
After further testing, the "container-container" network also fails between a public swarm manager node and a private swarm worker node when I create a shared overlay network for them instead of using ingress.
Assume a swarm worker in a private subnet has joined a swarm manager in a public subnet. I then ran the following on the swarm manager node:

1) `docker network create --driver overlay --subnet 10.10.0.0/16 test`
2) `docker service create --network test --replicas 2 alpine:3.4 sleep 1h`

One container starts on the swarm manager node and the other on the swarm worker node, but they are not ping-able over their overlay interfaces.
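To show roughly how I tested this (container IDs, task names, and the overlay IP below are examples):

```shell
# On the manager: see which node each replica landed on
docker service ps test

# On the node running a replica: find its container and overlay IP
docker ps --filter name=test
docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}} {{end}}' <container-id>

# From inside that container: ping the other task by name or by its
# overlay IP (10.10.0.3 is an example) -- both fail across the nodes
docker exec -it <container-id> ping -c 3 test.2.xxxxxx
docker exec -it <container-id> ping -c 3 10.10.0.3
```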
Is it just `ping` not working, or are you not able to connect at all? Are you trying to ping individual containers, or a service (the VIP / Virtual IP)? Pinging the VIP cross-node may not work.

What version of docker are you running, and what platform are you on? (`docker version`, `docker info`); overlay networks on older (< 3.16 IIRC) kernels cannot have an IP-range that overlaps with an underlay network.
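For anyone hitting the same issue, the details asked for above can be collected on each node with something like:

```shell
# Docker client/daemon versions and platform summary
docker version
docker info

# Kernel version -- per the note above, overlay networks on kernels
# older than roughly 3.16 cannot overlap an underlay IP range
uname -r

# Host addresses, to rule out overlap with the overlay subnet
ip -4 addr show
```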
@thaJeztah
Docker 1.12.6 running on Ubuntu 16.04 (kernel = 4.4), no IP overlap.
Both `ping serviceX.num.taskY` and `ping <container-eth-peer-IP>` fail if one of the containers is in the private network. Only when the two containers run on a shared subnet do both approaches work.
An easy way to reproduce the issue:

1) Prepare a physical machine with Ubuntu 16.04 installed;
2) Install docker.io via `apt-get`, run `docker swarm init`, and save the join token (as worker);
3) Install qemu-kvm via `apt-get`, and create a NAT-mode VM with Ubuntu 16.04 installed;
4) Inside the VM, install docker.io and run `docker swarm join ...` to join the swarm hosted by the outer OS;
5) From the outer OS, run `docker service create --network my_overlay --replicas 4 alpine sleep 1d`;
6) Both the outer OS and the VM should start some of the replicas of this service. Use `docker exec -it <container> ping <other-container>` to test connectivity;
7) The result: a remote container is not pingable whenever one container is on the outer OS and the other is in the VM.

Here, the VM simulates a private network, and the outer OS simulates a public network.
Apart from ping (ICMP), I also tried TCP and UDP between the two containers using `netcat`, and they cannot communicate either. Besides, if I create a web service publishing port 80, the routing mesh also fails to forward traffic when the web service is running on a worker node in the private network.
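For reference, the TCP/UDP check mentioned above can be done with the busybox `nc` shipped in the alpine image; the task names and the overlay IP are placeholders:

```shell
# In the container on the manager node: listen on TCP port 9000
docker exec -it test.1.xxxxxx nc -l -p 9000

# From the container on the worker node: try to reach it over the
# overlay IP of the listener (10.10.0.5 is a placeholder)
docker exec -it test.2.xxxxxx sh -c 'echo hello | nc 10.10.0.5 9000'

# Add -u to both sides to repeat the test over UDP; in this setup
# neither transport gets through between the public and private node
```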
@ghostplant The case of workers behind a private address trying to have seamless container connectivity with other workers or managers behind a NAT'ed public address is an interesting one, and it has a few subtleties. To answer your question: not all behind-the-NAT connectivity is supported, and that is by design.
Container connectivity using overlay networking depends on a VXLAN tunnel (for the container data plane) and a Gossip control plane (for routing exchange). These two mechanisms work in a totally distributed way between the workers and don't rely on the managers or any central management layer to handle the control and data planes (unlike task scheduling, which relies entirely on the swarm manager). Also, unlike the initial swarm join (where the worker joins the manager), the distributed network control plane connects the nodes over UDP and is p2p (2-way connectivity). Hence these ports (4789 and 7946) must be exposed on every node for other nodes to connect to. NAT-like mechanisms take control over such port publishing, and it requires special routing configuration to poke a hole for a particular exposed port all the way from the public address to the private address. Fortunately, in cases like AWS, 1:1 public->private address mapping is supported, and hence fixes like https://github.com/docker/libnetwork/pull/1337 will work with the proper `--advertise-addr` and `--listen-addr` to enable such connectivity across NAT. PTAL: https://github.com/swarmzilla/swarm3k/blob/master/NODE_PREPARATION.md for one such deployment used during the swarm3k project.
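In other words, before attempting overlay networking across a NAT boundary, it is worth checking that the gossip and VXLAN ports are reachable in both directions. A rough check with `nc` (the manager address below is a placeholder):

```shell
MANAGER=203.0.113.10   # placeholder public address

# Gossip control plane: TCP and UDP 7946
nc -zv  "$MANAGER" 7946
nc -zvu "$MANAGER" 7946

# VXLAN data plane: UDP 4789
nc -zvu "$MANAGER" 4789

# Repeat from the manager toward each worker's address; behind a
# plain NAT the worker-bound direction is the one that fails.
```

Note that `-zu` only tells you a UDP packet was not rejected, so a clean result is necessary but not sufficient.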
Once you understand this design, you can configure your routers to have these ports opened up, and then configure your `swarm init` and `swarm join` using the appropriate `--advertise-addr` and `--listen-addr` to make it work behind a NAT.
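Under those assumptions (the router forwards ports 2377, 7946, and 4789 one-to-one to the worker), the init/join could be sketched as follows; all addresses and the token are placeholders:

```shell
# On the manager (203.0.113.10 is a placeholder public address)
docker swarm init \
  --advertise-addr 203.0.113.10 \
  --listen-addr 0.0.0.0:2377

# On the NAT'ed worker: advertise the public address the router
# forwards to it (placeholder 198.51.100.20), not its private one
docker swarm join \
  --advertise-addr 198.51.100.20 \
  --listen-addr 0.0.0.0 \
  --token SWMTKN-1-<token> \
  203.0.113.10:2377
```

The key point is that `--advertise-addr` is the address other nodes will dial for gossip and VXLAN traffic, so it must be reachable from their side of the NAT.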
@mavenugo Thanks. I think this is a way to get some situations supported, though it is not a general-purpose approach. I understand it requires forwarding access on the router that owns the public IP. However, if I have access to that public IP, it also means I can fully control the forwarding and mapping rules between the public address and the multiple private addresses, which is not the hard case.

The real case that needs solving is a network service provider that only gives you a private IP behind NAT for WAN access, where you never have access to change the public IP's rules. This case is far more common -- devices on a mobile network, a laptop at school or at home, and so on -- and the method you provide cannot solve it.
@ghostplant There are definitely many scenarios that need fine-tuning. But Docker networking provides plugin APIs with which one can implement plugins to solve such special-purpose forwarding scenarios.
I saw that weaveworks can support my case of behind-NAT containers as swarm workers, but it is neither portable nor lightweight. I hope for better built-in support in the swarm engine.
I have the same problem. I spent a week trying to create an overlay network that would work (because if you expose the ssh port on the ingress network, you can connect to the container on a private subnet). I wish there were more information about this limitation in the documentation, or hope that overlay networks across different subnets will be supported.
@ghostplant did you find a workaround?
@BulatSaif Currently, running a VPN/weave around the docker client/server seems to work if you don't care about performance. Another choice is to implement the user-space tunnel at the application layer.
I have a question which is already stated in the title of this issue.

Assume we have a swarm manager deployed in a public network and several swarm workers deployed in a private network (the private network sits behind a router, so all services in the private network use NAT to communicate with public services). Obviously, a service in the private network can act as a client connecting to a service in the public network, but not the other way around.

Given this situation, I tested a swarm worker in the private network joining a swarm manager in the public network. The join succeeds, and things appear to work even when I create a swarm service with many replicas: the replicated tasks start fine on both the worker and the manager. For example, service `test.1.xxxxxx` runs on the swarm manager and service `test.2.xxxxxx` runs on the swarm worker.

The only problem is that `test.1.xxxxxx` and `test.2.xxxxxx` cannot communicate with each other over the `ingress` network (the default overlay network tunnel used for inter-machine service communication), so swarm network features like `vip` don't work either.

My question is whether this communication problem is just a bug, or by design. I think this capability is very useful, since it can connect multiple private network infrastructures. The current support for a private swarm worker joining a public swarm manager is already halfway to realizing it.