
About overlay network: swarm worker in private subnet connecting swarm manager in public subnet #31677

Open ghostplant opened 7 years ago

ghostplant commented 7 years ago

I have a question, which is already stated in the title of this issue.

Assume we have a swarm manager deployed in a public network and several swarm workers deployed in a private network (the private network is behind a router, so all services in this private network use NAT to communicate with public services). Apparently, a service in the private network can act as a client and connect to a service in the public network, but the reverse is not possible.

Considering the situation above, I tested a swarm worker in the private network connecting to a swarm manager in the public network. The join succeeded, and everything seems to work even when I create a swarm service with many replicas: the replicas start correctly on both the swarm worker and the swarm manager. For example, task test.1.xxxxxx runs on the swarm manager and task test.2.xxxxxx runs on the swarm worker.

The only problem is that test.1.xxxxxx and test.2.xxxxxx cannot communicate with each other over the ingress network (the default overlay network tunnel used for inter-machine service communication), so swarm network features such as the VIP don't work either.

My question is whether this communication problem is a bug, or the behavior is by design. I think this capability would be very useful, since it could connect infrastructure across multiple private networks, and the current support for a private swarm worker joining a public swarm manager is already halfway to realizing it.

thaJeztah commented 7 years ago

The ingress network is a special network, and not meant for services to communicate with each other. Services cannot communicate with each other on the "container-container" network, unless they are part of the same custom network. This allows you to separate services, and prevent them from communicating with each other.

The ingress network is only used to route incoming network traffic from published ports between nodes (the "routing mesh").
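For illustration, publishing a port is what puts a service on the ingress network; the service name and image below are just examples:

```
# publish container port 80 on port 8080 of every swarm node (routing mesh)
docker service create --name web --publish 8080:80 nginx:alpine

# the published port answers on any node's address,
# even on nodes that run no task of the service
curl http://<any-node-ip>:8080
```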

ghostplant commented 7 years ago

After a new test, the "container-container" network doesn't work either between a public swarm manager node and a private swarm worker node if I create a shared overlay network for them instead of using ingress.

Assume a swarm worker in a private subnet has joined a swarm manager in a public subnet. Then I did the following on the swarm manager node:

1) docker network create --driver overlay --subnet 10.10.0.0/16 test
2) docker service create --network test --replicas 2 alpine:3.4 sleep 1h

Then one container boots on the swarm manager node and another boots on the swarm worker node, and they are not pingable over their overlay interfaces.

thaJeztah commented 7 years ago

Is it just ping not working, or are you not able to connect at all? Are you trying to ping individual containers, or a service (the VIP / Virtual IP)? Pinging the VIP cross-node may not work.

What version of docker are you running, and what platform are you on? (docker version, docker info); overlay networks on older (< 3.16 IIRC) kernels cannot have an IP-range that overlaps with an underlay network.
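As a sketch of how to tell the two cases apart from inside one of the task containers (assuming a service named test on a custom overlay network; the tasks.<name> lookup may need a newer engine than 1.12):

```
nslookup test        # resolves to the service's virtual IP (VIP)
nslookup tasks.test  # resolves to the individual task IPs behind the VIP

# pinging a task IP tests container-to-container connectivity directly,
# while pinging the VIP also exercises the IPVS load balancer
ping <task-ip>
```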

ghostplant commented 7 years ago

@thaJeztah Docker 1.12.6 running on Ubuntu 16.04 (kernel 4.4), no IP overlap. Neither ping serviceX.num.taskY nor ping <container-eth-peer-IP> works if one container is in the private network. Both approaches work only when the two containers are running in a shared subnet.

ghostplant commented 7 years ago

An easy way to reproduce the issue:

1) Prepare a physical machine with Ubuntu 16.04 installed;
2) Set up docker.io via apt-get, run 'docker swarm init', and save the join token (as worker);
3) Set up qemu-kvm via apt-get, and create a NAT-mode VM with Ubuntu 16.04 installed;
4) Inside the VM, install docker.io and run 'docker swarm join ...' to join the swarm managed by the outer OS;
5) From the outer OS, run 'docker service create --network my_overlay --replicas 4 alpine sleep 1d';
6) Both the outer OS and the VM should start booting part of the replicas of this service. Use 'docker exec -it <container> sh' to log into one of the containers, and ping the other containers;
7) The result: a remote container is not pingable whenever one container is on the outer OS and the other is in the VM.

Here, the VM simulates a private network, and the outer OS simulates a public network.

ghostplant commented 7 years ago

Apart from ping (ICMP), I tried TCP and UDP between the two containers using netcat as well, and they cannot communicate either. Besides, if I create a web service publishing port 80, the published port also fails to forward traffic when the web service is running on a worker node in a private network.
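For reference, the TCP/UDP checks looked roughly like this (a sketch; the overlay IP is hypothetical):

```
# inside container A: listen on TCP port 5000
nc -l -p 5000

# inside container B: try to reach container A over the overlay network
echo hello | nc -w 3 10.10.0.5 5000

# the same check over UDP
nc -u -l -p 5001                           # container A
echo hello | nc -u -w 3 10.10.0.5 5001     # container B
```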

mavenugo commented 7 years ago

@ghostplant The case of workers behind private addresses trying to have seamless container connectivity with other workers or managers behind a NAT'ed public address is an interesting one and has a few subtleties. To answer your question: not all behind-the-NAT connectivity is supported, and this is by design.

Container connectivity using overlay networking depends on a VXLAN tunnel (the container data plane) and a gossip control plane (for routing exchange). These two mechanisms work in a fully distributed way between the workers and do not rely on the managers or any central management layer to handle the control and data planes (unlike task scheduling, which relies entirely on the swarm manager). Also, unlike the initial swarm join (where the worker connects to the manager), the distributed network control plane connects nodes to each other over UDP and is peer-to-peer (two-way connectivity). Hence ports 4789 and 7946 on these nodes must be reachable for other nodes to connect to.

NAT-like mechanisms take control of such port publishing, and it takes special routing configuration to poke a hole for a particular exposed port all the way from the public address to the private address. Fortunately, providers like AWS support 1:1 public->private address mapping, so fixes like https://github.com/docker/libnetwork/pull/1337 will work with proper --advertise-addr and --listen-addr to enable such connectivity across NAT. PTAL: https://github.com/swarmzilla/swarm3k/blob/master/NODE_PREPARATION.md for one such deployment used during the swarm3k project.

Once you understand this design, you can configure your routers to open these ports, and then configure your swarm init and swarm join with appropriate --advertise-addr and --listen-addr to make it work behind a NAT.
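A rough sketch of that setup (all addresses are hypothetical, and the exact NAT rules depend on your router):

```
# on the NAT router: forward the VXLAN data plane and the gossip
# control plane to the private worker at 192.168.1.10
iptables -t nat -A PREROUTING -p udp --dport 4789 -j DNAT --to-destination 192.168.1.10
iptables -t nat -A PREROUTING -p tcp --dport 7946 -j DNAT --to-destination 192.168.1.10
iptables -t nat -A PREROUTING -p udp --dport 7946 -j DNAT --to-destination 192.168.1.10

# on the worker: advertise the router's public address so that peers
# connect back through the forwarded ports
docker swarm join --token <worker-token> \
  --advertise-addr <router-public-ip> \
  --listen-addr 0.0.0.0:2377 \
  <manager-ip>:2377
```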

ghostplant commented 7 years ago

@mavenugo Thanks, I think this is a way to get some situations supported, though it is not a general-purpose approach. I understand that it requires forwarding access on the router node that owns the public IP. However, if I have access to that public IP, it also means I can fully control the forwarding and mapping rules between the public address and the multiple private addresses, which is not a complex case.

The real case that needs solving is where a network service provider gives you only a private IP behind NAT for WAN access, and you never have access to change the public IP's forwarding rules. This case is far more common, including devices connecting over a mobile network, laptops at school or at home, and so on, and the method you provide cannot solve this kind of common case.

mavenugo commented 7 years ago

@ghostplant There are definitely many scenarios that need fine-tuning. But Docker networking provides plugin APIs with which one can implement plugins to solve such special-purpose forwarding scenarios.
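For instance, a third-party network driver can be installed and used like this (a sketch; the weave plugin is just one public example of a driver shipped through that API):

```
# install a network driver distributed as a managed plugin
docker plugin install weaveworks/net-plugin:latest_release

# create a swarm-scoped network backed by that driver instead of the
# built-in overlay driver
docker network create --driver weaveworks/net-plugin:latest_release my_net
```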

ghostplant commented 7 years ago

I saw that weaveworks can support my case of behind-NAT containers as swarm workers, but it is neither portable nor thin. I hope for better built-in support from the swarm engine.

BulatSaif commented 6 years ago

I have the same problem. I spent a week trying to create an overlay network that would work (because if you expose the ssh port on the ingress network, you can connect to a container on a private subnet). I wish there were more information about this limitation in the documentation, and I hope overlay networks spanning different subnets will be supported.
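For context, the workaround mentioned in the parentheses looks roughly like this (a sketch; the image name is hypothetical):

```
# publish sshd through the routing mesh; port 2222 then answers on every
# node's public address, even when the task lands on a private-subnet node
docker service create --name ssh-entry --publish 2222:22 <image-with-sshd>

ssh -p 2222 user@<any-public-node-ip>
```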

BulatSaif commented 6 years ago

@ghostplant did you find a workaround?

ghostplant commented 6 years ago

@BulatSaif Currently, wrapping the Docker client/server traffic in a VPN (or weave) seems to work if you don't care about performance. Another choice is to implement a user-space tunnel at the application layer.
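A minimal sketch of the VPN variant (addresses are hypothetical; any VPN that puts both nodes on a shared subnet works the same way, shown here with WireGuard):

```
# give the manager and the worker addresses on one VPN subnet, e.g. 10.99.0.0/24
wg-quick up wg0

# on the manager (VPN address 10.99.0.1):
docker swarm init --advertise-addr 10.99.0.1

# on the private worker (VPN address 10.99.0.2): join over the tunnel so the
# gossip and VXLAN traffic also flow through it
docker swarm join --token <worker-token> --advertise-addr 10.99.0.2 10.99.0.1:2377
```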