machine-drivers / docker-machine-driver-xhyve

docker-machine/minikube/minishift driver plugin for xhyve/hyperkit (native macOS hypervisor.framework)
https://godoc.org/github.com/machine-drivers/docker-machine-driver-xhyve
BSD 3-Clause "New" or "Revised" License

docker 1.12 swarm mode ingress load balancing partially working #132

Open matt-deboer opened 7 years ago

matt-deboer commented 7 years ago

First of all, this seems to be mostly working for the basic docker use cases (although I confess I haven't tried everything).

After creating a swarm of 3 machines (1 manager, 2 workers, all using the xhyve driver) with the docker 1.12 docker swarm init and docker swarm join commands, published ports are accessible only on the nodes where container instances are running.

This is compared to an identical swarm cluster created with the virtualbox driver, where the service is accessible on the published port of all 3 machines.

Steps to reproduce:

# create the swarm cluster
docker-machine create -d xhyve x-master-1
eval $(docker-machine env x-master-1)
docker swarm init --advertise-addr $(docker-machine ip x-master-1)
# use the worker token so the two remaining nodes join as workers
export worker_token=$(docker swarm join-token -q worker)

docker-machine create -d xhyve x-node-1
eval $(docker-machine env x-node-1)
docker swarm join --token ${worker_token} $(docker-machine ip x-master-1)

docker-machine create -d xhyve x-node-2
eval $(docker-machine env x-node-2)
docker swarm join --token ${worker_token} $(docker-machine ip x-master-1)

# deploy service
eval $(docker-machine env x-master-1)
docker service create --name hello -p 8080:80 nginx

# ... wait for docker service ls to show all replicas available
# test that service is ingress-balanced from all machines
curl -s "http://$(docker-machine ip x-master-1):8080/" | grep "Welcome" || echo "not accessible"
curl -s "http://$(docker-machine ip x-node-1):8080/" | grep "Welcome" || echo "not accessible"
curl -s "http://$(docker-machine ip x-node-2):8080/" | grep "Welcome" || echo "not accessible"

Further exploration (netstat -tnlp on each node) shows that there's no listener for 8080 on the machines where instance containers for that service are not running.

netstat -tnlp (xhyve nodes where instance containers are running, and ALL virtualbox nodes)

Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      -
tcp        0      0 :::8080                 :::*                    LISTEN      -
tcp        0      0 :::22                   :::*                    LISTEN      -
tcp        0      0 :::2376                 :::*                    LISTEN      -
tcp        0      0 :::7946                 :::*                    LISTEN      -

netstat -tnlp (xhyve nodes where instance containers are NOT running)

Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      -
tcp        0      0 :::22                   :::*                    LISTEN      -
tcp        0      0 :::2376                 :::*                    LISTEN      -
tcp        0      0 :::7946                 :::*                    LISTEN      -
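
The comparison above can be scripted instead of read by eye. This is a minimal sketch, not part of the original report: has_listener is a hypothetical helper name, and it assumes the column layout shown in the netstat output above (local address in column 4, state in column 6).

```shell
# has_listener PORT
# Reads `netstat -tnl` (or -tnlp) output on stdin and succeeds only if some
# TCP socket in LISTEN state has a local address ending in :PORT.
has_listener() {
  awk -v port="$1" '
    $1 ~ /^tcp/ && $6 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
    END { exit !found }
  '
}

# Example (assumes the machine names used in the steps to reproduce):
#   for m in x-master-1 x-node-1 x-node-2; do
#     docker-machine ssh "$m" netstat -tnl | has_listener 8080 \
#       && echo "$m: listening on 8080" || echo "$m: no listener on 8080"
#   done
```

With working ingress load balancing, every node should report a listener on 8080, whether or not a container of the service runs there.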

I also noticed that running docker service ps hello shows that the nodes are all named 'boot2docker' in the xhyve case, but in the virtualbox case, they are named correctly.

docker service ps hello (virtualbox)

ID                         NAME     IMAGE  NODE        DESIRED STATE  CURRENT STATE           ERROR
0vad3i6m78pwlpmy036w6cibh  hello.1  nginx  v-master-1  Running        Running 38 minutes ago
el0idy3uz4s8djwnlxe04iyhq  hello.2  nginx  v-node-2    Running        Running 38 minutes ago

docker service ps hello (xhyve)

ID                         NAME     IMAGE  NODE         DESIRED STATE  CURRENT STATE           ERROR
504oows46cwv92n54cmml6955  hello.1  nginx  boot2docker  Running        Running 32 minutes ago
cn3ymzygp4fxst7oyray38o7a  hello.2  nginx  boot2docker  Running        Running 32 minutes ago

zchee commented 7 years ago

@matt-deboer Thanks for the issue :) Okay, I will debug it.

Just to be sure (sorry, my English isn't great): is the problem only that the xhyve nodes don't listen on :8080, or is there some other problem as well?

matt-deboer commented 7 years ago

Let me try to simplify the issue. You may be familiar with the new docker swarm mode networking (I'm only barely familiar with it myself). It uses something called "ingress load-balancing", which allows a service to be accessed on a published port (-p host:container syntax) on any of the machines in the cluster.

This works correctly on the virtualbox provider, but on the xhyve provider the service is only accessible on hosts where an instance of that service is running. The example host port I chose for the 'hello' service was 8080, mapped to port 80 of the nginx container.

I compared the startup logs (/var/log/docker.log) on one of the xhyve machines against those on one of the virtualbox machines, and I noticed the following line, which I believe is the root of the problem:

time="2016-08-17T15:59:52.588760873Z" level=warning msg="2016/08/17 15:59:52 [ERR] memberlist: Conflicting address for boot2docker. Mine: 192.168.64.128:7946 Theirs: 192.168.64.16:7946\n"

It seems that because all the machines have the same hostname (boot2docker), they cannot be distinguished from each other in the swarm overlay network.
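
One way to confirm the hostname clash before creating a swarm is to list each machine's hostname and look for repeats. This is only a sketch: duplicate_hostnames is a hypothetical helper, and it assumes its input is one "machine hostname" pair per line.

```shell
# duplicate_hostnames
# Reads "machine hostname" pairs on stdin (one per line) and prints every
# hostname that is reported by more than one machine.
duplicate_hostnames() {
  awk 'NF >= 2 { count[$2]++ }
       END { for (h in count) if (count[h] > 1) print h }' | sort
}

# Example: collect the hostname of every docker-machine VM, then check it.
#   for m in $(docker-machine ls -q); do
#     echo "$m $(docker-machine ssh "$m" hostname)"
#   done | duplicate_hostnames
```

Any output means at least two machines share a hostname, which matches the memberlist "Conflicting address" warning quoted above.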

I tested this by manually setting the hostname on each of the nodes before running the swarm init/join commands, and this fixes the issue. So it looks like maybe just a change to update /etc/hostname on create would do the trick...
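
The manual workaround could look roughly like the sketch below. The exact command used is not stated in this thread, so running sudo hostname over docker-machine ssh is an assumption, as is reusing the machine name as the hostname. hostname_commands is a hypothetical helper that only prints the commands so they can be reviewed first.

```shell
# hostname_commands MACHINE...
# Prints (does not run) one command per machine that sets the VM's hostname
# to its docker-machine name. Assumption: `sudo hostname NAME` is available
# inside the VM; the change may not survive a reboot.
hostname_commands() {
  for m in "$@"; do
    printf 'docker-machine ssh %s "sudo hostname %s"\n' "$m" "$m"
  done
}

# Example: review the output, then pipe it to sh before swarm init/join.
#   hostname_commands x-master-1 x-node-1 x-node-2
#   hostname_commands x-master-1 x-node-1 x-node-2 | sh
```

Printing the commands instead of executing them keeps the helper side-effect free. A persistent fix would still belong in the driver, for example by writing /etc/hostname on create as suggested above.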

matt-deboer commented 7 years ago

Also, I created a script to quickly test this: https://gist.github.com/matt-deboer/3b81462f795166d736d91ca5be0a4e65

zchee commented 7 years ago

@matt-deboer Thanks for the details :) I will try to debug it.

I haven't tried docker swarm yet, though, so I need to learn it first. This script looks helpful for learning swarm. Thanks. Please give me a little time.