it looks like https://github.com/weaveworks/weave/issues/1245 explains what's going on here. Which leaves me with a question about using weave for load balancing across containers on the same host. Say I have 3 containers for service1 on the same host. On that same host, I have a service2 container which makes http requests to http://service1.weave.local/data. Based on the info in the aforementioned issue, depending on the application (using curl vs netcat vs nodejs http module, etc.) there's a good chance that all calls would resolve to the same IP address. In which case it seems pointless to even have extra instances of the container running on the same host. Is that a correct assessment? Is there a recommended way to achieve consistent load balancing across containers on the same host? Maybe @2opremio would have some advice for me?
@bdentino Thanks for bringing this up! @errordeveloper will take care of this one.
EDIT: my analysis in this comment is wrong, please read the comments from @awh below for an explanation on why load balancing is biased.
@bdentino I will take the questions from your comment (@errordeveloper will take care of the guide-related part)
Say I have 3 containers for service1 on the same host. On that same host, I have a service2 container which makes http requests to http://service1.weave.local/data. Based on the info in aforementioned issue, depending on the application (using curl vs netcat vs nodejs http module, etc) there's a good chance that all calls would resolve to the same IP address. In which case it seems pointless to even have extra instances of the container running on the same host. Is that a correct assessment?
I have tested this scenario and your analysis does not seem to be correct. When using getaddrinfo() to resolve the server (which is the most common scenario in Unix systems nowadays), load balancing will span across all the server containers if they all live in the same host and they are different from the client container.
However, in the following two cases: (1) the server containers do not all live on the same host as the client, or (2) the client container is itself one of the server containers, getaddrinfo() will favor local addresses (server containers in our case). For (1) those will be the server containers placed on the same machine as the client; for (2) it will be the client container itself.
In (1) it makes perfect sense to favor local containers; case (2) is questionable, though.
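A quick way to see the order getaddrinfo() ends up returning is getent ahosts, which resolves through getaddrinfo(); here is a minimal check, assuming the foo hostname that is registered in the transcript below:

# Minimal check (assumes the weave proxy environment and the "foo" hostname
# registered below): getent ahosts resolves through getaddrinfo(), so the
# order of its STREAM entries is the order a glibc client will try.
docker run --rm -ti ubuntu getent ahosts foo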
I have tested the scenario you mentioned: 3 server containers and a client (different to the servers) living in the same host, and load balancing spans across all server containers.
Note that I am using the ubuntu:latest image, which currently includes nc.openbsd, which uses getaddrinfo().
I spawn 3 server containers on the same host, associated with the hostname foo, and make sure that everything works as expected:
vagrant@vagrant-ubuntu-vivid-64:~$ weave launch-router
c626e9130caf897993359481eca32f1283cd4d381846f6d881ad6cbf37e49558
vagrant@vagrant-ubuntu-vivid-64:~$ weave launch-proxy --hostname-match '([^-]+)-[0-9]+'
9188a6ed31f8dd5d69ea8a784815f9b558331c069e1a3988403bfc9260484be5
vagrant@vagrant-ubuntu-vivid-64:~$ eval $(weave env)
vagrant@vagrant-ubuntu-vivid-64:~$ docker run -d --name=foo-1 ubuntu sh -c 'while true; do echo "I am foo-1" | nc -l -p 4567 ; done'
17181a71b42c988ffbcb8408eb29d6b3bc48497ccb6a3817d1856b1ab8dcebc4
vagrant@vagrant-ubuntu-vivid-64:~$ docker run -d --name=foo-2 ubuntu sh -c 'while true; do echo "I am foo-2" | nc -l -p 4567 ; done'
b6af3e56a31e91ad6e451dcc86a8f04417ccc965a43c5a4d5e0035cdc3f7546d
vagrant@vagrant-ubuntu-vivid-64:~$ docker run -d --name=foo-3 ubuntu sh -c 'while true; do echo "I am foo-3" | nc -l -p 4567 ; done'
vagrant@vagrant-ubuntu-vivid-64:~$ weave status dns
foo 10.32.0.1 17181a71b42c a2:ef:27:ed:41:1d
foo 10.32.0.2 b6af3e56a31e a2:ef:27:ed:41:1d
foo 10.32.0.3 cccfc2af1b4b a2:ef:27:ed:41:1d
vagrant@vagrant-ubuntu-vivid-64:~$ weave dns-lookup foo
10.32.0.3
10.32.0.2
10.32.0.1
vagrant@vagrant-ubuntu-vivid-64:~$ docker run --rm -ti ubuntu nc 10.32.0.1 4567
I am foo-1
vagrant@vagrant-ubuntu-vivid-64:~$ docker run --rm -ti ubuntu nc 10.32.0.2 4567
I am foo-2
vagrant@vagrant-ubuntu-vivid-64:~$ docker run --rm -ti ubuntu nc 10.32.0.3 4567
I am foo-3
I launch a client container which connects to the foo hostname several times, reaching all 3 servers:
vagrant@vagrant-ubuntu-vivid-64:~$ docker run --rm -ti ubuntu bash -c 'for _ in `seq 10`; do nc foo 4567; done'
I am foo-3
I am foo-1
I am foo-2
I am foo-3
I am foo-2
I am foo-1
I am foo-3
I am foo-3
I am foo-1
I am foo-2
Is there a recommended way to achieve consistent load balancing across containers on the same host?
It should just work as I demonstrated above :)
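If the services were actual HTTP servers, as in your original question, the equivalent experiment could be run with curl; here is a rough sketch on a fresh host, assuming hypothetical python3-based servers and installing curl into the stock ubuntu image (neither of which is part of the transcript above):

# Hypothetical HTTP variant of the experiment above, run on a fresh host with
# the same weave proxy and --hostname-match setup: three HTTP servers under
# the hostname foo, and a client that curls the name repeatedly. curl resolves
# through glibc's getaddrinfo() unless it was built against c-ares.
for i in 1 2 3; do
  docker run -d --name=foo-$i ubuntu bash -c "apt-get -qq update && apt-get -qq install -y python3 >/dev/null && mkdir /www && echo 'I am foo-$i' > /www/data && cd /www && python3 -m http.server 80"
done
docker run --rm -ti ubuntu bash -c 'apt-get -qq update && apt-get -qq install -y curl >/dev/null && for _ in $(seq 10); do curl -s http://foo/data; done'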
wow thanks for the detailed response! I think you're correct, and I'm not exactly sure why I was seeing the behavior I did yesterday, but I'm able to reproduce it by taking your example one step further:
Following your steps, everything works great on my host with 3 service containers:
vagrant@weave-gs-01:~$ weave launch-router
feaf0b0b69ba6973a744bf1a8f935c6a5bd5909ee394a25203e2d79bf9d5aa58
vagrant@weave-gs-01:~$ weave launch-proxy --hostname-match '([^-]+)-[0-9]+'
4ae27a3b47e5ea0608139f57d9bcb807bcc3c82406995232bde440804e80d37d
vagrant@weave-gs-01:~$ eval $(weave env)
vagrant@weave-gs-01:~$ docker run -d --name=foo-1 ubuntu sh -c 'while true; do echo "I am foo-1"
> ^C
vagrant@weave-gs-01:~$ docker run -d --name=foo-1 ubuntu sh -c 'while true; do echo "I am foo-1" | nc -l -p 4567 ; done'
15fe62e49fea3246ad6b6894febae845da57ee8baaebbb4c036fcfd73eada907
vagrant@weave-gs-01:~$ docker run -d --name=foo-2 ubuntu sh -c 'while true; do echo "I am foo-2" | nc -l -p 4567 ; done'
08382422e15fd145570fcd66ffa2df52b8a4c848f6c6666d900907f7a5262424
vagrant@weave-gs-01:~$ docker run -d --name=foo-3 ubuntu sh -c 'while true; do echo "I am foo-3" | nc -l -p 4567 ; done'
9a21bb9dc4e5025fd76da7c8ba0e225f426d7c88a347d75ac267b994d2442c33
vagrant@weave-gs-01:~$ weave status dns
foo 10.32.0.2 08382422e15f de:78:13:c0:c1:36
foo 10.32.0.1 15fe62e49fea de:78:13:c0:c1:36
foo 10.32.0.3 9a21bb9dc4e5 de:78:13:c0:c1:36
vagrant@weave-gs-01:~$ docker run --rm -ti ubuntu bash -c 'for _ in `seq 10`; do nc foo 4567; done'
I am foo-1
I am foo-1
I am foo-3
I am foo-1
I am foo-1
I am foo-1
I am foo-3
I am foo-3
I am foo-3
I am foo-2
However, when I add one more foo service, all client calls are directed to the most recent, and the previous three are ignored.
vagrant@weave-gs-01:~$ docker run -d --name=foo-4 ubuntu sh -c 'while true; do echo "I am foo-4" | nc -l -p 4567 ; done'
c8f24db9be221653f4fc03a4440fda17bf0b3a14b6f3a824af7aeab2f222c732
vagrant@weave-gs-01:~$ docker run --rm -ti ubuntu bash -c 'for _ in `seq 10`; do nc foo 4567; done'
I am foo-4
I am foo-4
I am foo-4
I am foo-4
I am foo-4
I am foo-4
I am foo-4
I am foo-4
I am foo-4
I am foo-4
vagrant@weave-gs-01:~$ docker run --rm -ti ubuntu bash -c 'for _ in `seq 10`; do nc foo 4567; done'
I am foo-4
I am foo-4
I am foo-4
I am foo-4
I am foo-4
I am foo-4
I am foo-4
I am foo-4
I am foo-4
I am foo-4
Would you know how to explain this behavior? My understanding of dns and ip routing is admittedly limited.
I guess I should first ask if you're actually able to reproduce it :stuck_out_tongue_winking_eye:
@bdentino I can reproduce it (with weave/master for the record). Uhm, let me look into this and get back to you.
In the meantime, @rade / @tomwilkie Do you have any idea of why this is happening? AFAIU getaddrinfo() shouldn't care about the number of records if they are all co-located in the same host.
My first attempts to reproduce with four containers:
vagrant@vagrant-ubuntu-vivid-64:~/weave$ docker run --rm -ti ubuntu bash -c 'for _ in `seq 10`; do nc foo 4567; done'
I am foo-4
I am foo-4
I am foo-4
I am foo-4
I am foo-4
I am foo-4
I am foo-4
I am foo-2
I am foo-1
I am foo-3
vagrant@vagrant-ubuntu-vivid-64:~/weave$ docker run --rm -ti ubuntu bash -c 'for _ in `seq 10`; do nc foo 4567; done'
I am foo-4
I am foo-4
I am foo-4
I am foo-4
I am foo-4
I am foo-4
I am foo-4
I am foo-3
I am foo-3
I am foo-3
vagrant@vagrant-ubuntu-vivid-64:~/weave$ docker run --rm -ti ubuntu bash -c 'for _ in `seq 10`; do nc foo 4567; done'
I am foo-4
I am foo-4
I am foo-4
I am foo-4
I am foo-4
I am foo-2
I am foo-2
I am foo-2
I am foo-3
I am foo-4
vagrant@vagrant-ubuntu-vivid-64:~/weave$ docker run --rm -ti ubuntu bash -c 'for _ in `seq 10`; do nc foo 4567; done'
I am foo-4
I am foo-4
I am foo-4
I am foo-4
I am foo-4
I am foo-4
I am foo-4
I am foo-2
I am foo-2
I am foo-3
Will continue to investigate.
First, an explanation of why I am seeing containers other than foo-4; here's the test again with -v:
$ docker run --rm -ti ubuntu bash -c 'for _ in `seq 10`; do nc -v foo 4567; done'
Connection to foo 4567 port [tcp/*] succeeded!
I am foo-4
Connection to foo 4567 port [tcp/*] succeeded!
I am foo-4
Connection to foo 4567 port [tcp/*] succeeded!
I am foo-4
Connection to foo 4567 port [tcp/*] succeeded!
I am foo-4
Connection to foo 4567 port [tcp/*] succeeded!
I am foo-4
Connection to foo 4567 port [tcp/*] succeeded!
I am foo-4
nc: connect to foo port 4567 (tcp) failed: Connection refused
Connection to foo 4567 port [tcp/*] succeeded!
I am foo-3
nc: connect to foo port 4567 (tcp) failed: Connection refused
Connection to foo 4567 port [tcp/*] succeeded!
I am foo-1
nc: connect to foo port 4567 (tcp) failed: Connection refused
Connection to foo 4567 port [tcp/*] succeeded!
I am foo-3
nc: connect to foo port 4567 (tcp) failed: Connection refused
Connection to foo 4567 port [tcp/*] succeeded!
I am foo-3
In other words, nc is trying to connect to foo-4 each time, as in your example runs, but on some occasions the netcat in that container hasn't been respawned by the while loop yet; in this case, netcat skips to the next address in the list. So the question remains: what is special about the fourth container? As a refresher, here are the allocated addresses:
$ weave status dns
foo 10.32.0.1 413d2f135742 5a:63:a2:3f:bf:00
foo 10.32.0.4 9445e536a42d 5a:63:a2:3f:bf:00
foo 10.32.0.2 9c50beb6fded 5a:63:a2:3f:bf:00
foo 10.32.0.3 d31f09fc75ff 5a:63:a2:3f:bf:00
And here is the address allocated to the next container we run, e.g. the one we're using as a client:
$ docker run --rm -ti ubuntu ip -4 addr show ethwe
279: ethwe: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1410 qdisc pfifo_fast state UP group default qlen 1000
inet 10.32.0.5/12 scope global ethwe
valid_lft forever preferred_lft forever
Now, if we refer to the RFC3484 section on destination address selection, we find the following rule:
Rule 9: Use longest matching prefix.
When DA and DB belong to the same address family (both are IPv6 or
both are IPv4): If CommonPrefixLen(DA, Source(DA)) >
CommonPrefixLen(DB, Source(DB)), then prefer DA. Similarly, if
CommonPrefixLen(DA, Source(DA)) < CommonPrefixLen(DB, Source(DB)),
then prefer DB.
which says that for each possible destination address (10.32.0.1-10.32.0.4 in our case) we determine the source address that would be used to reach it (always 10.32.0.5 in our case), compute the common prefix length between source and destination, and prefer longer matches. Here are the addresses in binary:
10.32.0.1 00001010 00100000 00000000 00000001
10.32.0.2 00001010 00100000 00000000 00000010
10.32.0.3 00001010 00100000 00000000 00000011
10.32.0.4 00001010 00100000 00000000 00000100
10.32.0.5 00001010 00100000 00000000 00000101
From this table we can see that 10.32.0.5 and 10.32.0.4 have a common prefix length of 31 bits, whereas the common prefix of 10.32.0.5 and 10.32.0.1-10.32.0.3 is only 29 bits; consequently, 10.32.0.4 is always preferred. In the situation where we have only three named service containers, the client container is allocated 10.32.0.4, and as that has an equal common prefix length of 29 bits with each of 10.32.0.1-10.32.0.3, access to them is distributed randomly as expected.
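To make rule 9 concrete, here is a small bash sketch (not part of the original analysis) that computes CommonPrefixLen for the addresses above; it should print 29 bits for 10.32.0.1-10.32.0.3 and 31 bits for 10.32.0.4:

# Compute the common prefix length (in bits) between the client address
# 10.32.0.5 and each candidate destination, as RFC 3484 rule 9 uses it.
ip2int() { set -- $(echo "$1" | tr . ' '); echo $(( $1*16777216 + $2*65536 + $3*256 + $4 )); }
cpl() {
  local x=$(( $(ip2int "$1") ^ $(ip2int "$2") )) len=32
  while (( x > 0 )); do (( x >>= 1, len-- )); done
  echo "$len"
}
for dst in 10.32.0.1 10.32.0.2 10.32.0.3 10.32.0.4; do
  echo "CommonPrefixLen(10.32.0.5, $dst) = $(cpl 10.32.0.5 $dst) bits"
done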
In a nutshell: my analysis was wrong, and a deeper look at RFC3484 (which @awh did) confirms that even with all client and server containers on the same host, getaddrinfo() can cause a single server to be preferred over the others, making Weave's current client-side, DNS-based load balancing biased.
And ... there's not much we can do about it. @bdentino Sorry we don't have better news; I will close this as a duplicate of weaveworks/weave#1245, as you originally suspected.
I think we should update the guide with at least a note stating what will happen if you use a glibc resolver. And maybe try to find a curl with a different resolver so it will work as expected.
Weave Loadbalance[1] ... [1] does not balance load
http://daniel.haxx.se/blog/2012/01/03/getaddrinfo-with-round-robin-dns-and-happy-eyeballs/ says that curl compiled with c-ares Does The Right Thing.
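As a quick check, curl --version names c-ares on its first line when it was built against it; a plain glibc-resolver build won't mention it:

# Inspect the resolver backend of the local curl binary; a c-ares build lists
# something like "c-ares/1.x" alongside libcurl on the first line.
curl --version | head -1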
My reading of the Musl source suggests it does not apply RFC3484 sorting to IPv4 addresses.
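One way to check that empirically, assuming the weave proxy environment and the foo servers from the transcripts above, and that Alpine's busybox nc resolves through musl's getaddrinfo():

# Repeat the client loop from a musl-based alpine image; if musl really skips
# the RFC 3484 sort for IPv4, the replies should spread across foo-1..foo-4.
docker run --rm -ti alpine sh -c 'for _ in $(seq 10); do nc foo 4567; done'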
Another workaround would be to use a glibc which respects RFC 6724. See https://github.com/weaveworks/weave/issues/1245#issuecomment-158971602
A real-world solution (considering the C libraries used out there) would be to make weavewait tweak gai.conf in the spawned containers, to guarantee that load balancing is uniform for IPv4, but this is really messy.
Is there such a glibc?
I couldn't see how gai.conf could solve our problem, at least if we aren't continuously updating it. How would you do that?
Is there such a glibc?
I don't know
I couldn't see how gai.conf could solve our problem, at least if we aren't continuously updating it. How would you do that?
We would need to update it. That's why I said it's messy, just like Docker maintains /etc/hosts :S
Docker only updates /etc/hosts every time a container starts or stops; I couldn't see how to get anything out of gai.conf unless you continue to update it and tweak the precedence rules continuously.
Also note the warning that getting processes to re-read gai.conf "is generally a bad idea".
I'm having issues getting the weave-loadbalance guide to work. Following the latest version of the guides, I believe I've set everything up appropriately, but when I curl the hostname from the ubuntu container, all requests resolve to the same IP. Is this normal? It seems that all the containers are visible to weave: dig within the ubuntu container sees all hosts, and I can ping each IP address individually. Some relevant status info on each host:
TBH, I'm not sure if there's actually a problem here. It just seems suspicious that it resolves the same IP every time, so I'm hoping someone can shed some light on why this might be happening or whether I've misconfigured something.