
Trying to understand weave's connection behaviour #502

Open dpw opened 9 years ago

dpw commented 9 years ago

#448 says:

We do want to connect to endpoints of any outbound connections, and also to the weave ports of the endpoints of any inbound connections. Only.

#451 purports to address this. But I find the resulting behaviour surprising.

Consider the case where weave is launched on three hosts, A, B, and C, with B and C told to connect to A. For instance, with A having IP address 192.168.122.1:

A# weave launch
B# weave launch 192.168.122.1
C# weave launch 192.168.122.1

Once B and C have successfully connected to A, the topology section of the weave status output on any of the hosts will look something like:

Peers:
Peer 7a:a4:f9:9f:c6:5d(localhost.localdomain) (v2) (UID 4267405429763087070)
   -> 7a:e3:2c:6e:64:ea(dwragg-laptop) [192.168.122.1:6783]
Peer 7a:e3:2c:6e:64:ea(dwragg-laptop) (v4) (UID 7143912480215104949)
   -> 7a:8b:f2:da:b9:38(ubuntu1404) [172.17.42.1:42757]
   -> 7a:a4:f9:9f:c6:5d(localhost.localdomain) [172.17.42.1:42115]
Peer 7a:8b:f2:da:b9:38(ubuntu1404) (v2) (UID 15498095923640459528)
   -> 7a:e3:2c:6e:64:ea(dwragg-laptop) [192.168.122.1:6783]

In that case, weave status was run on B, and the peers appear in the order B, A, C.

We see that B and C are connected to 192.168.122.1, as expected. And A shows two incoming connections from 172.17.42.1, which is the IP address of the docker bridge. So the IP addresses of B and C are being disguised by docker's connection proxying. Fair enough.

Now, due to #451, B and C will try to connect to 172.17.42.1, but on weave's standard port number, rather than the port numbers reported by A. So they try to connect to themselves, and doing docker logs weave on B or C reveals the following messages, repeated every few minutes:

weave 2015/03/30 16:50:04.516361 ->[172.17.42.1:6783] attempting connection
weave 2015/03/30 16:50:04.517364 ->[172.17.42.1:36367] connection accepted
weave 2015/03/30 16:50:04.517829 ->[172.17.42.1:6783] connection shutting down due to error during handshake: Cannot connect to ourself
weave 2015/03/30 16:50:04.518189 ->[172.17.42.1:36367] connection shutting down due to error during handshake: Cannot connect to ourself
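
The "Cannot connect to ourself" error above is reported during the handshake. As an illustration only (hypothetical types and helper, not weave's actual code), a minimal Go sketch of such a check might look like this:

package main

import (
    "errors"
    "fmt"
)

// Peer identifies a router by its name, as printed in the topology above.
type Peer struct{ Name string }

// checkHandshake is a hypothetical helper: it rejects a connection whose
// remote end turns out to be the local peer itself.
func checkHandshake(local Peer, remoteName string) error {
    if remoteName == local.Name {
        return errors.New("Cannot connect to ourself")
    }
    return nil
}

func main() {
    local := Peer{Name: "7a:a4:f9:9f:c6:5d"}
    // B dials 172.17.42.1:6783; Docker's DNAT rule routes that back to B's own
    // weave container, so the name read back during the handshake is B's own.
    if err := checkHandshake(local, "7a:a4:f9:9f:c6:5d"); err != nil {
        fmt.Println("connection shutting down due to error during handshake:", err)
    }
}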

While this is not a terrible outcome, it does clutter the logs, and it is unclear to me what the rationale is.

It seems to me that ideally, A would find out the true IP addresses of B and C, and they would connect to each other. But I suppose this would mean running the router container with --net=host? So we rule that possibility out.

But if we know that the weave router cannot trust the remote addresses reported for incoming connections, then what is the point of other peers trying to connect to those addresses, even with the normalized port number? I must be missing something.

At the very least, it would be good to have a full account of weave's connection behaviour and its rationale somewhere, because I am struggling to derive it from the code and github issues.

bboreham commented 9 years ago

When I run weave I don't see the peers come in on the Docker bridge address. Maybe you are running Docker in some nonstandard configuration?

rade commented 9 years ago

what is the point of other peers trying to connect to those addresses, even with the normalized port number?

Consider an existing weave network of A and B. Now a new peer C comes along. Due to firewall restrictions, it can connect to A but not B. But B can connect to C. With the above strategy, B will learn about the inbound connection on A from C (through gossip from A) and establish a connection to C.
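
To make that concrete, here is a rough Go sketch of the strategy (hypothetical types, not the real ConnectionMaker API): each peer gossips the remote addresses it observed on its inbound connections, and other peers try those addresses after normalizing them to the standard weave port.

package main

import (
    "fmt"
    "net"
)

const weavePort = "6783"

// gossipedConnection is what A might tell B about its inbound connection from C.
type gossipedConnection struct {
    peerName   string
    remoteAddr string // the address A observed, e.g. "10.0.0.7:41234"
}

// candidateTargets normalizes observed addresses to the standard port, so a
// peer that can only dial out (like C, behind a firewall) can still be reached
// by peers that can dial in (like B).
func candidateTargets(conns []gossipedConnection) []string {
    targets := make([]string, 0, len(conns))
    for _, c := range conns {
        host, _, err := net.SplitHostPort(c.remoteAddr)
        if err != nil {
            continue
        }
        targets = append(targets, net.JoinHostPort(host, weavePort))
    }
    return targets
}

func main() {
    // B hears, via gossip from A, that C connected to A from 10.0.0.7:41234.
    fromGossip := []gossipedConnection{{peerName: "C", remoteAddr: "10.0.0.7:41234"}}
    fmt.Println(candidateTargets(fromGossip)) // B will now try 10.0.0.7:6783
}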

dpw commented 9 years ago

When I run weave I don't see the peers come in on the Docker bridge address. Maybe you are running Docker in some nonstandard configuration?

Not as far as I know. Weave is straight from master, with one host being my laptop host OS (fedora), and the two others being VMs (one fedora, one ubuntu). All on docker 1.5, connecting over a linux bridge.

dpw commented 9 years ago

Correction: I have docker 1.4.1 on the laptop host; the VMs have 1.5. And the laptop host is the only one that shows the docker bridge IPs on incoming connections. I haven't established whether these things are connected.

dpw commented 9 years ago

Ok, I now understand how the source addresses of incoming connections on my host are rewritten with the IP address of the docker bridge. It's due to the interaction of docker's iptables rules with the libvirt iptables rules that do NATting for the VMs.

I have my libvirt VMs configured to use a NATted bridge on 192.168.122.0/24, which involves these iptables rules:

# iptables -t nat -L -n
[...]
Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination         
[...]
MASQUERADE  tcp  --  192.168.122.0/24    !192.168.122.0/24     masq ports: 1024-65535
MASQUERADE  udp  --  192.168.122.0/24    !192.168.122.0/24     masq ports: 1024-65535
[...]

Plus I have the following rules introduced by docker for the weave container:

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         
DOCKER     all  --  0.0.0.0/0           !127.0.0.0/8          ADDRTYPE match dst-type LOCAL

[...]

Chain DOCKER (2 references)
target     prot opt source               destination         
DNAT       tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:6783 to:172.17.0.35:6783
DNAT       udp  --  0.0.0.0/0            0.0.0.0/0            udp dpt:6783 to:172.17.0.35:6783

When I do weave launch 192.168.122.1 in a VM and run tcpdump on the virbr0 bridge, I see the TCP connection initiated with a packet like:

11:25:33.916115 IP 192.168.122.108.59976 > 192.168.122.1.6783: Flags [S], seq 3506001454, win 29200, options [mss 1460,sackOK,TS val 439645 ecr 0,nop,wscale 7], length 0

The rules on the DOCKER chain then rewrite the destination address to 172.17.0.35.

Then the POSTROUTING rules run, and now match the packet. From iptables-extensions(8) "Masquerading is equivalent to specifying a [SNAT] mapping to the IP address of the interface the packet is going out", which in this case is docker0. So the source address is rewritten to 172.17.42.1. And so the packet on the docker0 bridge becomes:

11:39:06.082342 IP 172.17.42.1.45969 > 172.17.0.35.6783: Flags [S], seq 2825954753, win 29200, options [mss 1460,sackOK,TS val 642686 ecr 0,nop,wscale 7], length 0
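
To make the double rewrite concrete, here is a small illustrative Go trace (not weave code; addresses taken from the tcpdump output above) of what each iptables step does to the SYN's addresses:

package main

import "fmt"

type flow struct{ src, dst string }

func main() {
    p := flow{src: "192.168.122.108:59976", dst: "192.168.122.1:6783"} // as seen on virbr0

    // Step 1: Docker's DOCKER chain DNATs destination port 6783 to the weave container.
    p.dst = "172.17.0.35:6783"

    // Step 2: libvirt's MASQUERADE rule in POSTROUTING now matches, so the source
    // is rewritten to the outgoing interface's address, docker0 (172.17.42.1);
    // the source port may be remapped as well.
    p.src = "172.17.42.1:45969"

    fmt.Printf("as seen on docker0: %s > %s\n", p.src, p.dst) // matches the second tcpdump line
}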

rade commented 9 years ago

@dpw what, if anything, is left to do here?

awh commented 9 years ago

At the very least, it would be good to have a full account of weave's connection behaviour and its rationale somewhere, because I am struggling to derive it from the code and github issues.

I second this sentiment - I'll have a stab at producing such an account.

dpw commented 9 years ago

So, the reason I had left this open was as a reminder to decide whether the crazy address rewriting that caused my confusion is a libvirt issue or a docker issue, and to report it, possibly with a fix. But that's a fairly low priority right now. So @awh can take it, otherwise I would have suggested iceboxing it.

awh commented 9 years ago

@dpw there is a wiki page documenting the ConnectionMaker requirements - does it meet the need?