weaveworks / weave

Simple, resilient multi-host containers networking and more.
https://www.weave.works
Apache License 2.0

optimise IP multicast #178

Open rade opened 9 years ago

rade commented 9 years ago

Currently weave implements IP multicast in terms of broadcast, i.e. multicast packets are always sent to all peers. That is sub-optimal.

Weave does observe IGMP and hence could build up knowledge about which peers contain receivers for specific multicast groups, and then use that knowledge to route packets to just those peers.

inercia commented 9 years ago

I think you would need some kind of distributed database for doing IGMP snooping. Weave would need to detect IGMP join/leave requests and broadcast this information by updating the list of peers subscribed to each multicast group in the database (note that simple gossip updates would not be enough in this case, as order is important). Weave would then detect all multicast traffic and translate it to unicast...
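(A hedged illustration of the ordering point above, not Weave's actual gossip code: if each peer numbers its own join/leave announcements, receivers can discard stale updates that arrive late, which is the kind of serialization a plain last-write-wins gossip would not give you. All names below are made up for the sketch.)

```go
package main

import "fmt"

// membershipUpdate is a hypothetical join/leave announcement from a peer.
type membershipUpdate struct {
	peer  string // originating peer
	group string // multicast group, e.g. "224.1.2.3"
	join  bool   // true = join, false = leave
	seq   uint64 // per-peer sequence number, increases with every announcement
}

// membershipTable keeps, per (peer, group), the newest update seen so far.
type membershipTable struct {
	latest map[string]membershipUpdate // key: peer + "/" + group
}

func newMembershipTable() *membershipTable {
	return &membershipTable{latest: map[string]membershipUpdate{}}
}

// apply accepts an update only if it is newer than what is already recorded,
// so a stale "leave" delivered after a newer "join" cannot win.
func (t *membershipTable) apply(u membershipUpdate) {
	key := u.peer + "/" + u.group
	if prev, ok := t.latest[key]; ok && prev.seq >= u.seq {
		return // duplicate or out-of-order update: ignore
	}
	t.latest[key] = u
}

// members lists the peers currently joined to a group.
func (t *membershipTable) members(group string) []string {
	var peers []string
	for _, u := range t.latest {
		if u.group == group && u.join {
			peers = append(peers, u.peer)
		}
	}
	return peers
}

func main() {
	t := newMembershipTable()
	// Updates may arrive in any order; only the highest sequence number counts.
	t.apply(membershipUpdate{peer: "peer-a", group: "224.1.2.3", join: true, seq: 2})
	t.apply(membershipUpdate{peer: "peer-a", group: "224.1.2.3", join: false, seq: 1}) // stale leave, ignored
	fmt.Println(t.members("224.1.2.3")) // [peer-a]
}
```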

rade commented 9 years ago

@inercia

order is important

how do ordinary igmp-aware routers handle that?

inercia commented 9 years ago

I don't know the particular details, but I'd guess that routers detect join/leave messages on their ports, serialize that information to update some kind of multicast forwarding table, and use that table to control what multicast traffic is forwarded. I think a distributed router would need to do that serialization of the join/leave information detected on its virtual ports...

greenpau commented 9 years ago

@inercia :+1:

erandu commented 7 years ago

Do you have any update on getting multicast optimized?

bboreham commented 7 years ago

@erandu no, this has not changed

murali-reddy commented 6 years ago

Currently weave implements IP multicast in terms of broadcast, i.e. multicast packets are always sent to all peers.

I spent some time recently understanding how multicast works in Weave. Elaborating a bit to give some perspective on the problem, in case anyone is interested. For example, take a cluster with 5 nodes running an application container that uses 224.1.2.3. L3 multicast traffic from the container is mapped to L2 multicast and sent out; in this case 224.1.2.3 maps to the L2 multicast MAC address 01:00:5e:01:02:03.
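(For reference, this is the standard RFC 1112 mapping: the low 23 bits of the IPv4 group address are copied into the 01:00:5e OUI. A quick sketch to verify the example above:)

```go
package main

import (
	"fmt"
	"net"
)

// multicastMAC maps an IPv4 multicast group address to its L2 multicast MAC
// per RFC 1112: the low 23 bits of the IP go under the 01:00:5e OUI prefix.
func multicastMAC(group net.IP) net.HardwareAddr {
	ip := group.To4()
	return net.HardwareAddr{0x01, 0x00, 0x5e, ip[1] & 0x7f, ip[2], ip[3]}
}

func main() {
	fmt.Println(multicastMAC(net.ParseIP("224.1.2.3"))) // 01:00:5e:01:02:03
}
```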

L2 multicast traffic from the container passes through the weave bridge and the veth pair vethwe-bridge <-> vethwe-datapath to reach the OVS datapath, which has the following ports:

root@ip-172-20-62-75:/home/admin# ovs-dpctl show
system@datapath:
    lookups: hit:270 missed:269 lost:3
    flows: 15
    masks: hit:646 total:2 hit/pkt:1.20
    port 0: datapath (internal)
    port 1: vethwe-datapath
    port 2: vxlan-6784 (vxlan: df_default=false, ttl=0)

Weave programs the OVS datapath with the rule below on node 172.20.39.2 (its peers being 172.20.68.242, 172.20.73.60, 172.20.83.7 and 172.20.62.75). This results in the packet being broadcast to all peers, irrespective of whether any container on those peers is interested in receiving packets for multicast IP 224.1.2.3:

in_port(1),eth(src=0a:78:b1:2c:11:ae,dst=01:00:5e:01:02:03),eth_type(0/0x0000), packets:103, bytes:12669, used:0.204s, actions:set(tunnel(tun_id=0x51b960,src=172.20.39.2,dst=172.20.68.242,tos=0x0,ttl=64,flags(df,key))),2,set(tunnel(tun_id=0x5f8960,src=172.20.39.2,dst=172.20.83.7,tos=0x0,ttl=64,flags(df,key))),2,set(tunnel(tun_id=0x732960,src=172.20.39.2,dst=172.20.56.145,tos=0x0,ttl=64,flags(df,key))),2,set(tunnel(tun_id=0x5ae960,src=172.20.39.2,dst=172.20.62.75,tos=0x0,ttl=64,flags(df,key))),2,set(tunnel(tun_id=0x44b960,src=172.20.39.2,dst=172.20.73.60,tos=0x0,ttl=64,flags(df,key))),2,0

Similarly, each peer is configured to receive packets for 224.1.2.3 irrespective of whether any local container is interested in receiving the traffic:

tunnel(tun_id=0x96051b/0xffffffffffffffff,src=172.20.68.242/255.255.255.255,dst=172.20.39.2/255.255.255.255,tos=0/0,ttl=0/0,flags(key)),in_port(2),eth(src=62:ae:1b:34:4e:76,dst=01:00:5e:01:02:03),eth_type(0/0x0000), packets:102, bytes:12546, used:0.552s, actions:1,0
tunnel(tun_id=0x96044b/0xffffffffffffffff,src=172.20.73.60/255.255.255.255,dst=172.20.39.2/255.255.255.255,tos=0/0,ttl=0/0,flags(key)),in_port(2),eth(src=ea:f2:dd:f4:f2:c1,dst=01:00:5e:01:02:03),eth_type(0/0x0000), packets:103, bytes:12669, used:0.016s, actions:1,0
tunnel(tun_id=0x9605f8/0xffffffffffffffff,src=172.20.83.7/255.255.255.255,dst=172.20.39.2/255.255.255.255,tos=0/0,ttl=0/0,flags(key)),in_port(2),eth(src=a2:a9:d0:e9:99:c0,dst=01:00:5e:01:02:03),eth_type(0/0x0000), packets:103, bytes:12669, used:0.116s, actions:1,0
tunnel(tun_id=0x9605ae/0xffffffffffffffff,src=172.20.62.75/255.255.255.255,dst=172.20.39.2/255.255.255.255,tos=0/0,ttl=0/0,flags(key)),in_port(2),eth(src=1a:1f:83:fa:c1:fb,dst=01:00:5e:01:02:03),eth_type(0/0x0000), packets:106, bytes:13038, used:0.245s, actions:1,0

This is sub-optimal, as reported in this issue. I will share notes on a possible solution.
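(One hedged sketch of what such a solution could look like at the flow-programming level: given a membership table learned from IGMP snooping, compute the subset of peers that should appear as tunnel destinations for a group's flow, instead of the full peer list shown in the flow above. Names and structure are illustrative only, not Weave's code.)

```go
package main

import "fmt"

// groupMembers maps a multicast group to the set of peers that have at least
// one local container joined to it (as would be learned from IGMP snooping).
type groupMembers map[string]map[string]bool

// tunnelDestinations returns the peers a multicast flow should be sent to.
// If nothing is known about the group, fall back to all peers (today's behaviour).
func tunnelDestinations(m groupMembers, group string, allPeers []string) []string {
	members, ok := m[group]
	if !ok {
		return allPeers // unknown group: keep broadcasting
	}
	var dsts []string
	for _, p := range allPeers {
		if members[p] {
			dsts = append(dsts, p)
		}
	}
	return dsts
}

func main() {
	allPeers := []string{"172.20.68.242", "172.20.73.60", "172.20.83.7", "172.20.62.75", "172.20.56.145"}
	m := groupMembers{
		"224.1.2.3": {"172.20.73.60": true}, // only one peer has an interested container
	}
	// Instead of five tunnel set() actions, the flow would carry just one.
	fmt.Println(tunnelDestinations(m, "224.1.2.3", allPeers))
	fmt.Println(tunnelDestinations(m, "239.9.9.9", allPeers)) // unknown group: all peers
}
```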

murali-reddy commented 6 years ago

As noted, IGMP is a standards-based mechanism that can be leveraged to optimise multicast traffic. From the perspective of IGMP network topology, the containers running the multicast application play the role of hosts, while Weave running on the node takes the role of a switch doing IGMP snooping, and it should implement the host-router interactions described in RFC 2236 (IGMPv2).

It should be easy to incorporate the semantics of the IGMP protocol into Weave and, based on group membership, program the OVS datapath to forward only to the intended group members.
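(To make the host-router interaction concrete: an IGMPv2 message per RFC 2236 is 8 bytes — type, max response time, checksum, group address. Below is a minimal, illustrative classifier of the kind such snooping code would need; it is not Weave's implementation.)

```go
package main

import (
	"fmt"
	"net"
)

// IGMPv2 message types from RFC 2236.
const (
	igmpMembershipQuery = 0x11
	igmpV1Report        = 0x12
	igmpV2Report        = 0x16 // treated as "join"
	igmpLeaveGroup      = 0x17 // treated as "leave"
)

// parseIGMPv2 extracts the message type and group address from an 8-byte
// IGMPv2 payload. Bytes 2-3 are the checksum, not verified in this sketch.
func parseIGMPv2(b []byte) (msgType byte, group net.IP, err error) {
	if len(b) < 8 {
		return 0, nil, fmt.Errorf("short IGMP packet: %d bytes", len(b))
	}
	return b[0], net.IPv4(b[4], b[5], b[6], b[7]), nil
}

func main() {
	// A v2 Membership Report (join) for 224.1.2.3.
	pkt := []byte{0x16, 0x00, 0x00, 0x00, 224, 1, 2, 3}
	t, g, err := parseIGMPv2(pkt)
	if err != nil {
		panic(err)
	}
	switch t {
	case igmpV2Report, igmpV1Report:
		fmt.Println("join", g) // a local container joined group g
	case igmpLeaveGroup:
		fmt.Println("leave", g) // a local container left group g
	case igmpMembershipQuery:
		fmt.Println("query", g) // a querier is polling for members
	}
}
```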

greenpau commented 6 years ago

It should be easy to incorporate the semantics of the IGMP protocol into Weave and, based on group membership, program the OVS datapath to forward only to the intended group members.

@murali-reddy, having more specific rules may be disadvantageous, because rule processing within OVS consumes resources, requires careful table design, etc. Having a single generic rule speeds things up in terms of flow processing. It is a trade-off.

ceclinux commented 3 years ago

Out of curiosity, does weave still implement IP multicast using broadcast? Thanks

bboreham commented 3 years ago

The implementation has not changed, however I don’t think the original wording conveys the correct idea.

If you have a cluster of 3 machines and are running 10 containers that receive multicast, Weave Net will do 2 unicast sends to convey the packets machine-to-machine, then inject the packets as multicast to be received by the 10 containers.

Matthias meant that the machine-to-machine part always reaches all machines, not that it is literally implemented using broadcast.

RobKoerts commented 3 years ago

[image attached]

So like this?