rade opened this issue 9 years ago
I think you would need some kind of distributed database to do IGMP snooping. Weave would need to detect IGMP join/leave requests and broadcast this information by updating the list of peers subscribed to each multicast group in the database (note that simple gossip updates would not be enough in this case, as order is important). Weave would then detect all multicast traffic and translate it to unicast...
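To illustrate the ordering point with a hedged sketch (the `update` and `state` types below are hypothetical, not anything in Weave): attaching a per-peer, per-group sequence number to join/leave announcements would make out-of-order delivery safe, since a stale leave could no longer overwrite a newer join.

```go
package main

import "fmt"

// update is a hypothetical join/leave announcement gossiped between peers.
type update struct {
	peer  string // originating peer
	group string // multicast group, e.g. "224.1.2.3"
	join  bool   // true = join, false = leave
	seq   uint64 // per-peer, per-group sequence number
}

// state is one receiver's view of group membership.
type state struct {
	member  map[string]map[string]bool // group -> peer -> is member
	lastSeq map[string]uint64          // peer+"/"+group -> highest seq applied
}

func (s *state) apply(u update) {
	key := u.peer + "/" + u.group
	if u.seq <= s.lastSeq[key] {
		return // stale or reordered update; ignore it
	}
	s.lastSeq[key] = u.seq
	if s.member[u.group] == nil {
		s.member[u.group] = map[string]bool{}
	}
	s.member[u.group][u.peer] = u.join
}

func main() {
	s := &state{member: map[string]map[string]bool{}, lastSeq: map[string]uint64{}}
	s.apply(update{peer: "peerA", group: "224.1.2.3", join: true, seq: 2})  // newer join
	s.apply(update{peer: "peerA", group: "224.1.2.3", join: false, seq: 1}) // stale leave, ignored
	fmt.Println(s.member["224.1.2.3"]["peerA"]) // prints true
}
```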
@inercia
order is important
how do ordinary IGMP-aware routers handle that?
I don't know the particular details, but I'd guess that routers detect join/leave messages on their ports, serialize that information into some kind of multicast forwarding table, and use that table to control which multicast traffic is forwarded where. I think a distributed router would need to do the same serialization of the join/leave information detected on its virtual ports...
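Purely as an illustration of that guess (the names below are made up, not taken from any real router): a minimal sketch of a multicast forwarding table keyed by group and port, updated from observed join/leave messages and consulted when forwarding frames.

```go
package main

import "fmt"

// fwdTable is a hypothetical multicast forwarding table:
// group address -> set of ports on which a join has been seen.
type fwdTable map[string]map[int]bool

// join records that a receiver behind the port subscribed to the group.
func (t fwdTable) join(group string, port int) {
	if t[group] == nil {
		t[group] = map[int]bool{}
	}
	t[group][port] = true
}

// leave removes the port from the group's receiver set.
func (t fwdTable) leave(group string, port int) {
	delete(t[group], port)
}

// outPorts returns the ports a frame for this group should be copied to.
func (t fwdTable) outPorts(group string) []int {
	var ports []int
	for p := range t[group] {
		ports = append(ports, p)
	}
	return ports
}

func main() {
	t := fwdTable{}
	t.join("224.1.2.3", 1)
	t.join("224.1.2.3", 3)
	t.leave("224.1.2.3", 3)
	fmt.Println(t.outPorts("224.1.2.3")) // [1]
}
```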
@inercia :+1:
Do you have any update on getting multicast optimized?
@erandu no, this has not changed
Currently weave implements IP multicast in terms of broadcast, i.e. multicast packets are always sent to all peers.
I spent some time recently understanding how multicast works in Weave. Let me elaborate a bit to give some perspective on the problem, in case anyone is interested. As an example, take a cluster with 5 nodes running an application container that accesses 224.1.2.3. L3 multicast traffic from the container is mapped to L2 multicast and sent out; in this case 224.1.2.3 maps to the L2 multicast MAC address 01:00:5e:01:02:03.
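For reference, that IP-to-MAC mapping is the standard RFC 1112 rule: the low-order 23 bits of the IPv4 multicast address are copied into the 01:00:5e OUI prefix. A small self-contained illustration (not Weave code):

```go
package main

import (
	"fmt"
	"net"
)

// multicastMAC maps an IPv4 multicast address to its Ethernet multicast
// MAC: OUI 01:00:5e plus the low-order 23 bits of the IP address.
func multicastMAC(ip net.IP) net.HardwareAddr {
	ip4 := ip.To4()
	return net.HardwareAddr{0x01, 0x00, 0x5e, ip4[1] & 0x7f, ip4[2], ip4[3]}
}

func main() {
	fmt.Println(multicastMAC(net.ParseIP("224.1.2.3"))) // 01:00:5e:01:02:03
}
```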
The L2 multicast traffic from the container passes through the weave bridge and the veth pair vethwe-bridge <-> vethwe-datapath to reach the OVS datapath, which has the following ports:
root@ip-172-20-62-75:/home/admin# ovs-dpctl show
system@datapath:
lookups: hit:270 missed:269 lost:3
flows: 15
masks: hit:646 total:2 hit/pkt:1.20
port 0: datapath (internal)
port 1: vethwe-datapath
port 2: vxlan-6784 (vxlan: df_default=false, ttl=0)
Weave programs the OVS datapath (datapath in the output above) with the rule below on node 172.20.39.2 (the peers being 172.20.68.242, 172.20.73.60, 172.20.83.7 and 172.20.62.75). This basically results in the packet getting broadcast to all the peers, irrespective of whether any container on them is interested in receiving packets for the multicast IP 224.1.2.3:
in_port(1),eth(src=0a:78:b1:2c:11:ae,dst=01:00:5e:01:02:03),eth_type(0/0x0000), packets:103, bytes:12669, used:0.204s, actions:set(tunnel(tun_id=0x51b960,src=172.20.39.2,dst=172.20.68.242,tos=0x0,ttl=64,flags(df,key))),2,set(tunnel(tun_id=0x5f8960,src=172.20.39.2,dst=172.20.83.7,tos=0x0,ttl=64,flags(df,key))),2,set(tunnel(tun_id=0x732960,src=172.20.39.2,dst=172.20.56.145,tos=0x0,ttl=64,flags(df,key))),2,set(tunnel(tun_id=0x5ae960,src=172.20.39.2,dst=172.20.62.75,tos=0x0,ttl=64,flags(df,key))),2,set(tunnel(tun_id=0x44b960,src=172.20.39.2,dst=172.20.73.60,tos=0x0,ttl=64,flags(df,key))),2,0
Similarly, each peer is configured to receive packets for 224.1.2.3 irrespective of whether any local container is interested in the traffic:
tunnel(tun_id=0x96051b/0xffffffffffffffff,src=172.20.68.242/255.255.255.255,dst=172.20.39.2/255.255.255.255,tos=0/0,ttl=0/0,flags(key)),in_port(2),eth(src=62:ae:1b:34:4e:76,dst=01:00:5e:01:02:03),eth_type(0/0x0000), packets:102, bytes:12546, used:0.552s, actions:1,0
tunnel(tun_id=0x96044b/0xffffffffffffffff,src=172.20.73.60/255.255.255.255,dst=172.20.39.2/255.255.255.255,tos=0/0,ttl=0/0,flags(key)),in_port(2),eth(src=ea:f2:dd:f4:f2:c1,dst=01:00:5e:01:02:03),eth_type(0/0x0000), packets:103, bytes:12669, used:0.016s, actions:1,0
tunnel(tun_id=0x9605f8/0xffffffffffffffff,src=172.20.83.7/255.255.255.255,dst=172.20.39.2/255.255.255.255,tos=0/0,ttl=0/0,flags(key)),in_port(2),eth(src=a2:a9:d0:e9:99:c0,dst=01:00:5e:01:02:03),eth_type(0/0x0000), packets:103, bytes:12669, used:0.116s, actions:1,0
tunnel(tun_id=0x9605ae/0xffffffffffffffff,src=172.20.62.75/255.255.255.255,dst=172.20.39.2/255.255.255.255,tos=0/0,ttl=0/0,flags(key)),in_port(2),eth(src=1a:1f:83:fa:c1:fb,dst=01:00:5e:01:02:03),eth_type(0/0x0000), packets:106, bytes:13038, used:0.245s, actions:1,0
This is sub-optimal, as reported in this issue. I will share notes on a possible solution.
As noted, IGMP is one of the standards-based solutions that can be leveraged to optimise multicast traffic. From the IGMP network topology perspective, the containers running the multicast application form the hosts, while Weave running on the node takes on the role of a switch doing IGMP snooping, and should implement the host-router interactions described in RFC 2236.
It should be easy to incorporate the semantics of the IGMP protocol into Weave and, based on the observed membership, populate the OVS datapath so that multicast traffic is forwarded only to the intended group members.
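To make that concrete, here is a rough sketch with hypothetical names, with the peer IPs and action syntax borrowed from the ovs-dpctl output above purely for illustration (this is not how Weave builds its flows today): given a group-to-interested-peers table derived from IGMP snooping, the flow for 01:00:5e:01:02:03 would carry tunnel actions only for those peers instead of for every peer in the mesh.

```go
package main

import "fmt"

// tunnelActions builds one illustrative tunnel-set action per interested
// peer, instead of one per peer in the whole mesh. The action text is a
// simplified stand-in for the real datapath actions shown above.
func tunnelActions(src string, interested []string) []string {
	var actions []string
	for _, dst := range interested {
		actions = append(actions,
			fmt.Sprintf("set(tunnel(src=%s,dst=%s)),2", src, dst))
	}
	return actions
}

func main() {
	// Suppose only two of the peers have containers subscribed to
	// 224.1.2.3: the flow shrinks to two tunnel actions.
	for _, a := range tunnelActions("172.20.39.2",
		[]string{"172.20.68.242", "172.20.73.60"}) {
		fmt.Println(a)
	}
}
```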
@murali-reddy, having more specific rules may be disadvantageous, because rule processing within OVS consumes resources, requires careful table design, etc. Having a single generic rule speeds things up in terms of flow processing. It is a trade-off.
Out of curiosity, does weave still implement IP multicast using broadcast? Thanks
The implementation has not changed; however, I don’t think the original wording conveys the correct idea.
If you have a cluster of 3 machines and are running 10 containers that receive multicast, Weave Net will do 2 unicast sends to convey the packets machine-to-machine, then inject the packets as multicast to be received by the 10 containers.
Matthias meant that the machine-to-machine part always reaches all machines, not that it is literally implemented using broadcast.
So like this?
Currently weave implements IP multicast in terms of broadcast, i.e. multicast packets are always sent to all peers. That is sub-optimal.
Weave does observe IGMP and hence could build up knowledge about which peers contain receivers for specific multicast groups, and then use that knowledge to route packets to just those peers.