dparv opened this issue 2 years ago
Thanks for reporting the issue.
@dceara Do you have any comments on the suggestions for northd here?
@dparv I think you're right. Could you, please, post a patch to the mailing list with your change?
If possible, it would be nice if you could also add a test for this; please see `AT_SETUP([IGMP snoop/querier/relay])` in https://github.com/ovn-org/ovn/blob/main/tests/ovn.at for an example. Alternatively, I can also try to help with writing the test, just let me know.
I see in the main branch that `ovn_igmp_group_aggregate_ports` has been reworked a bit and now does:

```c
if (igmp_group->datapath->n_localnet_ports) {
```

instead of

```c
if (igmp_group->datapath->localnet_port) {
```

so maybe this needs to be a loop over all `localnet_ports`, checking each one for `mcast_info.flood` not being set, and then adding them one by one, like so:
```diff
diff --git a/northd/northd.c b/northd/northd.c
index 2fb0a93c2..5cb24f314 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -4545,10 +4545,15 @@ ovn_igmp_group_aggregate_ports(struct ovn_igmp_group *igmp_group,
     }
 
     if (igmp_group->datapath->n_localnet_ports) {
-        ovn_multicast_add_ports(mcast_groups, igmp_group->datapath,
-                                &igmp_group->mcgroup,
-                                igmp_group->datapath->localnet_ports,
-                                igmp_group->datapath->n_localnet_ports);
+        for (size_t i = 0; i < igmp_group->datapath->n_localnet_ports; i++) {
+            if (!igmp_group->datapath->localnet_ports[i]->mcast_info.flood) {
+                ovn_multicast_add_ports(mcast_groups,
+                                        igmp_group->datapath,
+                                        &igmp_group->mcgroup,
+                                        &igmp_group->datapath->localnet_ports[i],
+                                        1);
+            }
+        }
     }
 }
```
By the way, I am not sure if that is the right approach. Ultimately, what needs to happen is: when an internal VM (it can also be the one sending the traffic) subscribes to the feed generated within the cloud, this flow appears:

```
cookie=0x54cb0442, duration=30.708s, table=27, n_packets=9215, n_bytes=11440562, idle_age=0, priority=90,ip,metadata=0x26,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00,nw_dst=239.55.55.55 actions=set_field:0x8006->reg15,resubmit(,32)
```
Then the receiver machine outside the cloud starts to get that traffic, but when the internal VM leaves that IGMP group, the external receiver immediately stops.
You raise a good point here.
I'm guessing the external receiver also registered for the same multicast group, right?
If so, why didn't we add the localnet port in the SB.IGMP_Group corresponding to that multicast group? It seems to me that this is the real issue, alongside the fact that ovn_igmp_group_aggregate_ports() will blindly merge SB.IGMP_Group records from multiple chassis for the same multicast group, without checking for duplicate ports.
Or am I missing something else?
The external receiver is subscribed to this group, yes.
I am really no OVN expert here... but how would OVN know this IGMP group exists? Meaning, if no VMs are subscribed to it, then how would it add the local port of the VM generating the traffic?
The easiest workaround I have found so far is to just subscribe the VM that generates the traffic to the same IGMP group it multicasts to, but that's an 'in-workload' solution, not something that can happen at the cloud networking level.
@dparv Would it be possible to attach or share the NB and SB databases? I'll try to look into it, but it's easier if we can test on the exact same network topology. It would also really help to know the sender VM ID or logical_switch_port.
Thanks!
@dceara
I have attached a zip with 2 scenarios:
- subscribed: 1 internal VM streaming ffmpeg on port 5002 to IGMP group 239.55.55.55 and subscribed to the same group; the external receiver gets the feed
- not-subscribed: 1 internal VM streaming ffmpeg on port 5002 to IGMP group 239.55.55.55 and not subscribed to the same group; the external receiver doesn't get the feed

In both cases the external receiver is subscribed to the 239.55.55.55 IGMP group.
The traffic-generating VM inside the cloud is: 10.170.96.18/27
The outside receiver is: 10.170.96.17/27
Logical switch: f8e4ee08-9635-4f1b-9787-304ff9176a57
Logical switch port: 51b8d609-23ab-4524-8c25-17e78d6200eb
@dceara Can you please take a look at the submitted patch?
@aserdean @numansiddique I apologize, I missed this before; I'll review it in the next few days.
Commit d02fb560aa35c4931d40000e8ca4c0e4105299b7 (1)(2) to Neutron disabled 'mcast_flood' on localnet ports, in order to avoid multicast packet duplication. This introduces a regression because outbound multicast traffic from VMs is no longer sent to the provider network unless it is part of a multicast group that OVN has snooped. This makes it impossible for something external to OpenStack to subscribe to a multicast group running on a VM (in OpenStack) unless another VM (in OpenStack) subscribes to it first (because the multicast traffic never leaves the OVS).
In my opinion https://review.opendev.org/c/openstack/neutron/+/797418 has to be reverted and a change implemented in ovn itself.
Regarding the change in OVN, `ovn_igmp_group_aggregate_ports()` has to be changed to only add the localnet port if `mcast_flood` is false, e.g. as in the diff above.

FTR: running ovn-20.03.2
(1) https://review.opendev.org/c/openstack/neutron/+/797418
(2) https://bugs.launchpad.net/neutron/+bug/1933207