ovn-org / ovn

Open Virtual Network

Multiple VMs on the same Openstack hypervisor subscribed to the same IGMP group eventually drop out of the same multicast group #126

dparv opened this issue 2 years ago (status: Open)

dparv commented 2 years ago

Multiple instances in OpenStack (running ovn-20.03.2) subscribe to an IGMP group to receive externally or internally generated multicast traffic.

mcast_flood_reports is enabled because there is a Mellanox switch outside of the environment keeping track of IGMP group subscriptions, with the following configuration:

        IGMP snooping globally: enabled
        IGMP default version for new VLAN: V2
        IGMP snooping operationally: enabled
        Proxy-reporting globally: disabled
        Last member query interval: 1 seconds
        Mrouter timeout: 125 seconds
        Port purge timeout: 260 seconds
        Report suppression interval: 5 seconds
        IGMP snooping unregistered multicast: forward-to-mrouter-ports

When a single VM is subscribed to the IGMP group, it sends out IGMP v2 reports and everything works fine. However, when a second VM on the same hypervisor subscribes to the same IGMP group, both VMs start sending IGMP v2 reports towards the group, and RFC 2236 states:

If the host receives another host's Report (version 1 or 2) while it has a timer running, it stops its timer for the specified group and does not send a Report, in order to suppress duplicate Reports.

This means that VMs will randomly drop out of the multicast group and stream: when one VM sends an IGMP v2 report, it is received by the other, which then suppresses its own reports and eventually times out (260 seconds). A temporary workaround is easily done with iptables on both workloads via:

iptables -A INPUT -p igmp ! -d 224.0.0.1 -j DROP

Is there an option in OVN (via OpenFlow rules) to prevent flooding the subscribed VM ports with IGMP v2 reports, to avoid this timeout?

aserdean commented 2 years ago

/CC @dceara

dceara commented 2 years ago

Is adding an egress pipeline ACL an option? Basically implementing your iptables workaround directly in OVN.

Something like:

ovn-nbctl pg-add pg_drop_igmp vm1 vm2 ...
ovn-nbctl acl-add pg_drop_igmp to-lport 32767 'outport=@pg_drop_igmp && igmp && ip4.dst == 224.0.0.1' drop 

dparv commented 2 years ago

If that has to be done on specific VMs it will not work as a generic solution for the whole cloud. I can see there is the option to acl-add on a switch, which sounds like a better idea since it only has to be done once per provider network. But is there an option to specify ip4.dst != 224.0.0.1? I think that is what we need here: drop everything delivered to the VM port that is not destined for 224.0.0.1, so it drops all IGMP v2 reports to the X, Y, Z groups but still allows the querier traffic to 224.0.0.1, e.g. 10.170.96.1 > 224.0.0.1: igmp query v2.

Also, is it possible to match only the IGMP report types 0x12 (v1 report), 0x16 (v2 report), and 0x22 (v3 report) with OVN?

dparv commented 2 years ago

Hey, so we just tried doing this:

ovn-nbctl acl-add neutron-de186878-bd21-48a7-a4c5-1d81227c9f48 to-lport 32767 'igmp && ip4.dst == 224.0.0.1' allow
ovn-nbctl acl-add neutron-de186878-bd21-48a7-a4c5-1d81227c9f48 to-lport 32766 'igmp' drop

which results in this ACL:

  to-lport 32767 (igmp && ip4.dst == 224.0.0.1) allow
  to-lport 32766 (igmp) drop

and that seems to work.
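
For reference, one way to double-check that the ACLs were translated into logical flows is presumably something along these lines, on a node with access to the southbound DB (a sketch, not verified here):

ovn-sbctl lflow-list neutron-de186878-bd21-48a7-a4c5-1d81227c9f48 | grep -i acl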

Is there an option to apply this as a generic rule to all logical switches automatically?

dceara commented 2 years ago

> If that has to be done on specific VMs it will not work as a generic solution for the whole cloud. I can see there is the option to acl-add on a switch, which sounds like a better idea since it only has to be done once per provider network. But is there an option to specify ip4.dst != 224.0.0.1? I think that is what we need here: drop everything delivered to the VM port that is not destined for 224.0.0.1, so it drops all IGMP v2 reports to the X, Y, Z groups but still allows the querier traffic to 224.0.0.1, e.g. 10.170.96.1 > 224.0.0.1: igmp query v2.

You're right, it should be != 224.0.0.1

> Also, is it possible to match only the IGMP report types 0x12 (v1 report), 0x16 (v2 report), and 0x22 (v3 report) with OVN?

At a first glance this doesn't seem possible. I might be wrong though.

> Hey, so we just tried doing this:
>
> ovn-nbctl acl-add neutron-de186878-bd21-48a7-a4c5-1d81227c9f48 to-lport 32767 'igmp && ip4.dst == 224.0.0.1' allow
> ovn-nbctl acl-add neutron-de186878-bd21-48a7-a4c5-1d81227c9f48 to-lport 32766 'igmp' drop
>
> which results in this ACL:
>
>   to-lport 32767 (igmp && ip4.dst == 224.0.0.1) allow
>   to-lport 32766 (igmp) drop
>
> and that seems to work.

I'm assuming you have mcast_flood=true configured on the localnet port otherwise this will break the case when the external host registers for IP multicast traffic originated from inside the cluster.
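
For reference, that option can presumably be set with something along these lines, where <localnet-port> is a placeholder for the actual localnet port name:

ovn-nbctl lsp-set-options <localnet-port> mcast_flood=true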

> Is there an option to apply this as a generic rule to all logical switches automatically?

Unfortunately not.

dparv commented 2 years ago

Re-worked the rule as follows, but it looks like this drops all the outbound packets for IGMP and UDP as well:

ovn-nbctl acl-add neutron-de186878-bd21-48a7-a4c5-1d81227c9f48 to-lport 32767 'igmp && ip4.dst != 224.0.0.1' drop

I can see the reports leaving the VMs and the tap interfaces on both, which is now the expected behavior:

10.170.96.7 > 239.55.55.55: igmp v2 report 239.55.55.55

and

10.170.96.18 > 239.55.55.55: igmp v2 report 239.55.55.55

but then they are not received by the querier switch, so they get dropped somewhere along the way. Removing the ACL seems to fix this, so for some reason the rule is dropping the outbound IGMP and UDP traffic, which makes no sense.

Maybe it's the priority that has to be changed?

dceara commented 2 years ago

> Re-worked the rule as follows, but it looks like this drops all the outbound packets for IGMP and UDP as well:
>
> ovn-nbctl acl-add neutron-de186878-bd21-48a7-a4c5-1d81227c9f48 to-lport 32767 'igmp && ip4.dst != 224.0.0.1' drop

The "to-lport" is from the logical switch perspective. This essentially translates to "all IGMP traffic with destination IP != 224.0.0.1 going out any logical switch port attached to neutron-de186878-bd21-48a7-a4c5-1d81227c9f48 should be dropped".

I'm guessing the neutron-de186878-bd21-48a7-a4c5-1d81227c9f48 switch also has a localnet port connecting it to the provider network where the physical querier switch is.

So IGMP reports from VM:

VM ---> neutron-LogicalSwitch --> localnet-port --> provider network

Will get dropped before being sent out on the localnet port because they get matched by the ACL.

That's why in my initial suggestion I had an extra match in the ACL, to restrict the egress logical port:

outport=@pg_drop_igmp && igmp ...

dparv commented 2 years ago

Okay, I just tried the commands you proposed 1:1 and here is what happens: the IGMP v2 report from a single VM still floods both ports, so the second VM gets suppressed and drops traffic. These are the pg_drop_igmp ACL rules:

$# ovn-nbctl acl-list pg_drop_igmp 
  to-lport 32767 (outport=@pg_drop_igmp && igmp && ip4.dst != 224.0.0.1) drop

dparv commented 2 years ago

Some progress, outport==@pg_drop_igmp instead of outport=@pg_drop_igmp works! So it's the double ==.

Now the question is: can we use outport!=provnet-b44ae0a3-271d-4c22-9d7f-9312d486c2f4 or the provnet UUID? Should it be @provnet-b44ae0a3-271d-4c22-9d7f-9312d486c2f4 or @UUID?

dceara commented 2 years ago

> Some progress, outport==@pg_drop_igmp instead of outport=@pg_drop_igmp works! So it's the double ==.

Oops, sorry for the typo. You're right.

> Now the question is: can we use outport!=provnet-b44ae0a3-271d-4c22-9d7f-9312d486c2f4 or the provnet UUID? Should it be @provnet-b44ae0a3-271d-4c22-9d7f-9312d486c2f4 or @UUID?

Unfortunately you can only test a logical port for equality; "!=" is not allowed. If you try it, you get a log like the following in ovn-controller:

lflow|WARN|error parsing match "outport != @pg2 && igmp && ip4.dst == 224.0.0.1": Nominal field outport may only be tested for equality (taking enclosing `!' operators into account).

So you need to list all the VM ports in the @pg_drop_igmp port group.

In general, the match syntax is (applies to inport too):

'outport == "<logical-port-name>" ..'
'outport == @<port-group-name> ..'
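
Concretely, with hypothetical names ("vm1-port" is just a placeholder), that would look like:

ovn-nbctl acl-add neutron-de186878-bd21-48a7-a4c5-1d81227c9f48 to-lport 32766 'outport == "vm1-port" && igmp && ip4.dst != 224.0.0.1' drop
ovn-nbctl acl-add pg_drop_igmp to-lport 32766 'outport == @pg_drop_igmp && igmp && ip4.dst != 224.0.0.1' drop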

dparv commented 2 years ago

Finally figured it out! This works when applied directly on the switch and allows everything towards the provnet port:

# ovn-nbctl acl-list neutron-de186878-bd21-48a7-a4c5-1d81227c9f48
  to-lport 32767 (outport=="provnet-b44ae0a3-271d-4c22-9d7f-9312d486c2f4") allow
  to-lport 32766 (igmp && ip4.dst != 224.0.0.1) drop
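
For reference, those ACLs can be created with acl-add commands along these lines (same switch and localnet port names as above):

ovn-nbctl acl-add neutron-de186878-bd21-48a7-a4c5-1d81227c9f48 to-lport 32767 'outport == "provnet-b44ae0a3-271d-4c22-9d7f-9312d486c2f4"' allow
ovn-nbctl acl-add neutron-de186878-bd21-48a7-a4c5-1d81227c9f48 to-lport 32766 'igmp && ip4.dst != 224.0.0.1' drop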

dceara commented 2 years ago

Nice!

In the long term we should probably change OVN to either:
a. Only forward reports towards ports where mrouters are connected (i.e., ports on which a query was received), or
b. Have a static, per-logical_switch_port config option to allow/deny flooding of reports on a port.

Option "a" is more generic but we'd need ovn-controller to manage mrouter port expiration. "b" is not really flexible but might be enough in this context.

What do you think?

dparv commented 2 years ago

Config options are always welcome, as it provides more flexibility on how you want to use it.

However, RFC 4541 describes option a) exactly:

A snooping switch should forward IGMP Membership Reports only to
those ports where multicast routers are attached.

From an administrator perspective, where one can run ovn-nbctl commands, b) is fine. From an OpenStack perspective, however, b) needs to be automated by the CMS, so some changes in neutron/ports to support the config option would be in order.

I guess the best approach might be to go with a) and follow the RFC.

dceara commented 2 years ago

@dparv Thinking more about it, implementing the RFC-compliant solution is quite a bit of change in OVN. I was wondering if an option "c" would be good enough instead:

This still needs the CMS to configure the mcast_flood_reports=true option but this can be done unconditionally:

Would this work for you too? CC @umago

dceara commented 2 years ago

After discussion with @umago it turns out we already had a bug reported for this downstream: https://bugzilla.redhat.com/show_bug.cgi?id=1933990#c3

I'll go ahead and implement the fix in core OVN and we'll then change the mcast_flood_reports setting to false in neutron for non-localnet ports.
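
At the OVN level, that neutron change would presumably boil down to something like the following for each non-localnet (VM) port, where <vm-port> is a placeholder:

ovn-nbctl lsp-set-options <vm-port> mcast_flood_reports=false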

dparv commented 2 years ago

We have already set mcast_flood_reports to true everywhere so that IGMP queries from outside of OpenStack reach the VMs.

The proposal sounds like it will either allow or block all IGMP traffic on a logical switch port, even traffic coming from outside of OpenStack; for example, a querier sending an IGMP query from an external port would no longer reach the VM ports internally?

I think there is a need to filter by IGMP type depending on where the traffic is coming from / going to.

So, is there any way to filter out only the IGMP report types (0x12 and 0x16)? I think 0x22 (IGMPv3 reports) is OK to allow, as there is no report suppression mechanism in v3.

dceara commented 2 years ago

@dparv I posted a patch series that implements (or tries to implement) the RFC-compliant solution for forwarding queries and reports: https://patchwork.ozlabs.org/project/ovn/list/?series=308096&state=*

If you have time, it would be great if you could try it out, thanks!