osrg / gobgp

BGP implemented in the Go Programming Language
https://osrg.github.io/gobgp/
Apache License 2.0
3.66k stars 699 forks source link

evpn: scope mac-mobility handling to MAC-VRF of the route #2805

Closed Tuetuopay closed 5 months ago

Tuetuopay commented 6 months ago

The mac mobility code used lookup tables to find the routes to the MAC address. It only considered the MAC itself, and not the associated MAC-VRF. This led to the complexity still being way too high when one MAC is present in all MAC-VRFs, like a virtual router or an anycast gateway.

This patch now considers the MAC-VRF for a route by looking up its route targets, and considers them all if multiple are present. This way, the lookup for routes that will participate in mac-mobility will be limited to the MAC-VRF. This allows to not blow up the CPU when a MAC address is present in a lot of MAC-VRFs.

The biggest behavioral change is now how mac-mobility is sharded. For example, it can now have the same MAC in multiple MAC-VRFs on different PEs. This is quite useful in highly multi-tenant environments where control on the MAC addresses is nigh impossible. This is a net fix of the behavior where MAC-VRFs cannot affect each other. For reference, FRRouting does scope Mac Mobility to the MAC-VRF.

But the main reason this was done is to fix the quadratic complexity that comes when there are many times the same MAC in many different MAC-VRFs, by not needing to iterate through all the routes for other unrelated MAC-VRFs.

A quick benchmark was done by adding type-2 routes through the gRPC API. For 20k routes, on my laptop (Ryzen 5 PRO 4650U):

It is to be noted that, without the patch, the whole route ingestion will be throttled by type-2 route ingestion.

Note: this PR is both a performance and a correctness fix. It is built atop #2804 which is another correctness fix, hence the two commits. Please keep the comments for the first commit in the other pull request

fujita commented 6 months ago

Looks like the evpn test failed.

Tuetuopay commented 6 months ago

Looks like the evpn test failed.

@fujita indeed. Since you're here I would like for your input on the implementation. I am not sure about the behavior when one of the RTs of the route is for an IP-VRF and not just a MAC-VRF (i.e. a route when doing type-5 routing for e.g. a router's mac).

I will fix the test, thanks!

Tuetuopay commented 6 months ago

@fujita the evpn tests have been fixed. there was indeed a logic error in my patch due to some place I missed accessing the destinations directly (TableManager directly writing to Table's Destinations without going through table, which is bad isolation). I moved accesses to Destination to Table itself.

The graceful restart tests do fail which I don't understand since I did not touch this part of the code, especially since it's on an address family not affected by the patch (ipv4/6 unicast vs evpn). Those appear to be flaky since when running them locally they don't fail more often than they do. But they still do occasionally.

Tuetuopay commented 5 months ago

rebased

fujita commented 5 months ago

Thanks!