Closed t0mmmy90 closed 1 year ago
I took part of the patch, about the fix for the description.
the remainder
- err = nl_cache_pickup(sk, cache);
+ err = nl_cache_pickup_checkdup(sk, cache);
if (err == -NLE_DUMP_INTR) {
is probably correct. I didn't apply it (yet), because for most object types it doesn't make sense (because during a dump there should be no duplicated). So all types would pay the overhead of looking first into the cache. That seems dangerous to cause a regression.
I am a bit hesitant to apply the second part, because I think the route cache has other problems as well (e.g. that ip route replace
replaces a route, and libnl3 doesn't properly handle that; or that the concept of identity is not correct for IPv4 routes (and possibly not for IPv6 routes)).
The route handling (when putting routes into the cache) has many issues in libnl3.
For example, a cache requires some sort of identity for its objects. Specifically, if you can add two routes in kernel (using ip route append
) that only differ by one attribute, then that attribute is part of the identity. For IPv4, NetworkManager does this and for IPv6 it does this. Compare to libnl's identity here. You see that it's wrong. Also NetworkManager's list of what constitutes the identity of a route in kernel is possibly incomplete, at least because kernel coul add a new attribute anytime. The proper solution would be if routes had a unique 64 bit identifier (large enough to never overflow).
Also, the cache relies on events (RTM_NEWROUTE
, RTM_DELROUTE
) that tell that an object was removed. But ip route change
can replace another route (as far as the identity is concerned). All it does is send a RTM_NEWROUTE
message (for the new route) with a NLM_F_REPLACE
flag. I guess, we then have to search the cache for the route that was replaced... and kernel keeps routes sorted, so we need to find the first matching route that was replaced (NetworkManager calls that NM_PLATFORM_IP_ROUTE_CMP_TYPE_WEAK_ID
). That seems easy to get wrong, and is not very efficient (NetworkManager maintains two indexes for all the routes, so that it can find the replaced route fast -- but maintaining an extra index is overhead too).
IPv6 multihop routes (ECMP) are treated special. On the one hand, the next-hop is part of the identity. On the other hand, when you add a second address that only differs by next-hop, then kernel will merge the existing routes. This is relatively easy to solve, by just pretending that multihop routes don't exist. If you have a multiphop route with n
next-hops, then it's really n
single-hop routes. The only twist is, that those routes will be announced in on RTM_NEWROUTE
message. ... and, that ip route change
not only replaces one of these routes, but all of them.
TL;DR: it's a mess. But kernel makes it hard to reasonably place routes in a cache.
Of course it should be fixed. But without first adding sensible unit-tests for all of this, I don't feel confident to merge this (and quite possibly, this patch is not sufficient).
I will add some unit-tests first...
Sorry for the disappointing reply.
Using
nl_cache_pickup_checkdup
while refilling the cache resolves the issue which I reported some time ago #271.I've also swapped descriptions of
nl_cache_pickup
andnl_cache_pickup_checkdup
cause it looks like the second one performs update.