svinota / pyroute2

Python Netlink and PF_ROUTE library — network configuration and monitoring
https://pyroute2.org/

NDB Kernel not sending RTM_DELROUTE #1188

Open svenauhagen opened 6 months ago

svenauhagen commented 6 months ago

If a route uses a neighbour as its gateway and that neighbour is deleted, the kernel does not send an RTM_DELROUTE, so the NDB cache becomes incorrect. How to reproduce:

Run ip monitor and, in a second window, do:

    ip a a 172.16.1.1/24 dev veth1
    ip nexthop add id 100 via 172.16.1.2 dev veth1
    ip route add 172.16.101.0/24 nhid 100
    ip nexthop del id 100

Replace the IP and dev with your local settings. The kernel will send an RTM_DELNEXTHOP but no RTM_DELROUTE.

For IPv6 the behaviour can be controlled with the sysctl skip_notify_on_dev_down, but there is no equivalent for IPv4. I guess we would need to listen for RTM_DELNEXTHOP, remove the affected entries from the neighbourhood table and, if it was the last entry for a route, also delete that route.
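A minimal sketch of that idea in plain Python, to make the bookkeeping concrete. The cache layout (a dict of nexthops and a dict mapping each destination to the set of nexthop ids it still uses) and the name handle_delnexthop are assumptions made for illustration only; this is not NDB's real schema or API.

```python
from typing import Dict, List, Set

# Hypothetical, simplified cache; this is NOT NDB's real schema.
#   nexthops: nexthop id -> description of the nexthop
#   routes:   destination prefix -> ids of the nexthops the route still uses


def handle_delnexthop(nhid: int,
                      nexthops: Dict[int, dict],
                      routes: Dict[str, Set[int]]) -> List[str]:
    """Apply one RTM_DELNEXTHOP to the cache.

    Returns the destinations of the routes that were dropped because
    they lost their last nexthop (no RTM_DELROUTE arrives for them).
    """
    nexthops.pop(nhid, None)
    dropped = []
    # Iterate over a copy of the keys because entries may be deleted.
    for dst in list(routes):
        routes[dst].discard(nhid)
        if not routes[dst]:
            del routes[dst]
            dropped.append(dst)
    return dropped


nexthops = {
    100: {"gateway": "172.16.1.2", "oif": "veth1"},
    200: {"gateway": "172.16.2.2", "oif": "veth2"},
}
routes = {"172.16.101.0/24": {100}, "10.0.0.0/8": {100, 200}}

dropped = handle_delnexthop(100, nexthops, routes)
print(dropped)         # ['172.16.101.0/24']
print(sorted(routes))  # ['10.0.0.0/8']
```

The route that still has another nexthop is kept and only loses the deleted id, while the route whose only nexthop was id 100 is removed from the cache.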

What do you think?

Best Sven

crosser commented 6 months ago

That's right, you need to subscribe to RTMGRP_LINK and remove all routes with oif matching the disappearing link. At least, that's what we are doing, and it seems to work.
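A rough sketch of that cleanup, assuming the cached routes are plain dicts with an oif interface index and that a link notification is a dict carrying event and index keys. The helper name purge_routes_for_link and the data layout are hypothetical; the netlink subscription itself is left out.

```python
from typing import Dict, List


def purge_routes_for_link(msg: Dict, routes: List[Dict]) -> List[Dict]:
    """Return the cached routes that survive a link notification.

    Only RTM_DELLINK is acted on; any other event leaves the cache as is.
    """
    if msg.get("event") != "RTM_DELLINK":
        return routes
    gone = msg["index"]
    # Keep every route whose output interface is not the vanished link.
    return [r for r in routes if r.get("oif") != gone]


routes = [
    {"dst": "172.16.101.0/24", "oif": 5},
    {"dst": "192.168.0.0/16", "oif": 2},
]

remaining = purge_routes_for_link({"event": "RTM_DELLINK", "index": 5}, routes)
print([r["dst"] for r in remaining])  # ['192.168.0.0/16']
```

Returning a new list rather than mutating the cache in place keeps the sketch easy to test; a real cache would more likely delete the matching records directly.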

svenauhagen commented 6 months ago

@crosser thank you for that, that makes sense. There is another case I am running into, though. When the interface's IP address is deleted, any neighbours that can no longer be reached are also deleted.

So to get back to my example:

    ip a a 172.16.1.1/24 dev veth1
    ip nexthop add id 100 via 172.16.1.2 dev veth1
    ip route add 172.16.101.0/24 nhid 100
    ip a d 172.16.1.1/24 dev veth1

This will also trigger an RTM_DELNEXTHOP but no RTM_DELROUTE. Do you have that covered by any chance?

crosser commented 6 months ago

We have not run into this use case (yet?), and are not handling it.

crosser commented 6 months ago

@svenauhagen your test case looks odd to me. The last command is ip r d 172.16.1.1/24 dev veth1, should it be ip a d 172.16.1.1/24 dev veth1?

svenauhagen commented 6 months ago

@crosser you are correct, that is a copy-and-paste error. I will fix it in my post above.

svenauhagen commented 6 months ago

I pushed a PR for the case where the last address is removed from an interface with multipath routes.