jech / babeld

The Babel routing daemon
http://www.irif.fr/~jch/software/babel/
MIT License
385 stars 92 forks source link

Added redistribute live and check_xroutes #114

Open Dando-Real-ITA opened 8 months ago

Dando-Real-ITA commented 8 months ago

I was unable to reliably make the fake default route IPv4 and IPv6 on dev lo work. For example, sometimes babeld on the distributing server was setting the xroute metric to unreachable even if it was ok, and only at one of ipv4 or ipv6. Result was that the client would get, with another default route retracted, one of ipv4 or ipv6 completely unreachable

I am not sure what was causing the metric not reflect the state, but I suppose it was having 2 default routes on the same master table

Using redistribute live update + check_xroutes I was able to have consistent results of refmetric on the client with any combination or order of default routes working or not

jech commented 4 months ago

Thanks for the PR. Sorry for the delay, you submitted it at a bad time and then it fell into the cracks.

At first sight it looks good, please give me a few days to review it carefully.

Dando-Real-ITA commented 4 months ago

No problem, if interested after these commits I added a multi default support, however it is a more pervasive change and to work requires policy rules and to install the multiple default routes in separate tables

Also have a few changes to support table id > 255 and a delay on first update when using hmac to after the cycle is running and the handshake is completed

I am on hold currently on the project but I will use it to upgrade our network to multi-isp clos model

jech commented 4 months ago

I've applied the first two patches of your PR.

In the third patch, I have a problem: why is applying the install filter conditional on newsrc being non-null? I think that's a bug, we should be applying the install filter unconditionally in change_route and flushing the route if the install_filter causes it to be discarded.

The fourth patch is incomplete. It only works in a very specific case (the filter already exists and only the add_metric field is being changed), which is fine, no problem only doing the specific case that solves your problem. However, if the requested operation could not be performed, it does not return an error message to the user.

As to the last patch, I don't understand why it is useful. It also generates an error message that does not fit the established pattern (error messages start with either "no" or "bad").

Dando-Real-ITA commented 4 months ago

3rd patch: change_route is called by multiple code paths that do not change the src: install_route, uninstall_route, change_route_metric. All these do not change the pref_src, and the first run of the install_filter in change_route gets pref_src, and by default newpref_src = pref_src;. This means that in these cases, the code works as before patch.

Only in switch_routes there is a new source, which causes to reevaluate the install_filter to retrieve the newpref_src

4th patch: Yes, I handled not finding the filter as a noop, the update filter was meant for that very specific case, but I could add a fail return message

5th patch: It is needed to send the metric change update when the route is changed to deny. Before patch, redistribute filters are fixed once the configuration is read, so check_xroutes when it finds a route with deny, ignores it and no further processing happens, including sending updates to peers. With the patch, a route could switch from having a valid metric to be deny, in this case a retraction must be sent to peers. Only in this case, which is manually triggered by the new command, the INFINITY metric routes are not ignored so that an update can be sent. All other calls to check_xroutes keep the default behaviour. I can update the message

jech commented 3 months ago

3rd patch: change_route is called by multiple code paths that do not change the src: install_route, uninstall_route, change_route_metric. All these do not change the pref_src, and the first run of the install_filter in change_route gets pref_src, and by default newpref_src = pref_src;. This means that in these cases, the code works as before patch.

Agreed, but I think that the existing code is buggy in case of a non-trivial install filter. I could be wrong, but I think we need to invoke install_filter unconditionally in switch_routes.

4th patch: Yes, I handled not finding the filter as a noop, the update filter was meant for that very specific case, but I could add a fail return message

You really need to add proper error handling there.

5th patch: It is needed to send the metric change update

Hmm.

First of all, it's not strictly needed, since the routes will still expire. However, it might take some time.

I'm not keen on adding a manual trigger. Can we work things out so that the update happens automatically?

Dando-Real-ITA commented 3 months ago

Agreed, but I think that the existing code is buggy in case of a non-trivial install filter. I could be wrong, but I think we need to invoke install_filter unconditionally in switch_routes.

install_filter in change_route is always executed on the babel_route *route, and again if also source *newsrc is passed. What other use case do you mean?

You really need to add proper error handling there.

Done

Hmm.

First of all, it's not strictly needed, since the routes will still expire. However, it might take some time.

I'm not keen on adding a manual trigger. Can we work things out so that the update happens automatically?

A first option could be to trigger CONFIG_ACTION_CHECK_XROUTES directly with the redistribute update. However in my use case I change 4 redistribute before updating the xroutes, and would trigger the check_xroutes 4 times instead of 1:

echo "
redistribute ip 0.0.0.0/0 eq 0 $METRIC
redistribute ip 0.0.0.0/0 eq 0 proto 3 $METRIC
redistribute ip ::/0      eq 0 $METRIC
redistribute ip ::/0      eq 0 proto 3 $METRIC
check_xroutes
" | socat - UNIX-CONNECT:/var/run/babeld.sock 1>&2 > /dev/null

Another option could be to remove the infinity check altogether from: https://github.com/jech/babeld/blob/130366fba2efbe41e9d6f4052b352ea74279925e/xroute.c#L491 Then at kernel-check-interval timeout the new metric would be detected and update sent instead of ignored

The kernel default routes are never touched, as they are used by a timed external script to check if internet is available, only the redistribution parameter in babel is changed. So there is no kernel netlink update to use to trigger an update.