Open Dando-Real-ITA opened 8 months ago
Thanks for the PR. Sorry for the delay, you submitted it at a bad time and then it fell into the cracks.
At first sight it looks good, please give me a few days to review it carefully.
No problem, if interested after these commits I added a multi default support, however it is a more pervasive change and to work requires policy rules and to install the multiple default routes in separate tables
Also have a few changes to support table id > 255 and a delay on first update when using hmac to after the cycle is running and the handshake is completed
I am on hold currently on the project but I will use it to upgrade our network to multi-isp clos model
I've applied the first two patches of your PR.
In the third patch, I have a problem: why is applying the install filter conditional on newsrc being non-null? I think that's a bug, we should be applying the install filter unconditionally in change_route and flushing the route if the install_filter causes it to be discarded.
The fourth patch is incomplete. It only works in a very specific case (the filter already exists and only the add_metric field is being changed), which is fine, no problem only doing the specific case that solves your problem. However, if the requested operation could not be performed, it does not return an error message to the user.
As to the last patch, I don't understand why it is useful. It also generates an error message that does not fit the established pattern (error messages start with either "no" or "bad").
3rd patch: change_route
is called by multiple code paths that do not change the src: install_route
, uninstall_route
, change_route_metric
. All these do not change the pref_src
, and the first run of the install_filter
in change_route
gets pref_src
, and by default newpref_src = pref_src;
. This means that in these cases, the code works as before patch.
Only in switch_routes
there is a new source, which causes to reevaluate the install_filter
to retrieve the newpref_src
4th patch: Yes, I handled not finding the filter as a noop, the update filter was meant for that very specific case, but I could add a fail return message
5th patch: It is needed to send the metric change update when the route is changed to deny. Before patch, redistribute filters are fixed once the configuration is read, so check_xroutes
when it finds a route with deny
, ignores it and no further processing happens, including sending updates to peers. With the patch, a route could switch from having a valid metric to be deny
, in this case a retraction must be sent to peers. Only in this case, which is manually triggered by the new command, the INFINITY
metric routes are not ignored so that an update can be sent. All other calls to check_xroutes
keep the default behaviour. I can update the message
3rd patch: change_route is called by multiple code paths that do not change the src: install_route, uninstall_route, change_route_metric. All these do not change the pref_src, and the first run of the install_filter in change_route gets pref_src, and by default newpref_src = pref_src;. This means that in these cases, the code works as before patch.
Agreed, but I think that the existing code is buggy in case of a non-trivial install filter. I could be wrong, but I think we need to invoke install_filter unconditionally in switch_routes.
4th patch: Yes, I handled not finding the filter as a noop, the update filter was meant for that very specific case, but I could add a fail return message
You really need to add proper error handling there.
5th patch: It is needed to send the metric change update
Hmm.
First of all, it's not strictly needed, since the routes will still expire. However, it might take some time.
I'm not keen on adding a manual trigger. Can we work things out so that the update happens automatically?
Agreed, but I think that the existing code is buggy in case of a non-trivial install filter. I could be wrong, but I think we need to invoke install_filter unconditionally in switch_routes.
install_filter
in change_route
is always executed on the babel_route *route
, and again if also source *newsrc
is passed. What other use case do you mean?
You really need to add proper error handling there.
Done
Hmm.
First of all, it's not strictly needed, since the routes will still expire. However, it might take some time.
I'm not keen on adding a manual trigger. Can we work things out so that the update happens automatically?
A first option could be to trigger CONFIG_ACTION_CHECK_XROUTES
directly with the redistribute update. However in my use case I change 4 redistribute before updating the xroutes, and would trigger the check_xroutes
4 times instead of 1:
echo "
redistribute ip 0.0.0.0/0 eq 0 $METRIC
redistribute ip 0.0.0.0/0 eq 0 proto 3 $METRIC
redistribute ip ::/0 eq 0 $METRIC
redistribute ip ::/0 eq 0 proto 3 $METRIC
check_xroutes
" | socat - UNIX-CONNECT:/var/run/babeld.sock 1>&2 > /dev/null
Another option could be to remove the infinity check altogether from: https://github.com/jech/babeld/blob/130366fba2efbe41e9d6f4052b352ea74279925e/xroute.c#L491
Then at kernel-check-interval
timeout the new metric would be detected and update sent instead of ignored
The kernel default routes are never touched, as they are used by a timed external script to check if internet is available, only the redistribution parameter in babel is changed. So there is no kernel netlink update to use to trigger an update.
I was unable to reliably make the fake default route IPv4 and IPv6 on dev lo work. For example, sometimes babeld on the distributing server was setting the xroute metric to unreachable even if it was ok, and only at one of ipv4 or ipv6. Result was that the client would get, with another default route retracted, one of ipv4 or ipv6 completely unreachable
I am not sure what was causing the metric not reflect the state, but I suppose it was having 2 default routes on the same master table
Using redistribute live update + check_xroutes I was able to have consistent results of refmetric on the client with any combination or order of default routes working or not