ppp-project / ppp

Paul's PPP Package: PPP daemon and associated utilities | Official GitHub repo: https://github.com/ppp-project/ppp
https://github.com/ppp-project/ppp
Other
388 stars 231 forks source link

rethinking the ppp defaultroute* options #473

Open jkroonza opened 10 months ago

jkroonza commented 10 months ago
          The relevant documentation of the defaultroute options:

   defaultroute
          Add  a  default  route to the system routing tables, using the peer as the gateway,
          when IPCP negotiation is successfully completed.  This entry is  removed  when  the
          PPP  connection  is broken.  This option is privileged if the nodefaultroute option
          has been specified.

   defaultroute-metric
          Define the metric of the defaultroute and only add it if there is no other  default
          route  with the same metric.  With the default value of -1, the route is only added
          if there is no default route at all.

   replacedefaultroute
          This  option  is a flag to the defaultroute option. If defaultroute is set and this
          flag is also set, pppd replaces an existing default  route  with  the  new  default
          route.  This option is privileged.

(An appenddefaultroute option would also be convenient but let's not over-complicate things).

I think we should ignore the IPv6 situation, defaultroute6 technically already violates the standards as I understand them, but it's been stated that providers are messing around, so let's just leave that as is. Also, this could technically be handled more flexibly from ipv6-up. Same for ip-up actually come to think about it.

So there are a few valid combinations of options:

  1. defaultroute
  2. defaultroute+defaultroute-metric
  3. defaultroute+replacedefaultroute
  4. defaultroute+defaultroute-metric+replacedefaultroute

So I believe for these combinations the behaviour should be as follows (Ignoring 3 which is problematic):

  1. If any default route exist, don't add one, if none exist, add with metric 0.
  2. If a default route with the same metric exist, don't add one, else add with relevant metric.
  3. If a default route with the same metric exist, remove it (restore on ipv4 down) and re-add via ppp.

A further problem with replacedefaultroute is that it's possible in linux to have multiple exact same routes at the same metric, so technically even 4 is a problem above:

plastiekpoot [09:01:41] ~ # ip ro sh
default via 192.168.42.1 dev wlp2s0 proto dhcp src 192.168.42.21 metric 2002 
...
plastiekpoot [09:02:05] ~ # ip route append default via 192.168.42.1 dev wlp2s0 proto static src 192.168.42.21 metric 2002
plastiekpoot [09:02:10] ~ # ip ro sh
default via 192.168.42.1 dev wlp2s0 proto dhcp src 192.168.42.21 metric 2002 
default via 192.168.42.1 dev wlp2s0 proto static src 192.168.42.21 metric 2002 
...

So with replacedefaultroute - should we remove all matching default routes or only the first matching one?

Which leads into my logic issue with 3 above. If there is only one other default route it's all good and well, we remove it and re-add later, but what if there are multiple such routes?

I guess the replacedefaultroute option is problematic regardless. We should only ever add routes and remove our own, the system administrator should manage priority via metrics.

So really the question is if a matching route with the same metric exist, do we append or not? Does yet another separate option for that even make sense or should we simply append our route regardless - as such - nuke the replacedefaultroute option entirely, and append another default route at the specified metric and move on, thus reduce the defaultroute options to "defaultroute" and "defaultroute-metric" where if defaultroute is given we ALWAYS APPEND a default route at the specified defaultroute-metric (which then defaults to 0).

If the system administrator wants more control than that - have them rely on ip-(pre-)up or net-pre-up and let them deal with it there?

Whilst the existing logic is definitely broken, I'm not convinced that the PR brings things closer to the documented behaviour either, nor am I sure the documented behaviour is well defined.

Originally posted by @jkroonza in https://github.com/ppp-project/ppp/pull/472#pullrequestreview-1814815012

In addition, I believe the IPv6 defaultroute6 option should be brought in line with whatever it's decided to do with IPv4.

Neustradamus commented 10 months ago

@sthibaul: What is the solution? ^^

jkroonza commented 10 months ago

My opinion is that we should only ever append default route at given metric, never removing any routes not added by pppd.

I think this would simplify the routing related code in pppd quite a bit (for ipv4) at least. And I think a defaultroute6 option should be added.

This leaves two knobs for each protocol:

defaultroute yes/no. defaultroutemetric value.

Perhaps these can be combined as per some other suggestion on original PR I hijacked.

If the sysadmin wants more control than that, deal with it from the hook scripts or using routing daemon (eg, frr) and don't use defaultroute* options from pppd.

One scenario for example that can't be currently handled is adding multiple default routes at the same metric and only failing over if the current primary fails. Let's say I have two uplinks, same specs but different public IPs, I don't want one to take priority over the other and only want to switch if the current active fails.

ppp1 comes up, adds default metric 100 ppp2 comes up, adds default metric 100

Now we have:

default dev ppp1 metric 100
default dev ppp2 metric 100

ppp2 fails, it merely retracts, no change in routing. ppp2 comes back, and is restored to same state as above.

If however ppp1 fails, default is updated to ppp2 by the kernel. Some NAT/MASQUERADEs break, tough, that's the way of it. If you used MASQUERADE rather than SNAT the kernel will sort that out.

ppp1 restores, and re-appends default dev ppp1 metric 100, routing is now:

default dev ppp2 metric 100
default dev ppp1 metric 100

Routing remains via ppp2, not interrupting any connections, which in my world is preferable.

If I always want ppp1 to take priority, then I would set defaultmetric for ppp1 to 50, and defaultmetric for ppp2 to 100 (for example, any values where defaultmetric for ppp1 is less than defaultmetric for ppp2 would work).

We have cases where we need more control, but they're the exception, not the rule. And we can trivially deal with those cases using frr, or in cases where we can't do that with a little more complexity using -up and -down scripts.

sthibaul commented 9 months ago

My opinion is that we should only ever append default route at given metric, never removing any routes not added by pppd

That could make the code simpler indeed.

jkroonza commented 9 months ago

Is this something we can push into 2.5.X or would this change need to wait for 2.6.X?

I don't mind attempting a PR if there is agreement on the approach.

Neustradamus commented 9 months ago

I think in 2.5.x.

jkroonza commented 9 months ago

Do we have consensus?

Bring IPv6 default handling in line by:

jkroonza commented 9 months ago

Actually, given that the historic behaviour was to NOT add the default route at all without defaultroute-metric if ANY other default route exist, whatever the HIGHEST legal metric value is, is likely a more sensible default for the default for defaultroute-metric ... but that would look really funny. Want to simplify the code so scanning for highest existing metric and +1'ing that seems to go in the wrong direction but that's likely the "most right" thing to do?

jkroonza commented 9 months ago

Currently defaultroute6 uses the metric from defaultroute-metric. I think this is a sensible default, however it's very much conceivable that I may want to use different metrics for v4 and v6. So will add a defaultroute6-metric (which defaults to use the value from v4 if not set).

jkroonza commented 9 months ago

Solaris does not use defaultroute-metric at all. I'm not in a position to investigate and look at that, so merely cleaning up the replace error message that's in sifdefaultroute there.

jkroonza commented 9 months ago

Looking at net/ipv4/fib_frontend.c, .fc_nlflags is only set to include NLM_F_APPEND when using NETLINK to add a route, not the ioctl mechanism currently used by pppd. Without this flag the kernel prepends the route at the given metric, not append.

This for me is a problem and would suggest we rather use the netlink mechanism (if available) to add routes, and only fall back to the ioctl method if that fails (with the caveat that routes are prepended).

jkroonza commented 9 months ago

@Neustradamus Would it be acceptable to use netlink to add the routes, and fallback to ioctl should that fail?

Neustradamus commented 9 months ago

@paulusmack, @sthibaul: What do you think?

paulusmack commented 9 months ago

@Neustradamus Would it be acceptable to use netlink to add the routes, and fallback to ioctl should that fail?

Seems fair...

sthibaul commented 9 months ago

sure

jkroonza commented 8 months ago

Thanks. A bit snowed currently, will loop back to this.

Neustradamus commented 3 months ago

@jkroonza: Have you progressed on it?

jkroonza commented 2 months ago

I honestly have not sorry. Things are starting to quiet down, but thing 2.5.1 will have to ship without this.