opnsense / core

OPNsense GUI, API and systems backend
https://opnsense.org/
BSD 2-Clause "Simplified" License

system: switch default gateway by optionally selected group #2279

Closed mimugmail closed 5 years ago

mimugmail commented 6 years ago

Hi,

is it possible to set a metric in routes or, better, in the gateway config? The problem is that when you have more than 2 lines you cannot set the order in which the lines take over after the primary goes down (with gateway switching).

With Cisco you can set as many default gateways as you like and use metrics to set the priority.

Is this also possible with FreeBSD / OPN?

fichtner commented 6 years ago

I think route metrics are not in FreeBSD https://github.com/opnsense/core/issues/123

mimugmail commented 6 years ago

Hm, OK, not used by recent kernels. In Gateway - Advanced you can set a priority. Would it be possible to patch the script that adds routes to respect this setting?

fichtner commented 6 years ago

@mimugmail what exactly do you mean?

mimugmail commented 6 years ago

Via Advanced in gateways you can set a priority. When you have 3 gateways you can number them 1 to 3. When failover occurs, the script changing the routes could (if possible) check whether the gateway that is down has a lower number and, if so, whether there is another gateway with a lower number, then use that one as the next default gateway; otherwise it keeps its own default gateway.

fichtner commented 6 years ago

You mean „weight“?

Yes, we could use that during default gateway switching. But the default gateway will be highest in priority, no matter the weight value.

AFAIK, the weight is for load balancing on multi-wan to distribute traffic using a non 50-50 distribution.
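
To illustrate what a non 50-50 distribution by weight means in general, here is a minimal Python sketch. It is not how OPNsense or pf implement load balancing; the gateway names, the weights and the `pick_gateway` helper are all hypothetical.

```python
import random
from collections import Counter

def pick_gateway(weights, rng):
    """Pick one gateway name with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]

# Hypothetical gateways: weight 3 vs. 1 aims for roughly a 75/25 split.
weights = {"WAN1_GW": 3, "WAN2_GW": 1}

rng = random.Random(0)  # seeded so repeated runs behave the same
counts = Counter(pick_gateway(weights, rng) for _ in range(10_000))
share_wan1 = counts["WAN1_GW"] / sum(counts.values())
```

The weight only shapes how new connections are spread across gateways that are all in use; it says nothing about which single gateway should become the system default, which is the point raised in this issue.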

mimugmail commented 6 years ago

Oh, weight, correct. But that doesn't matter, since in setups where weight is actively used you don't care which default gateway the system has. But when you have a 1G, a 100M and an LTE line, you don't want to see the firewall switch from 1G to LTE. Sure, the default gw is highest and there can only be one at a time. But it would be good to have a choice :)

In the long term (20.1, 20.7) I'd love to see the way Cisco does this with monitoring groups and tracked interfaces. 👍

fichtner commented 6 years ago

Okay, i think we’re on the same page then. 😊

mimugmail commented 6 years ago

@AdSchellevis this is the topic we talked about on IRC lately :)

AdSchellevis commented 6 years ago

@mimugmail ok, to summarize what we've discussed (if I remember correctly).

If @fichtner agrees, we could add a marker for "backup default" with a weight and use the weights to sort the gateways. This would allow us to prioritise default gateway switching. Ideally we would like to use policy based routing for local traffic too, but that is more of a long-term solution. Agreed? (If so, I'm offering to do the work for this item.)

fichtner commented 6 years ago

yes, but I want to see how https://github.com/opnsense/core/commit/da4d25e629 works out on 18.7... it's preliminary work to exclude certain gateways from default gateway switching. it's not practical to mark a gateway down and should probably be a separate option, but if this separate option is what we talk about here with a priority setting that would be best.

fichtner commented 6 years ago

(it's correct to exclude down gateways from switching, I just mean it's impractical if you don't want it down and still not use it for default switching)

mimugmail commented 6 years ago

Fully agree with this! :) (also with fichtner's comments a couple of seconds ago)

Just to have it here: the need for local PBR traffic would cover all transparent stuff like Squid, siproxd, ftp-proxy etc.

Local PBR can be a mid/long term task :)

Thanks guys, much appreciated!

fichtner commented 5 years ago
mimugmail commented 5 years ago

If you need root access to some Multi WAN systems, ping me <3

fichtner commented 5 years ago

@mimugmail I'll let you know when to test. Still some things to figure out.

fichtner commented 5 years ago

Unfortunately 19.1 is too close now and the seemingly simple changes are too dangerous to implement in the current spaghetti code without risking regressions. We may have to rework some of it first... :(

mimugmail commented 5 years ago

Indeed, No risk, Just fun :)

niziak commented 5 years ago

Similar issue was here: #2563

AdSchellevis commented 5 years ago

@mimugmail the gateway status/edit pages are changed now, could you check on your end? system_gateways.php is ordered by priority now and marks the gateways "(default)" according to the new logic.

The routing itself is still setup with the old code, but the status page should reflect how it would react when finished.

It's quite a project to reimplement all the logic here and keep user impact as low as possible, so a pair of extra eyes while working on it would be highly appreciated. Thanks!

mimugmail commented 5 years ago

Priority selection looks good. Currently my test system doesn't like to monitor my first default gateway, strange. The other two gateways are monitored. What I observed is that when I set a second gateway as default, the checkbox in the first gateway isn't deselected. Not sure if this was the case before, but it doesn't make sense.

mimugmail commented 5 years ago

Another observation: the default gateway with prio 1 has IP 192.168.0.1. When I change the IP to 192.168.0.2 it gets saved, but when I then change the prio of this gateway to 4, the IP is back at 192.168.0.1.

mimugmail commented 5 years ago

I'm not sure if it's a good idea to make prio 1 the default when adding a new gateway, as it would break a setup when adding a new internal one. This is part of the problem: when you have two or more WAN gateways and an internal one to a VoIP net or whatever, the internal one could be chosen as your default (if not marked as down).

fichtner commented 5 years ago

I’m not sure if we need a prio after all. My idea was to use the gateway group ordering. It fulfils the same requirement in the end.

So you could choose one default gateway group for IPv4 and IPv6 or leave the default behaviour (which can already be steered by force down).

AdSchellevis commented 5 years ago

@mimugmail @fichtner some feedback:

That's on purpose, you can have multiple defaults, which are selected by priority. The form shows the active default. The previous auto deselect was quite wonky to start with. We could reimplement something similar, although I don't think we should, since part of the gateways are automatic and could be default (dhcp for example), which always leads to some weirdness in the logic underneath.

I'm not sure I can reproduce this, it doesn't make sense it knows previous state. Just to be sure, was this a manually added gateway or a dynamic one?

Higher priority would make it more attractive, you would like to set this to 0 when new/dynamic (same as localhost networks)?

I think we should have a priority, since it prevents twisted logic while selecting gateways. Most of the misery now seems to be a result of not being able to properly select which gateway is default now, this at least provides the option to make it explicit. Don't the groups already support this kind of priority using the Tiers by the way?

mimugmail commented 5 years ago

Would this work if you have more than one FO group?

fichtner commented 5 years ago

Does this mean we're not doing the "default gateway group" idea anymore? It seems cleaner to me, but I'm happy to delete the remaining code so we don't have conflicting systems of operation.

More than one default gateway is going to introduce side effects in other code unless everything is neatly refactored. The groups avoid all this, and you don't actually have multiple gateways in our system, just one according to rotation. So double or triple check: the default gateway depends on the default gateway switching setting in the first place. What happens if it is turned off? Or are we going to remove this feature checkbox as well?

AdSchellevis commented 5 years ago

I'm not sure which code you mean now, which code would need to be deleted?

As for refactoring the rest of the gwlb.inc, I am going to fix that too, we can't have different decision options about what's "default". so getDefaultGW() should be the only call to determine this. I want to prevent being haunted by gateway misery indefinitely :) it's flawed by design now.

fichtner commented 5 years ago

https://github.com/opnsense/core/blob/master/src/www/system_general.php#L445-L477

then let's ditch all of this. from what I can understand none of these config.xml related settings will be needed?

AdSchellevis commented 5 years ago

except the "gw_switch_default" I presume, since we don't have a good option to fix the local default using policy based routing.

AdSchellevis commented 5 years ago

let me rework fixup_default_gateway() first, so you know what I mean.

fichtner commented 5 years ago

if we can have multiple default gateways selected in their settings we don't need gw_switch_default :)

AdSchellevis commented 5 years ago

there is no trigger to change gateways now when the gateway is down.... we can change its name, but the function isn't there to my knowledge.

mimugmail commented 5 years ago

IMHO, there are two options:

  1. Have a default gw group, like with gateway groups for failover
  2. Have priorities like Ad started, but then we could/should just rename "Default Gateway" to something like "WAN Gateway". So, it's clear that it's for outside connections, and of course, you can have multiple of them.

AdSchellevis commented 5 years ago

I like the idea of "WAN Gateway", but that's more a naming thing we can sort out when finished with the code.

I think it will be pretty difficult to have a default group, since in the end, the group itself would need a priority then (within the tier) so we can select one default active gateway. Maybe I'm missing something obvious, but to me it looks like the problem is always the same, but more related to the gateway than the group.

mimugmail commented 5 years ago

It's indeed just a naming confusion. A default gateway can only be one gateway (the default). What we do/want here is a tiering for it, so there can still be only ONE default gateway. That's why I was confused yesterday when testing the code: in the status view only one gw is marked as default, but in edit mode the checkbox for "Default Gateway" was ticked on multiple gateways :)

Back to the default prio: I'd say a default of 0 or none would be best, so that adding new gateways (automatically or manually) won't break your current configuration, especially when working remotely.

AdSchellevis commented 5 years ago

The default is 1 now, for both automatic and manual; changing both to 0 would do the same, right? I don't mind what it is, as long as the mechanism doesn't change drastically.

mimugmail commented 5 years ago

1 is the highest priority, right? So the GW with prio 1 has the best chance of being the default? Then it doesn't make sense to me to default to 1 when adding a new GW. A default that keeps the gateway out of the decision process (like 0?) would make more sense to me.

AdSchellevis commented 5 years ago

No, higher is more important, so it would prefer 2 over 1. The sorting order in the form matches the preference (for easy tracking).

mimugmail commented 5 years ago

Oh, but this is reversed compared to Tiers in Failover Groups. Wouldn't this confuse the end user?

AdSchellevis commented 5 years ago

I'm not sure, weights work the same in the same form... I don't have a strong preference, although we need to calculate it to keep max --> low to prevent the order changing from what it was (dynamic preferred over manual).

Order seems to be quite important, and quite obscure how it was.

mimugmail commented 5 years ago

It's only related to UX: as an end user working with tiering, I'd expect that prio 1 would be chosen first.

AdSchellevis commented 5 years ago

default 255 and 1 highest then?

mimugmail commented 5 years ago

Perfect! :)
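
For readers following along, the scheme agreed on here (new gateways default to 255, 1 is the most preferred, down gateways are skipped) could be sketched roughly like this in Python. This is only an illustration under those assumptions, not the OPNsense code; the `Gateway` class, its `upstream` flag and `pick_default` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional

DEFAULT_PRIORITY = 255  # assumed default for newly added gateways

@dataclass
class Gateway:
    name: str
    proto: str                        # "inet" (IPv4) or "inet6" (IPv6)
    upstream: bool = False            # hypothetical "may become default" marker
    priority: int = DEFAULT_PRIORITY  # lower number = more preferred
    is_down: bool = False

def pick_default(gateways: List[Gateway], proto: str) -> Optional[Gateway]:
    """Return the usable candidate with the lowest priority number, if any."""
    candidates = [
        g for g in gateways
        if g.proto == proto and g.upstream and not g.is_down
    ]
    return min(candidates, key=lambda g: g.priority, default=None)

gateways = [
    Gateway("WAN_1G", "inet", upstream=True, priority=1),
    Gateway("WAN_100M", "inet", upstream=True, priority=2),
    Gateway("WAN_LTE", "inet", upstream=True, priority=3),
    Gateway("VOIP_GW", "inet"),  # internal gateway, never a candidate
]

first = pick_default(gateways, "inet").name   # all lines up
gateways[0].is_down = True                    # the 1G line fails
second = pick_default(gateways, "inet").name  # next in priority order
```

With these values the 100M line takes over before LTE when the 1G line fails, and a newly added internal gateway cannot become the default by accident because it is not marked as a candidate and carries the lowest preference.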

AdSchellevis commented 5 years ago

@mimugmail The complete gateway code is refactored. I did some testing locally and everything looks more consistent now. Eventually we could consider also migrating return_gateway_groups_array(), which would make gwlb.inc the dpinger.inc package.

As soon as you have some time for testing, let me know.

AdSchellevis commented 5 years ago

some notes, for future reference:

fichtner commented 5 years ago

I'm ok with changing the name to dpinger.inc -- gwlb.inc shouldn't be in interfaces.inc anymore and related code should be moved either to system.inc or interfaces.inc.

Shall I opportunistically change the file name right away?

fichtner commented 5 years ago

(my IPv4 is completely broken on master as it seems to stuff an extra default gateway into the mix for the HE tunnel, which is IPv6 only and requires another IPv4 gateway. Some other issues along the way as far as I can see now. I can look into it on Monday.)

AdSchellevis commented 5 years ago

it's probably something small, just let me know if I can help debugging it.

AdSchellevis commented 5 years ago

@fichtner can you try this https://github.com/opnsense/core/commit/7a8b12f030a851a801f55c27ee2a90635805020c ? I might have been a bit too enthusiastic here by adding tunnel endpoints automatically as possible gateways. The old code only included the ones which wrote a /tmp/XX_router file. The other possible suspect is https://github.com/opnsense/core/commit/bfca97e2e0d821e3e9ad38833fd99f4a302bee83, which accidentally returned a gateway without an address.

mimugmail commented 5 years ago

When manually disabling the gw with prio 1 it changes correctly. After enabling it again it seems to recognize the correct gw (188) but sets 127.0.0.1 as default:

Apr 15 07:58:30 OPNsense opnsense-devel: /system_gateways.php: ROUTING: entering configure using defaults
Apr 15 07:58:30 OPNsense opnsense-devel: /system_gateways.php: ROUTING: IPv4 default gateway set to wan
Apr 15 07:58:30 OPNsense opnsense-devel: /system_gateways.php: ROUTING: setting IPv4 default route to 81.24.66.188
Apr 15 07:58:30 OPNsense opnsense-devel: /system_gateways.php: ROUTING: removing /tmp/em0_defaultgw
Apr 15 07:58:30 OPNsense opnsense-devel: /system_gateways.php: ROUTING: creating /tmp/em0_defaultgw using '81.24.66.188'
Apr 15 07:58:30 OPNsense opnsense-devel: /system_gateways.php: ROUTING: IPv6 default gateway set to loopback
Apr 15 07:58:30 OPNsense opnsense-devel: /system_gateways.php: ROUTING: setting IPv6 default route to ::1
Apr 15 07:58:30 OPNsense opnsense-devel: /system_gateways.php: ROUTING: keeping current default gateway '::1'
Apr 15 07:58:30 OPNsense opnsense-devel: /usr/local/etc/rc.filter_configure: ROUTING: removing /tmp/em0_defaultgw
Apr 15 07:58:30 OPNsense opnsense-devel: /usr/local/etc/rc.filter_configure: ROUTING: creating /tmp/lo0_defaultgw using '127.0.0.1'
Apr 15 07:58:30 OPNsense opnsense-devel: /usr/local/etc/rc.filter_configure: ROUTING: keeping current default gateway '::1'