Closed pete1019 closed 4 years ago
Tested this some more, and after setting gateway switching to On it appears to behave itself. Need the OP and others to confirm this.
You mean this?
When using Unbound for DNS resolution you should also enable Default Gateway Switching via System->Settings->General, as local generated traffic will only use the current default gateway which will not change without this option.
From here: https://docs.opnsense.org/manual/how-tos/multiwan.html
Nope, this was always active in my tests on 20.1.7 and I can still reproduce the problem.
Please always check which gateway the traffic really goes through, for example with something like www.ipcheck.com.
How long are you waiting for recovery? On mine it takes around 60 seconds.
You mean 60 seconds after packet loss is back under 10%?
With a STATIC IP on WAN it is back instantly (once packet loss is under the threshold). I did not wait that long (60 seconds) in my tests.
Interesting. OK, a bit of deeper delving has maybe got me somewhere. It would appear that the call to configctl filter reload in 10-dpinger doesn't actually do anything; changing the line to /usr/local/sbin/configctl filter reload does.
@pete1019 - Try editing the file, you'll find it in /usr/local/etc/rc.syshook.d
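One possible explanation for the behaviour described above is that the hook runs with a PATH that does not include /usr/local/sbin, so the bare command name is never found. The thread does not confirm this; it is an assumption. The sketch below only illustrates the general effect with a hypothetical stand-in script named configctl in a temporary directory; it does not call the real OPNsense tool.

```shell
#!/bin/sh
# Hypothetical stand-in for configctl, placed in a directory that is NOT on PATH.
demo_dir=$(mktemp -d)
cat > "$demo_dir/configctl" <<'EOF'
#!/bin/sh
echo "OK: $*"
EOF
chmod +x "$demo_dir/configctl"

# Assumed restricted PATH, similar to what a daemon-spawned hook might see.
bare_status=0
bare_result=$(PATH=/bin:/usr/bin sh -c 'configctl filter reload' 2>/dev/null) || bare_status=$?

# The same command with its full path does not depend on PATH lookup.
full_status=0
full_result=$(PATH=/bin:/usr/bin "$demo_dir/configctl" filter reload) || full_status=$?

echo "bare name exit status: $bare_status"
echo "full path exit status: $full_status, output: $full_result"
```

Note that the failed lookup prints its error only to stderr, which would match the observation that nothing useful shows up in the log when the call silently does nothing.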
Ok, I finally found the time to debug.
My test machine has 20.7b (ISO, not only UI), WAN1 is DHCP, gateway has prio 251, marked as upstream, monitoring enabled. WAN2 is static, 192.168.12.X, prio 255, marked as upstream, monitoring enabled. In System : Settings : General, default gateway switching is enabled. I do NOT use gateway groups or similar, just gateway switching. I shut the switchport where WAN1 sits (like unplugging the cable or a defect of modem) and it fails over to static. I reenable the port and it fails back to DHCP gateway. I did this 3 times .. always set the correct gateway.
Can't reproduce.
Please do exactly as I stated here: https://github.com/opnsense/core/issues/4160#issuecomment-641789848
Looks like you always physically unplug and replug. Please only do this once with the DHCP port. The second time, please use a dumb switch and cut the connection there, so you don't unplug the cable from the WAN port of OPNsense. It is important not to physically detach the cable again.
Also: I use a gateway group, just as explained in the official Multi-WAN tutorial: https://docs.opnsense.org/manual/how-tos/multiwan.html
The reason why it is so important for this to work: imagine your modem reboots for some reason. OPNsense will see a link down on your WAN. Later, only the Internet connection fails (no link down and link up again) because your provider is down. It will switch to WAN2 but it will never (or not as soon as intended) switch back to WAN1.
I can't test this from home. Maybe next week when I get back to work.
No dumb little switch at home? But again: THANKS everyone for your time!
The machine is at work and needs cabling. But I'm happy gateway selection code is fine. I never use gateway groups, but we will see next week
If I physically unplug and replug the WAN with DHCP, everything works fine for me as well. That's why it is important to do it like this: https://github.com/opnsense/core/issues/4160#issuecomment-641789848
Excited to see how your tests go next week.
Yes, it DOES work fine if you physically unplug; that's because a WAN down/up event is triggered. If, however, there is an upstream failure and dpinger should do the detection, THAT is where the issue is. As I said in an earlier message, the problem is in 10-dpinger: it doesn't run 'configctl filter reload', nor does it write anything to the log to say it hasn't. If you give the full path to configctl, then it does work.
Thanks, so who is able to fix that and release it? I think pfSense does not have this issue, as someone stated here before.
I think I should set up another test VM here. But I need to think about how to get a second WAN, since I don't have the LTE device here anymore. Can you please give more detailed instructions on what exactly I should do to test your fix? Log into OPNsense via ssh, nano into "/usr/local/etc/rc.syshook.d", change what (which line)? Will this survive an update? Thanks
You are, until an update is released. You can fix it yourself, I've posted how. I don't really see the relevance of pfSense in the conversation.
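For anyone unsure which line to change, the edit described earlier in the thread can be sketched as follows. This is a minimal illustration that works on a scratch copy: the two-line file created here is a hypothetical stand-in, not the real contents of 10-dpinger, and the variable names are made up for the example. On a real system you would back up the actual hook file first and check that it really contains the bare configctl filter reload line.

```shell
#!/bin/sh
# Scratch copy with hypothetical content standing in for the 10-dpinger hook.
work=$(mktemp -d)
hook="$work/10-dpinger"
printf '%s\n' '#!/bin/sh' 'configctl filter reload' > "$hook"

# Keep a backup, then write a patched version that uses the full path.
# Plain sed with redirection is used instead of "sed -i", whose syntax
# differs between BSD and GNU sed.
cp "$hook" "$hook.bak"
sed 's|^configctl filter reload|/usr/local/sbin/configctl filter reload|' "$hook.bak" > "$hook"

patched_line=$(grep 'filter reload' "$hook")
echo "$patched_line"
```

Since the fix is due to ship in a release, a manual edit like this is only a stopgap until that update is installed.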
Now fixed and will be in the next release or you can patch it yourself.
So is this commit fixing the issue? Can anyone confirm? Thank you.
Was fixed in 20.1.8 most likely. :)
Important notices Before you add a new report, we ask you kindly to acknowledge the following:
[X] I have read the contributing guide lines at https://github.com/opnsense/core/blob/master/CONTRIBUTING.md
[X] I have searched the existing issues and I'm convinced that mine is new.
Describe the bug Multi-WAN failover fails after unplugging the cable (for a very short period) of WAN1 / WAN2. It does strange things and does not go back to Tier 1. Hitting save on any interface fixes the issue until the next time.
To Reproduce
Expected behavior Failover should work as set up (Tier 1, Tier 2) even on ports flapping or some hardware rebooting which is connected to WAN1 or WAN2.
OPNsense 20.1.7 APU4 PC-Engines
I am willing to show the problems via anydesk or teamviewer.
THANK YOU!