openwrt / packages

Community maintained packages for OpenWrt. Documentation for submitting pull requests is in CONTRIBUTING.md
GNU General Public License v2.0

mwan3: No interface recovers from offline if all become offline #3885

Closed joaochainho closed 7 years ago

joaochainho commented 7 years ago

Hi, I noticed that no interface recovers from offline if all interfaces become offline. The test scenario is the following: two interfaces (wan and wwan), with a default policy that uses wan as primary and wwan as backup. If I manually run 'ifup wan/wwan', both interfaces come back online. I have noticed this issue for some time. Tested with the latest mwan3 version (2.0-3) on OpenWrt and LEDE (ar71xx). Meanwhile I found out that in this state the router sends ARP requests for the public IP addresses defined as track_ip's.

root@Router1:~# tcpdump -qni eth1 arp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 262144 bytes
11:53:49.551576 ARP, Request who-has 208.67.222.222 tell 192.168.100.245, length 28
11:53:50.548220 ARP, Request who-has 0.0.0.0 tell 0.0.0.0, length 28
11:53:51.548217 ARP, Request who-has 208.67.222.222 tell 192.168.100.245, length 28
11:53:56.579945 ARP, Request who-has 0.0.0.0 tell 0.0.0.0, length 28
11:53:57.578211 ARP, Request who-has 208.67.222.222 tell 192.168.100.245, length 28
11:53:58.578210 ARP, Request who-has 208.67.222.222 tell 192.168.100.245, length 28
11:54:03.616117 ARP, Request who-has 208.67.222.222 tell 192.168.100.245, length 28
11:54:04.608211 ARP, Request who-has 208.67.222.222 tell 192.168.100.245, length 28

My config:

config interface 'wan'
    option enabled '1'
    list track_ip '8.8.4.4'
    list track_ip '208.67.222.222'
    option reliability '2'
    option count '1'
    option timeout '2'
    option interval '5'
    option down '2'
    option up '5'

config interface 'wwan'
    option enabled '1'
    list track_ip '8.8.8.8'
    list track_ip '208.67.220.220'
    option reliability '1'
    option count '1'
    option timeout '3'
    option interval '5'
    option down '3'
    option up '8'

config member 'wan_m1_w3'
    option interface 'wan'
    option metric '1'
    option weight '3'

config member 'wan_m2_w3'
    option interface 'wan'
    option metric '2'
    option weight '3'

config member 'wwan_m1_w2'
    option interface 'wwan'
    option metric '1'
    option weight '2'

config member 'wwan_m2_w2'
    option interface 'wwan'
    option metric '2'
    option weight '2'

config policy 'wan_only'
    list use_member 'wan_m1_w3'

config policy 'wwan_only'
    list use_member 'wwan_m1_w2'

config policy 'balanced'
    list use_member 'wan_m1_w3'
    list use_member 'wwan_m1_w2'

config policy 'wan_wwan'
    list use_member 'wan_m1_w3'
    list use_member 'wwan_m2_w2'

config policy 'wwan_wan'
    list use_member 'wan_m2_w3'
    list use_member 'wwan_m1_w2'

config rule 'default_rule'
    option dest_ip '0.0.0.0/0'
    option use_policy 'wan_wwan'

I'm available to provide more info and do further testing if needed.

TIA

joaochainho commented 7 years ago

Version 1.6-3 doesn't have this issue.

feckert commented 7 years ago

You have to change the last resort to default

joaochainho commented 7 years ago

Hi @feckert , thanks for your feedback. Do you mean to use the default routing table as policy?

config rule 'default_rule'
    option dest_ip '0.0.0.0/0'
    option use_policy 'default'

joaochainho commented 7 years ago

Hi @feckert, I tested with the default routing table as policy. The primary (wan) interface does recover automatically, but traffic is not routed through the secondary (wwan) interface unless wan is really down (cable unplugged and no default route on that interface). Am I missing something? TIA

feckert commented 7 years ago

@joaochainho, I have the same scenario on my router (wan as main and wwan as backup). If mwan3track notices that interface wan is down, traffic is routed to the wwan interface. If wwan also goes down, then (if use_policy default is set) mwan3track will recover the interface because it uses the default routing table. Have you set different metrics for each interface (wan / wwan) in the network config?

joaochainho commented 7 years ago

Have you set different metrics for each interface (wan / wwan) in the network config?

Yes, 10 for wan and 20 for wwan. I installed and configured everything from scratch (LEDE r3844-c5e245a), and still not working for me. Here's what I get from ping when both interfaces are down and wan supposedly should be online (wan_wwan as default policy).

root@LEDE:~# ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
ping: sendto: Network unreachable

I'm trying to figure out what's different between MWAN3 v2.x and 1.6, but no clue yet.

feckert commented 7 years ago

@joaochainho I am not familiar with the version LEDE r3844-c5e245a. Is it the latest LEDE stable 17.01 or master? If you are on master: I fixed issue #4158 there, which may be your problem.

What interface proto do you have on the wan interface dhcp/static? And could you add your output of ip route show to get the default route if one is set?

I think that if both interfaces are down, no default route is set and the router cannot route the ICMP packets (wan: DHCP, cable unplugged; wwan: DHCP lease removed and not renewed due to connectivity loss). But if one of the wans comes online again (wan cable plugged in / wwan DHCP lease renewed), the ICMP packets can be routed again (default route set by netifd on the ifup event), and mwan3track declares this wan online again after successful pings.
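The interplay of the `reliability`, `down` and `up` options from the configs in this thread can be illustrated with a small simulation. This is a simplified model and not the actual mwan3track source: it assumes a probe round passes when at least `reliability` track_ip hosts reply, and that `down` or `up` consecutive rounds are needed to change state. The function name `simulate_track` is made up for this sketch.

```shell
#!/bin/sh
# simulate_track RELIABILITY DOWN UP REPLIES...
# Each REPLIES argument is the number of track_ip hosts that answered in
# one probe round. Prints the assumed interface state after every round.
# Simplified model for illustration only; not taken from mwan3track.
simulate_track() {
    reliability=$1; down=$2; up=$3; shift 3
    state=online; fails=0; oks=0
    for replies in "$@"; do
        # A round passes when enough tracking hosts replied.
        if [ "$replies" -ge "$reliability" ]; then
            oks=$((oks + 1)); fails=0
        else
            fails=$((fails + 1)); oks=0
        fi
        # Change state only after enough consecutive failures/successes.
        if [ "$state" = online ] && [ "$fails" -ge "$down" ]; then
            state=offline
        elif [ "$state" = offline ] && [ "$oks" -ge "$up" ]; then
            state=online
        fi
        echo "$state"
    done
}
```

With the wan settings posted above (reliability 2, down 2, up 5), `simulate_track 2 2 5 2 0 0 2 2 2 2 2` goes offline in the third round and comes back online in the eighth. The point for this issue is that recovery only happens if the probe pings can actually leave the router.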

joaochainho commented 7 years ago

Hi @feckert, thanks for your feedback and sorry for replying so late. Regarding your questions,

Is it the latest LEDE stable 17.01 or the master. If on master I fixed a issue #4158 maybe this could be your problem.

It's master commit c5e245a, and the fix for #4158 is already included.

What interface proto do you have on the wan interface dhcp/static?

wan (eth0) uses DHCP. wwan (USB modem) uses 3G protocol.

Could you add your output of ip route show to get the default route if one is set?

wan online, wwan online

~# mwan3 interfaces
Interface status:
 interface wan is online and tracking is active
 interface wwan is online and tracking is active

~# ip route show
default via 192.168.100.254 dev eth0  proto static  src 192.168.100.205  metric 10 
default via 10.64.64.64 dev 3g-wwan  proto static  metric 20 
10.64.64.64 dev 3g-wwan  proto kernel  scope link  src 10.16.194.169 
192.168.90.0/24 dev br-lan  proto kernel  scope link  src 192.168.90.254 linkdown 
192.168.100.0/24 dev eth0  proto static  scope link  metric 10 
192.168.100.254 dev eth0  proto static  scope link  src 192.168.100.205  metric 10 

wan offline, wwan offline

~# mwan3 interfaces
Interface status:
 interface wan is offline and tracking is active
 interface wwan is unknown and tracking is active

~# ip route show
default via 192.168.100.254 dev eth0  proto static  src 192.168.100.205  metric 10 
192.168.90.0/24 dev br-lan  proto kernel  scope link  src 192.168.90.254 
192.168.100.0/24 dev eth0  proto static  scope link  metric 10 
192.168.100.254 dev eth0  proto static  scope link  src 192.168.100.205  metric 10 

My suspicion is that when all interfaces are offline, the specific MWAN rules/routes (based on the configured metrics/weights) are deleted. Because the ethernet cable on the wan port is never unplugged, the physical link state never changes and there are no ifdown/ifup events. So the MWAN rules are never reloaded again. Does this make sense? Interestingly this doesn't happen with MWAN 1.5x and 1.6x versions.
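To compare the two `ip route show` listings above mechanically, a small filter along these lines could be used. It is only a sketch: the function name `default_route_devs` is made up here, and a route without an explicit metric is assumed to have metric 0.

```shell
#!/bin/sh
# List the devices that carry a default route, lowest metric first.
# Reads `ip route show` output on stdin.
# Assumption: a default route without a "metric" field counts as metric 0.
default_route_devs() {
    awk '$1 == "default" {
        dev = ""; metric = 0
        for (i = 2; i < NF; i++) {
            if ($i == "dev")    dev = $(i + 1)
            if ($i == "metric") metric = $(i + 1)
        }
        if (dev != "") print metric, dev
    }' | sort -n | awk '{ print $2 }'
}

# On the router: ip route show | default_route_devs
```

For the "wan online, wwan online" listing this prints `eth0` and then `3g-wwan`; for the "wan offline, wwan offline" listing it prints only `eth0`, which shows that the main table still has a default route over wan even though mwan3 reports the interface as offline.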

feckert commented 7 years ago

My suspicion is that when all interfaces are offline, the specific MWAN rules/routes (based on the configured metrics/weights) are deleted. Because the ethernet cable on the wan port is never unplugged, the physical link state never changes and there are no ifdown/ifup events. So the MWAN rules are never reloaded again. Does this make sense?

@joaochainho Yes, the rules/routes are deleted, but mwan3track is still running on the wwan/wan interfaces

I have attached my mwan3 config

config policy 'wan_only'
    list use_member 'wan_m1_w1'

config policy 'xdsl_only'
    list use_member 'xdsl_m2_w1'

config policy 'wwan_only'
    list use_member 'wwan_m3_w1'

config member 'wan_m1_w1'
    option interface 'wan'
    option metric '1'
    option weight '1'

config member 'xdsl_m2_w1'
    option interface 'xdsl'
    option metric '2'
    option weight '1'

config rule 'default_rule'
    option dest_ip '0.0.0.0/0'
    option proto 'all'
    option sticky '0'
    option use_policy 'wan_xdsl_wwan'

config member 'wwan_m3_w1'
    option interface 'wwan'
    option metric '3'
    option weight '1'

config policy 'wan_xdsl_wwan'
    list use_member 'wan_m1_w1'
    list use_member 'xdsl_m2_w1'
    list use_member 'wwan_m3_w1'
    option last_resort 'default'

config interface 'wan'
    option enabled '1'
    list track_ip '8.8.8.8'
    list track_ip '8.8.4.4'
    option count '1'
    option timeout '2'
    option interval '60'
    option failure '10'
    option recovery '10'
    option down '3'
    option reliability '1'
    option up '3'
    option family 'ipv4'
    option flush_conntrack 'always'

config interface 'xdsl'
    option enabled '1'
    list track_ip '8.8.8.8'
    list track_ip '8.8.4.4'
    option reliability '1'
    option count '1'
    option timeout '2'
    option interval '60'
    option failure '10'
    option recovery '10'
    option down '3'
    option family 'ipv4'
    option up '3'
    option flush_conntrack 'always'

config interface 'wwan'
    option enabled '1'
    list track_ip '8.8.8.8'
    list track_ip '8.8.4.4'
    option reliability '1'
    option count '1'
    option timeout '5'
    option interval '60'
    option failure '10'
    option recovery '10'
    option down '3'
    option up '3'
    option family 'ipv4'
    option flush_conntrack 'always'

I have a simple backup scenario

  1. wan -> all traffic goes over wan if online -> if it goes offline, then surf over xdsl
  2. xdsl -> all traffic goes over xdsl if online -> if it goes offline, then surf over wwan
  3. wwan -> all traffic goes over wwan if online; this is the last wan interface -> if this last interface goes offline as well, then I have a problem ;-)

If a higher-priority interface comes online again during backup, that interface is used again (wwan->wan)

tpham3783 commented 7 years ago

I had a very similar setup and problem a few months ago. I fixed it by applying the patch below, which allows all outbound ICMP traffic for mwan3. From my observation, when mwan3 did not recover it was blocking all ICMP ping traffic; Ping -i 8.8.8.8 always returned an error. To this day I do not know why it happened: why did mwan3 block ICMP ping traffic on all interfaces?

The system that encountered the problem was set up as follows:

Ethernet WAN - dhcp protocol
WWAN - qmi or SierraWireless SDK w/ dhcp

Thanks,

TP

Reference Patch:

--- files/lib/mwan3/mwan3.sh.vanilla
+++ files/lib/mwan3/mwan3.sh
@@ -490,6 +490,9 @@ mwan3_create_policies_iptables()

     $IPT -F mwan3_policy_$1

On Mon, Apr 3, 2017 at 8:59 AM, Florian Eckert notifications@github.com wrote:

My suspicion is that when all interfaces are offline, the specific MWAN rules/routes (based on the configured metrics/weights) are deleted. Because the ethernet cable on the wan port is never unplugged, the physical link state never changes and there are no ifdown/ifup events. So the MWAN rules are never reloaded again. Does this make sense?

@joaochainho Yes, the rules/routes are deleted, but mwan3track is still running on the wwan/wan interfaces

If the cable is plugged in again (tested on my setup), mwan3track recognizes the interface as up again after the reliability check, and I am able to surf over the wan interface.

If I enable wwan (plug in the USB wwan modem), mwan3track likewise recognizes the wwan interface as online again after the reliability check.


joaochainho commented 7 years ago

Hi @tpham3783 , your patch solved my issue! :smile:

joaochainho commented 7 years ago

hi @feckert , I only now noticed this new option last_resort. I'll try last_resort = default then.
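A minimal sketch of that change, assuming it is applied to the wan_wwan policy from the configuration posted at the top of this thread (the option placement follows the policy section in @feckert's config):

```
config policy 'wan_wwan'
    list use_member 'wan_m1_w3'
    list use_member 'wwan_m2_w2'
    option last_resort 'default'
```

With this in place, traffic that matches the policy while no member interface is online should fall through to the default routing table instead of being rejected.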

joaochainho commented 7 years ago

Hi @feckert , using last_resort = default also solves the problem 👍 However, during the tests I stumbled on another issue - mwan metrics doesn't seem to apply to the traffic originated from the router itself. Default metrics from the interfaces seem to apply instead. Is this behaviour expected?

feckert commented 7 years ago

@joaochainho if last_resort is not set to default and no interface is up, the default routing table is not consulted and the packet is dropped. An improvement would be to add the ping targets to an ipset and not mangle those packets, as suggested by @tpham3783. The ipset should contain only the IP addresses of the targets for each interface.
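The ipset idea could look roughly like the following. This is a hypothetical sketch and not how mwan3 implements it: the set name `mwan3_track_<iface>`, the helper name `track_bypass_cmds` and the choice to insert a RETURN rule at the top of the mangle OUTPUT chain are all assumptions made for illustration. The helper only prints the commands, so they can be reviewed before anything is run on a router.

```shell
#!/bin/sh
# track_bypass_cmds IFACE IP...
# Print (do not run) commands that would put one interface's track_ip
# addresses into an ipset and skip mangling for them in OUTPUT.
# Hypothetical names: the set is called mwan3_track_<iface>.
track_bypass_cmds() {
    iface=$1; shift
    set_name="mwan3_track_$iface"
    # -exist makes create/add tolerate an already existing set or entry.
    echo "ipset -exist create $set_name hash:ip"
    for ip in "$@"; do
        echo "ipset -exist add $set_name $ip"
    done
    # Packets to the tracking hosts leave the mangle OUTPUT chain early.
    echo "iptables -t mangle -I OUTPUT -m set --match-set $set_name dst -j RETURN"
}

# Example: track_bypass_cmds wan 8.8.4.4 208.67.222.222
```

Keeping one set per interface matches the suggestion that each set holds only that interface's targets.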

mwan metrics doesn't seem to apply to the traffic originated from the router itself.

See: https://wiki.openwrt.org/doc/howto/mwan3 Section: The routable loopback (self)

joaochainho commented 7 years ago

Hi @feckert, thanks for your feedback. And sorry for missing the wiki info :smile:

tpham3783 commented 7 years ago

A improvement would be to add the ping targets to an ipset and do not mangle the packets.

That's a good idea! The routing policy of mwan3 should be updated to allow pings to the target IP addresses.

@Joae, thank you for your persistence in testing the last_resort config option, because I too, did not know about it!

thanks,

TP


feckert commented 7 years ago

@joaochainho I think we could close this issue. I will try to implement a feature that the track_ips will not be mangled on the OUTPUT CHAIN.

joaochainho commented 7 years ago

I think we could close this issue. I will try to implement a feature that the track_ips will not be mangled on the OUTPUT CHAIN.

Hi @feckert I agree, it can be closed. Thanks for your help and effort.