openwrt / packages

Community maintained packages for OpenWrt. Documentation for submitting pull requests is in CONTRIBUTING.md
GNU General Public License v2.0

mwan3: ipv6 failover does not work #12309

Closed aaronjg closed 4 years ago

aaronjg commented 4 years ago

If you have a setup with two ipv6 devices set up as a failover, it appears that mwan3 is unable to track the inactive device with ping.

As far as I can tell, the ICMP packet goes through iptables and gets marked by mwan3, and then the kernel tries to route it out of whatever the currently active interface is. So if you have a failover interface, the kernel tries to route the tracking packet out of the primary interface, and the ping fails.

As a workaround, you can add rules for each source ipv6 address to tell it to route ipv6-icmp packets through the default table. However, this is not an ideal solution if your ipv6 address comes from dhcpv6.
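A minimal sketch of what such per-address rules might look like, assuming "the default table" means the main routing table and that the installed ip binary supports the ipproto selector of `ip rule` (the full iproute2 package does; BusyBox ip may not). The helper name `gen_icmp6_rules`, the starting priority 900, and the 2001:db8:: documentation addresses are all hypothetical. The function only prints the commands so they can be reviewed before being run as root:

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) one policy rule per source address.
# Each rule sends ICMPv6 traffic from that address to the main table.
# The starting priority is an assumption: it only needs to be lower than
# the priorities of the rules mwan3 installs, so that it is evaluated first.
gen_icmp6_rules() {
    prio=$1
    shift
    for addr in "$@"; do
        # "ipv6-icmp" is IP protocol number 58; the number can be used instead.
        echo "ip -6 rule add from $addr ipproto ipv6-icmp lookup main priority $prio"
        prio=$((prio + 1))
    done
}

# Example addresses from the documentation prefix, not from the original report.
gen_icmp6_rules 900 2001:db8:1::1 2001:db8:2::1
```

Because the source addresses are hard-coded, the rules would need to be regenerated whenever DHCPv6 hands out a different address, which is the drawback noted above.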

It looks like this may have arisen in https://github.com/openwrt/packages/pull/6515.

Someone had noted the issue but the proposed solution does not work with multiple ipv6 interfaces. https://github.com/openwrt/packages/pull/6515#issuecomment-410832286

cc @feckert

aaronjg commented 4 years ago

I think this is actually a bug in busybox ping:

https://git.busybox.net/busybox/tree/networking/ping.c#n874

However, perhaps a workaround should be put in place here, or ping should be disabled as an option for testing if ipv6 hosts are up.

jamesmacwhite commented 4 years ago

I have experienced similar issues with IPv6 ping when using busybox vs iputils-ping6. There are also various cases where OpenWrt itself doesn't select the correct source address I believe.

https://forum.openwrt.org/t/ipv6-source-address-selection-broken-for-packets-generated-on-the-router/37303

https://bugs.openwrt.org/index.php?do=details&task_id=2167

https://forum.openwrt.org/t/ping-and-traceroute-failing-for-eth0-3-on-ipv6/44680

Just for info.

aaronjg commented 4 years ago

@jamesmacwhite Thanks for the links.

Are you saying this works with iputils ping6? What version are you using? The version available in the latest releases appears to be broken even for binding to an interface on ipv4.

jamesmacwhite commented 4 years ago

No, sorry, I was just clarifying that the behaviour you describe is consistent with similar problems I found when debugging mwan3 and IPv6. It leads to issues with mwan3track, given that it uses ping as its test for interface connectivity being "alive". I don't have iputils ping6 installed for that reason.

I have 2 WANs both with IPv6 (although one WAN is 6in4, rather than native), however I use NAT66, because my second WAN doesn't provide a prefix, only a single /64. Relaying isn't an option for me.

I think in most cases for IPv6 and mwan3 you need to look at NPTv6 or NAT66 for reliable operation.

aaronjg commented 4 years ago

@jamesmacwhite

Thanks for clarifying. Why do you say that NPTv6 or NAT66 is needed for reliable operation?

The forum link is very helpful. It looks like this is exactly the issue. With 'ping -4', specifying the interface allows the routing rules to be bypassed. For IPv6, however, the packets still go through the routing rules, and when they hit the rule "from all fwmark 0x3e00/0x3f00 unreachable" they get discarded.
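To make the "fwmark 0x3e00/0x3f00" selector concrete: an `ip rule` of the form fwmark MARK/MASK matches a packet when the packet's mark and MARK agree on every bit set in MASK. The sketch below reproduces that comparison in shell arithmetic. The function name and the sample packet marks other than 0x3e00 are hypothetical and chosen only for illustration:

```shell
#!/bin/sh
# Return success when a packet mark matches a rule's "fwmark MARK/MASK"
# selector, i.e. when (packet_mark XOR rule_mark) AND rule_mask is zero.
fwmark_rule_matches() {
    [ $(( ($1 ^ $2) & $3 )) -eq 0 ]
}

# Check a few sample marks against the rule quoted above.
for mark in 0x3e00 0x0100 0x3eff; do
    if fwmark_rule_matches "$mark" 0x3e00 0x3f00; then
        echo "$mark: matches the unreachable rule"
    else
        echo "$mark: does not match"
    fi
done
```

Only the bits covered by the mask take part in the comparison, so a mark that differs from 0x3e00 only in its low byte still hits the rule.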

jamesmacwhite commented 4 years ago

Glad it helped you. There does seem to be weird behaviour with source address selection in OpenWrt in various scenarios. This is one of them, although there are workarounds.

That seems to be the consensus generally with multiple IPv6 networks and being able to steer IPv6 traffic. NPTv6 or NAT66 is often suggested. Some people will vomit at NAT66, I have no choice given my second WAN doesn't delegate a prefix and I can't relay it without breaking my other IPv6 prefix, NPTv6 is an option when you have large enough prefixes for both WANs and avoids having to have NAT involved. That's what I've found when doing IPv6 multihoming anyway.

luizluca commented 4 years ago

> That seems to be the consensus generally with multiple IPv6 networks and being able to steer IPv6 traffic. NPTv6 or NAT66 is often suggested. Some people will vomit at NAT66, I have no choice given my second WAN doesn't delegate a prefix and I can't relay it without breaking my other IPv6 prefix, NPTv6 is an option when you have large enough prefixes for both WANs and avoids having to have NAT involved. That's what I've found when doing IPv6 multihoming anyway.

I've been working with mwan3 and IPv6 for some time. I normally use NETMAP instead of NPT because NPT breaks conntrack (and the stateful firewall). I use NETMAP to map all my available delegated prefixes but one to subnets which have services accessed from the internet. The remaining subnets are all mapped to that one single /64 network. I keep the device ID in both cases and I never masquerade to a single address.

The problem I faced is that I'm using ULA prefixes for my internal network and no real internet addresses are used directly by internal machines. However, according to default /etc/gai.conf, IPv4 will have precedence over IPv6 ULA while accessing internet services both published in IPv4 and IPv6. IPv6 only clients or IPv6 only services work as expected. In the future, I'll try to play with dhcpv6 options to change that behavior for all my network.

dl12345 commented 4 years ago

https://github.com/openwrt/packages/pull/12128

aaronjg commented 4 years ago

> #12128

@dl12345 Please provide more context. These are not the same issue.

dl12345 commented 4 years ago

> #12128
>
> @dl12345 Please provide more context. These are not the same issue.

It appeared to be, sorry if it's not. You're using IPv6 (so non-NAT). You can't track the inactive device with a ping since the routing goes out the active interface. From your original post:

> it tries to route it out of whatever the currently active interface is. So if you have a failover interface, the kernel tries to route the packet out of the primary interface, and then fails.

And from the linked PR, the problem is very similar. The default route does not go through the interface being checked, so the packet gets routed out the wrong interface:

> normal routing "ping -I 8.8.8.8" does not work, when the default route is currently not going over this WAN link

aaronjg commented 4 years ago

With just the link to the PR, it wasn't clear which of the three issues you were referencing as a fix. Thanks for the context.

I agree that the issues seem similar. However, I don't believe they are the same. IPv6 doesn't use ARP. In the IPv6 case, the problem looks related to the extra routing rules that mwan3 adds and to OpenWrt's policy routing for IPv6.

ngtech commented 4 years ago

In my case (with multiple IPv6 global addresses on tracked interfaces), a working fix was to select only the first IPv6 global address returned by ip, on line 139 of the /usr/sbin/mwan3track script, using "head -1".

From:
ADDR=$(ip -6 addr ls dev "$DEVICE" | sed -ne 's/ *inet6 \([^ \/]*\).* scope global.*/\1/p')

To:
ADDR=$(ip -6 addr ls dev "$DEVICE" | sed -ne 's/ *inet6 \([^ \/]*\).* scope global.*/\1/p' | head -1)
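The effect of that change can be tried without a router. The sketch below feeds the same sed expression some made-up text shaped like `ip -6 addr ls` output (the interface name and the 2001:db8:: documentation addresses are hypothetical) and compares the result with and without the extra filter. It uses `head -n 1`, the POSIX spelling of `head -1`:

```shell
#!/bin/sh
# Made-up sample shaped like "ip -6 addr ls dev <device>" output.
sample_output() {
cat <<'EOF'
2: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
    inet6 2001:db8:1::10/64 scope global dynamic
       valid_lft 3000sec preferred_lft 1500sec
    inet6 2001:db8:1::20/64 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::1/64 scope link
       valid_lft forever preferred_lft forever
EOF
}

# Same sed expression as in the mwan3track line quoted above:
# print the address (without prefix length) of every "scope global" entry.
extract_global() {
    sed -ne 's/ *inet6 \([^ \/]*\).* scope global.*/\1/p'
}

# Without head: one address per line, so ADDR holds several values.
ALL=$(sample_output | extract_global)
# With head: only the first global address is kept.
FIRST=$(sample_output | extract_global | head -n 1)

echo "all global addresses:"
echo "$ALL"
echo "first global address: $FIRST"
```

With two global addresses on the interface, the unfiltered variable holds both addresses separated by a newline, which is presumably what confused the tracking command. Keeping only the first line sidesteps that, at the cost of always picking whichever address ip happens to list first.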

aaronjg commented 4 years ago

fixed with PR #12229