openwrt / packages

Community-maintained packages for OpenWrt. Documentation for submitting pull requests is in CONTRIBUTING.md.
GNU General Public License v2.0

mwan3: VPN tunnels over PPTP/L2TP/PPPoE interfaces are broken #13195

Closed by jamesmacwhite 3 years ago

jamesmacwhite commented 4 years ago

Environment: OpenWrt 19.07.3 Linksys WRT3200ACM (mwan3 2.9.0) FAO: @aaronjg

It turns out this isn't directly an mwan3 problem, but mwan3 triggers it through its use of fwmark. Further details and explanation here: https://github.com/openwrt/packages/issues/13195#issuecomment-680295448

Original issue description:

@aaronjg Continuing the discussion from the comment here: https://github.com/openwrt/packages/pull/13169#issuecomment-678830151.

After @openwrt-diy reported the issues with L2TP, I decided to test L2TP and mwan3 myself. An ISP in the UK offers an L2TP transit service here: https://www.aa.net.uk/broadband/l2tp-service/. It is fully dual stack, with a fully routed static IPv4 address and a delegated IPv6 prefix. They also publish specific information on configuring it with OpenWrt: https://support.aa.net.uk/L2TP_Client:_OpenWRT

I ended up testing it out. Some of the issues I encountered with mwan3 enabled are already known and likely related to https://github.com/openwrt/packages/issues/10712 and https://github.com/openwrt/packages/issues/13145, which have a pending PR. One new problem I did run into is that mwan3 appears to interfere with configuring IPv6 on the L2TP interface.

Andrews & Arnold use PPP and DHCPv6 over the L2TP tunnel to bring up IPv4 and IPv6. It's worth noting this doesn't use IPsec; it's L2TPv2 over UDP.

The setup uses two interfaces, aaisp and aaisp6. As in the example configuration, the IPv4 and IPv6 interfaces are split, but the underlying network interface l2tp-aaisp is dual stack. You can also make aaisp6 an alias interface (which I have done). With mwan3 enabled, the DHCPv6 client doesn't pick up the IPv6 configuration; if I disable mwan3 and restart the aaisp6 interface, IPv6 is then configured.

With mwan3 enabled, it seems only the IPv4 interface can be established, not IPv6. It looks like DHCPv6-related traffic is either not going over the right interface or is being blocked, as the IPv6 side never comes up.

As noted, there are rules that keep mwan3 from interfering with ICMPv6 types 133-137, but I don't know whether anything else around L2TP or DHCPv6 needs looking at in this regard, since with mwan3 enabled the tunnel currently cannot even be fully established.

aaronjg commented 4 years ago

Thanks. I wonder what part of the process is breaking down with mwan3 enabled.

My guess is that outgoing DHCPv6 requests are getting a firewall mark and being routed out of the wrong IPv6 interface. Could you add an mwan3 rule so that IPv6 UDP traffic with a source port of 546 and destination port of 547 gets handled by the default routing table? Hopefully that will fix the issue, and if so, we can add it to the default config file or update the script to always add this.

jamesmacwhite commented 4 years ago

Tried this rule:

config rule 'dhcpv6_default'
    option src_port '546'
    option dest_port '547'
    option proto 'udp'
    option family 'ipv6'
    option use_policy 'default'

It doesn't seem to help, but as soon as mwan3 is disabled the DHCPv6 client configures the IPv6 interface, so I'm reasonably confident something mwan3 is doing is causing it. I looked at the iptables output: the rule was never hit when restarting the interface, although it does start matching after mwan3 has been enabled for a few minutes.

It is odd, though, as I have another WAN that also uses the DHCPv6 client, and it has no problems configuring its interface with mwan3 enabled; it seems specific to L2TP.

aaronjg commented 4 years ago

Interesting. If mwan3 is interfering, either the iptables rules or the routing rules must be doing something to the outgoing or return packets. Usually it is the outgoing packets that are more problematic.

It's possible something is going on with the return packets, so you could also add a rule for ports 547->546.

If that doesn't work, could you use tcpdump or firewall logging rules to figure out how the packets are being mangled by mwan3?

jamesmacwhite commented 4 years ago

No dice with the return rule. I ran tcpdump for DHCPv6-related packets.

With mwan3 enabled:

root@linksys-wrt3200acm:~# tcpdump -i l2tp-aaisp -n -vv '(udp port 546 or 547) or icmp6'
tcpdump: listening on l2tp-aaisp, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
13:54:55.162926 IP6 (flowlabel 0x5569c, hlim 255, next-header ICMPv6 (58) payload length: 8) fe80::98e6:bb8c:ea4f:6b1b > ff02::2: [icmp6 sum ok] ICMP6, router solicitation, length 8
13:54:56.150997 IP6 (flowlabel 0x99e21, hlim 1, next-header UDP (17) payload length: 150) fe80::98e6:bb8c:ea4f:6b1b.546 > ff02::1:2.547: [udp sum ok] dhcp6 solicit (xid=6e7834 (elapsed-time 0) (option-request SIP-servers-domain SIP-servers-address DNS-server DNS-search-list SNTP-servers NTP-server AFTR-Name opt_67 opt_94 opt_95 opt_96 opt_82) (client-ID hwaddr type 1 322303df2c80) (reconfigure-accept) (Client-FQDN) (IA_NA IAID:1 T1:0 T2:0) (IA_PD IAID:1 T1:0 T2:0 (IA_PD-prefix ::/48 pltime:0 vltime:0)))
13:54:57.236503 IP6 (flowlabel 0x99e21, hlim 1, next-header UDP (17) payload length: 150) fe80::98e6:bb8c:ea4f:6b1b.546 > ff02::1:2.547: [udp sum ok] dhcp6 solicit (xid=6e7834 (elapsed-time 108) (option-request SIP-servers-domain SIP-servers-address DNS-server DNS-search-list SNTP-servers NTP-server AFTR-Name opt_67 opt_94 opt_95 opt_96 opt_82) (client-ID hwaddr type 1 322303df2c80) (reconfigure-accept) (Client-FQDN) (IA_NA IAID:1 T1:0 T2:0) (IA_PD IAID:1 T1:0 T2:0 (IA_PD-prefix ::/48 pltime:0 vltime:0)))
13:54:59.163729 IP6 (flowlabel 0x5569c, hlim 255, next-header ICMPv6 (58) payload length: 8) fe80::98e6:bb8c:ea4f:6b1b > ff02::2: [icmp6 sum ok] ICMP6, router solicitation, length 8
13:54:59.206498 IP6 (flowlabel 0x99e21, hlim 1, next-header UDP (17) payload length: 150) fe80::98e6:bb8c:ea4f:6b1b.546 > ff02::1:2.547: [udp sum ok] dhcp6 solicit (xid=6e7834 (elapsed-time 305) (option-request SIP-servers-domain SIP-servers-address DNS-server DNS-search-list SNTP-servers NTP-server AFTR-Name opt_67 opt_94 opt_95 opt_96 opt_82) (client-ID hwaddr type 1 322303df2c80) (reconfigure-accept) (Client-FQDN) (IA_NA IAID:1 T1:0 T2:0) (IA_PD IAID:1 T1:0 T2:0 (IA_PD-prefix ::/48 pltime:0 vltime:0)))
13:55:02.996504 IP6 (flowlabel 0x99e21, hlim 1, next-header UDP (17) payload length: 150) fe80::98e6:bb8c:ea4f:6b1b.546 > ff02::1:2.547: [udp sum ok] dhcp6 solicit (xid=6e7834 (elapsed-time 684) (option-request SIP-servers-domain SIP-servers-address DNS-server DNS-search-list SNTP-servers NTP-server AFTR-Name opt_67 opt_94 opt_95 opt_96 opt_82) (client-ID hwaddr type 1 322303df2c80) (reconfigure-accept) (Client-FQDN) (IA_NA IAID:1 T1:0 T2:0) (IA_PD IAID:1 T1:0 T2:0 (IA_PD-prefix ::/48 pltime:0 vltime:0)))
13:55:03.164114 IP6 (flowlabel 0x5569c, hlim 255, next-header ICMPv6 (58) payload length: 8) fe80::98e6:bb8c:ea4f:6b1b > ff02::2: [icmp6 sum ok] ICMP6, router solicitation, length 8
13:55:07.164499 IP6 (flowlabel 0x5569c, hlim 255, next-header ICMPv6 (58) payload length: 8) fe80::98e6:bb8c:ea4f:6b1b > ff02::2: [icmp6 sum ok] ICMP6, router solicitation, length 8
13:55:10.836515 IP6 (flowlabel 0x99e21, hlim 1, next-header UDP (17) payload length: 150) fe80::98e6:bb8c:ea4f:6b1b.546 > ff02::1:2.547: [udp sum ok] dhcp6 solicit (xid=6e7834 (elapsed-time 1468) (option-request SIP-servers-domain SIP-servers-address DNS-server DNS-search-list SNTP-servers NTP-server AFTR-Name opt_67 opt_94 opt_95 opt_96 opt_82) (client-ID hwaddr type 1 322303df2c80) (reconfigure-accept) (Client-FQDN) (IA_NA IAID:1 T1:0 T2:0) (IA_PD IAID:1 T1:0 T2:0 (IA_PD-prefix ::/48 pltime:0 vltime:0)))
13:55:26.436512 IP6 (flowlabel 0x99e21, hlim 1, next-header UDP (17) payload length: 150) fe80::98e6:bb8c:ea4f:6b1b.546 > ff02::1:2.547: [udp sum ok] dhcp6 solicit (xid=6e7834 (elapsed-time 3028) (option-request SIP-servers-domain SIP-servers-address DNS-server DNS-search-list SNTP-servers NTP-server AFTR-Name opt_67 opt_94 opt_95 opt_96 opt_82) (client-ID hwaddr type 1 322303df2c80) (reconfigure-accept) (Client-FQDN) (IA_NA IAID:1 T1:0 T2:0) (IA_PD IAID:1 T1:0 T2:0 (IA_PD-prefix ::/48 pltime:0 vltime:0)))
^C
10 packets captured
10 packets received by filter
0 packets dropped by kernel

With mwan3 disabled:

root@linksys-wrt3200acm:~# tcpdump -i l2tp-aaisp -n -vv '(udp port 546 or 547) or icmp6'
tcpdump: listening on l2tp-aaisp, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
13:56:55.160912 IP6 (flowlabel 0x5569c, hlim 255, next-header ICMPv6 (58) payload length: 8) fe80::98e6:bb8c:ea4f:6b1b > ff02::2: [icmp6 sum ok] ICMP6, router solicitation, length 8
13:56:55.180438 IP6 (hlim 255, next-header ICMPv6 (58) payload length: 16) fe80::203:97ff:fe05:4000 > fe80::98e6:bb8c:ea4f:6b1b: [icmp6 sum ok] ICMP6, router advertisement, length 16
        hop limit 64, Flags [managed, other stateful], pref medium, router lifetime 65535s, reachable time 0ms, retrans timer 0ms
13:56:55.858500 IP6 (flowlabel 0x99e21, hlim 1, next-header UDP (17) payload length: 150) fe80::98e6:bb8c:ea4f:6b1b.546 > ff02::1:2.547: [udp sum ok] dhcp6 solicit (xid=1a85ad (elapsed-time 0) (option-request SIP-servers-domain SIP-servers-address DNS-server DNS-search-list SNTP-servers NTP-server AFTR-Name opt_67 opt_94 opt_95 opt_96 opt_82) (client-ID hwaddr type 1 322303df2c80) (reconfigure-accept) (Client-FQDN) (IA_NA IAID:1 T1:0 T2:0) (IA_PD IAID:1 T1:0 T2:0 (IA_PD-prefix ::/48 pltime:0 vltime:0)))
13:56:55.878909 IP6 (flowlabel 0x99e21, hlim 255, next-header UDP (17) payload length: 165) fe80::203:97ff:fe05:4000.547 > fe80::98e6:bb8c:ea4f:6b1b.546: [udp sum ok] dhcp6 advertise (xid=1a85ad (client-ID hwaddr type 1 322303df2c80) (server-ID hwaddr type 1 000397054000) (IA_NA IAID:1 T1:3600 T2:5760 (IA_ADDR 2001:8b0:1111:1111:0:ffff:51bb:edf2 pltime:3600 vltime:7200)) (DNS-server 2001:8b0::2020 2001:8b0::2021) (IA_PD IAID:1 T1:3600 T2:5760 (IA_PD-prefix 2001:8b0:a629::/48 pltime:7200 vltime:7200)))
13:56:57.295565 IP6 (flowlabel 0x99e21, hlim 1, next-header UDP (17) payload length: 190) fe80::98e6:bb8c:ea4f:6b1b.546 > ff02::1:2.547: [udp sum ok] dhcp6 request (xid=762f42 (elapsed-time 0) (option-request SIP-servers-domain SIP-servers-address DNS-server DNS-search-list SNTP-servers NTP-server AFTR-Name opt_67 opt_94 opt_95 opt_96) (client-ID hwaddr type 1 322303df2c80) (server-ID hwaddr type 1 000397054000) (reconfigure-accept) (Client-FQDN) (IA_NA IAID:1 T1:0 T2:0 (IA_ADDR 2001:8b0:1111:1111:0:ffff:51bb:edf2 pltime:3600 vltime:7200)) (IA_PD IAID:1 T1:0 T2:0 (IA_PD-prefix 2001:8b0:a629::/48 pltime:7200 vltime:7200)))
13:56:57.316770 IP6 (flowlabel 0x99e21, hlim 255, next-header UDP (17) payload length: 165) fe80::203:97ff:fe05:4000.547 > fe80::98e6:bb8c:ea4f:6b1b.546: [udp sum ok] dhcp6 reply (xid=762f42 (client-ID hwaddr type 1 322303df2c80) (server-ID hwaddr type 1 000397054000) (IA_NA IAID:1 T1:3600 T2:5760 (IA_ADDR 2001:8b0:1111:1111:0:ffff:51bb:edf2 pltime:3600 vltime:7200)) (DNS-server 2001:8b0::2020 2001:8b0::2021) (IA_PD IAID:1 T1:3600 T2:5760 (IA_PD-prefix 2001:8b0:a629::/48 pltime:7200 vltime:7200)))
^C
6 packets captured
6 packets received by filter
0 packets dropped by kernel

I've doxed my /48, but I can change it, so not really concerned.

I'm not an expert here, but it looks like the router advertisement doesn't happen with mwan3 enabled, but does when disabled. Although I believe RA is icmpv6-type 134 which I thought was handled already.

aaronjg commented 4 years ago

This is strange. It is not just the RA that are not received, the DHCPv6 responses are not coming back either. Usually when this happens it is because something is wrong with the tunnel, and the packets are not actually going out.

Are you able to use the IPv4 stack? The actual L2TP connection is on IPv4, and the IPv6 traffic is tunneled through it, right?

Could you also try adding the following rule (if you haven't already) to the top of your mwan3 rules.

config rule 'aa_lt2p'
        option dest_ip '90.155.53.19'
        option family 'ipv4'
        option proto 'udp'
        option sticky '0'
        option use_policy 'default'

jamesmacwhite commented 4 years ago

Yeah you're right I think. It looks like something funky is happening with the IPv4 side on the L2TP tunnel when it's connected with mwan3 enabled. I didn't have that rule, but I've added it now to test. I did already have a static route in place for 90.155.53.19 to go through my main wan as required.

It looks similar to the behaviour @openwrt-diy mentioned: the interface is marked as down almost as soon as it comes up with mwan3. I'll have to test your PR #13169, although it now requires compiling because of the helper library, doesn't it?

aaronjg commented 4 years ago

Yes, to use the helper library you will need to install the SDK and compile it. You could try the previous commit, 328e7f696; for IPv4 and ping-based tracking it should work about the same.

I wonder whether it is actually down (indicating a problem with the mwan3 rules blocking the interface) or whether mwan3track just thinks it is down (a potential problem in mwan3track).

jamesmacwhite commented 4 years ago

@aaronjg Do you think it is worth bumping the Makefile version to something like 2.9.1-beta during development? I know there was a question around what version to give it. Now that libcap-bin has been dropped as a requirement, I guess it is less of an issue for 19.07, so it could just be a version increment to 2.9.1?

jamesmacwhite commented 4 years ago

@aaronjg I ended up compiling your mwan3-owner-procd branch with the SDK. Sadly it's still the same issue. As soon as mwan3 is enabled, the L2TP interface is down from an IPv4 perspective, and IPv6 can't provision itself through DHCPv6.

root@linksys-wrt3200acm:/tmp# ping -I l2tp-aaisp 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
^C
--- 8.8.8.8 ping statistics ---
6 packets transmitted, 0 packets received, 100% packet loss

aaronjg commented 4 years ago

Thanks for testing. I did do a version bump as you suggested. You could try pinging with the wrapper library:

DEVICE=l2tp-aaisp SRCIP=<ipv4 of l2tp-aaisp> FWMARK=0x3f00 LD_PRELOAD=/lib/mwan3/libwrap_mwan3_sockopt.so.1.0 ping 8.8.8.8

Though, if you are on the 4.x kernel still, I don't think this will change much.

If you have the route to '90.155.53.19' out of your bulk WAN already and have the mwan3 rule, all the tunnel traffic should be flagged with 0x3f00 and should go through the default routing rule, just as if mwan3 was turned off.

Any errors showing up in the system log?

Maybe you also need to add your wan gateway to the static route?

config route
        option target '90.155.53.19'
        option gateway '<wan gateway ip>'
        option interface 'wan'

Not great if your WAN is on DHCP, but could be helpful for testing.

jamesmacwhite commented 4 years ago

Thanks. I had edited the Makefile myself before compiling an ipk to install, but thanks for changing it in the branch source.

root@linksys-wrt3200acm:/tmp# DEVICE=l2tp-aaisp SRCIP=81.xxxx.xxxx.xxx FWMARK=0x3f00 LD_PRELOAD=/lib/mwan3/libwrap_mwan3_sockopt.so.1.0 ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
^C
--- 8.8.8.8 ping statistics ---
40 packets transmitted, 0 packets received, 100% packet loss

No surprise, but this isn't a 5.4 snapshot, so I guess that's to be expected.

I have both the static route (always with the gateway defined, as otherwise the static routes don't work) and the mwan3 rule:

config route
        option interface 'wan'
        option target '90.155.53.19'
        option gateway '82.xxx.xxx.xx'

config rule 'aaisp_default'
        option dest_ip '90.155.53.19'
        option proto 'udp'
        option family 'ipv4'
        option use_policy 'default'

It looks like mwan3 still interferes with the tunnel somewhere.

I captured this in the syslog:

Mon Aug 24 22:01:17 2020 daemon.notice netifd: Interface 'aaisp' is setting up now
Mon Aug 24 22:01:17 2020 daemon.info xl2tpd[3192]: control_finish: Connection closed to 90.155.53.19, port 1701 (), Local: 37753, Remote: -1
Mon Aug 24 22:01:17 2020 daemon.info pppd[2871]: Exit.
Mon Aug 24 22:01:17 2020 daemon.notice xl2tpd[3192]: Connecting to host l2tp.aa.net.uk, port 1701
Mon Aug 24 22:01:17 2020 daemon.notice xl2tpd[3192]: Connection established to 90.155.53.19, 1701.  Local: 36911, Remote: 2465 (ref=0/0).
Mon Aug 24 22:01:17 2020 daemon.notice xl2tpd[3192]: Calling on tunnel 36911
Mon Aug 24 22:01:17 2020 daemon.notice xl2tpd[3192]: Call established with 90.155.53.19, Local: 24794, Remote: 26187, Serial: 147 (ref=0/0)
Mon Aug 24 22:01:17 2020 daemon.debug xl2tpd[3192]: start_pppd: I'm running:
Mon Aug 24 22:01:17 2020 daemon.debug xl2tpd[3192]: "/usr/sbin/pppd"
Mon Aug 24 22:01:17 2020 daemon.debug xl2tpd[3192]: "plugin"
Mon Aug 24 22:01:17 2020 daemon.debug xl2tpd[3192]: "pppol2tp.so"
Mon Aug 24 22:01:17 2020 daemon.debug xl2tpd[3192]: "pppol2tp"
Mon Aug 24 22:01:17 2020 daemon.debug xl2tpd[3192]: "8"
Mon Aug 24 22:01:17 2020 daemon.debug xl2tpd[3192]: "passive"
Mon Aug 24 22:01:17 2020 daemon.debug xl2tpd[3192]: "nodetach"
Mon Aug 24 22:01:17 2020 daemon.debug xl2tpd[3192]: ":"
Mon Aug 24 22:01:17 2020 daemon.debug xl2tpd[3192]: "file"
Mon Aug 24 22:01:17 2020 daemon.debug xl2tpd[3192]: "/tmp/l2tp/options.aaisp"
Mon Aug 24 22:01:17 2020 daemon.info pppd[6533]: Plugin pppol2tp.so loaded.
Mon Aug 24 22:01:17 2020 daemon.notice pppd[6533]: pppd 2.4.7 started by root, uid 0
Mon Aug 24 22:01:17 2020 kern.info kernel: [124931.647829] l2tp-aaisp: renamed from ppp0
Mon Aug 24 22:01:17 2020 daemon.info pppd[6533]: Renamed interface ppp0 to l2tp-aaisp
Mon Aug 24 22:01:17 2020 daemon.info pppd[6533]: Using interface l2tp-aaisp
Mon Aug 24 22:01:17 2020 daemon.notice pppd[6533]: Connect: l2tp-aaisp <-->
Mon Aug 24 22:01:17 2020 daemon.info pppd[6533]: CHAP authentication succeeded: IP47985
Mon Aug 24 22:01:17 2020 daemon.notice pppd[6533]: CHAP authentication succeeded
Mon Aug 24 22:01:17 2020 daemon.err odhcp6c[5021]: Failed to send DHCPV6 message to ff02::1:2 (Permission denied)
Mon Aug 24 22:01:17 2020 daemon.notice netifd: Interface 'aaisp6' is now down
Mon Aug 24 22:01:17 2020 daemon.notice netifd: Interface 'aaisp6' is disabled

I assume the permission denied error is the DHCPv6 client failing because it can't get an RA?

aaronjg commented 4 years ago

Oh, this is very helpful. So it looks like the IPv4 side was only failing because it hit the IPv6 error while the interface was being brought up, so the interface was disabled, and maybe the IPv4 interface as well?

So why is the DHCPv6 failing? The permission denied error suggests that the kernel is having trouble finding a valid route for the packet. So it sounds like odhcp6c is binding to aaisp6, then trying to send out either the dhcp6 request or the router solicitation, which is failing.

With mwan3 turned off, I'm guessing you don't get the permission denied message?

Could you turn off option ipv6 '1' on the aaisp interface and temporarily remove the aaisp6 interface to see if the IPv4 stack works with IPv6 disabled? If that is all fine we can look at getting the IPv6 stack working.

Maybe also try changing option proto 'udp' to 'any' on the default rule, so that if TCP is used to set up the tunnel, that traffic is also sure to go through the default routing table.

jamesmacwhite commented 4 years ago

I'm not sure that's it to be honest. I don't think the IPv4 side is working properly either, but it can't establish the IPv6 side unless mwan3 is disabled.

I've tried having only IPv4 enabled on aaisp and it's the same issue. It also looks like, with mwan3 enabled, the network interface keeps getting reset. I found various log entries like the ones below, followed by the L2TP connection restarting every couple of minutes or so. With mwan3 off it is OK.

This is what's showing in syslog related to the L2TP tunnel connection.

Tue Aug 25 07:08:55 2020 daemon.err xl2tpd[3192]: udp_xmit failed to 90.155.53.19:1701 with err=-1:Operation not permitted
Tue Aug 25 07:08:56 2020 daemon.err xl2tpd[3192]: udp_xmit failed to 90.155.53.19:1701 with err=-1:Operation not permitted
Tue Aug 25 07:08:58 2020 daemon.err xl2tpd[3192]: udp_xmit failed to 90.155.53.19:1701 with err=-1:Operation not permitted
Tue Aug 25 07:09:02 2020 daemon.err xl2tpd[3192]: udp_xmit failed to 90.155.53.19:1701 with err=-1:Operation not permitted
Tue Aug 25 07:19:02 2020 daemon.err xl2tpd[3192]: udp_xmit failed to 90.155.53.19:1701 with err=-1:Operation not permitted
Tue Aug 25 07:19:03 2020 daemon.debug xl2tpd[3192]: check_control: Received out of order control packet on tunnel -1 (got 3, expected 0)
Tue Aug 25 07:19:03 2020 daemon.debug xl2tpd[3192]: handle_control: bad control packet!
Tue Aug 25 07:19:09 2020 daemon.err xl2tpd[3192]: udp_xmit failed to 90.155.53.19:1701 with err=-1:Operation not permitted
Tue Aug 25 07:19:17 2020 daemon.err xl2tpd[3192]: udp_xmit failed to 90.155.53.19:1701 with err=-1:Operation not permitted
Tue Aug 25 07:19:33 2020 daemon.notice xl2tpd[3192]: Maximum retries exceeded for tunnel 4091.  Closing.
Tue Aug 25 07:19:33 2020 daemon.err xl2tpd[3192]: udp_xmit failed to 90.155.53.19:1701 with err=-1:Operation not permitted
Tue Aug 25 07:19:33 2020 daemon.debug xl2tpd[3192]: Terminating pppd: sending TERM signal to pid 29855
Tue Aug 25 07:19:33 2020 daemon.info xl2tpd[3192]: Connection 106 closed to 90.155.53.19, port 1701 (Timeout)
Tue Aug 25 07:19:33 2020 daemon.info pppd[29855]: Terminating on signal 15
Tue Aug 25 07:19:33 2020 daemon.info pppd[29855]: Connect time 2.6 minutes.
Tue Aug 25 07:19:33 2020 daemon.info pppd[29855]: Sent 462962 bytes, received 577 bytes.
Tue Aug 25 07:19:33 2020 daemon.err odhcp6c[30201]: Failed to send RS (Permission denied)
Tue Aug 25 07:19:33 2020 daemon.err odhcp6c[30201]: Failed to send DHCPV6 message to ff02::1:2 (Permission denied)
Tue Aug 25 07:19:33 2020 daemon.notice netifd: Network device 'l2tp-aaisp' link is down
Tue Aug 25 07:19:33 2020 daemon.notice netifd: Network alias 'l2tp-aaisp' link is down
Tue Aug 25 07:19:33 2020 daemon.notice netifd: Interface 'aaisp6' has link connectivity loss
Tue Aug 25 07:19:33 2020 daemon.notice pppd[29855]: Connection terminated.
Tue Aug 25 07:19:33 2020 daemon.info pppd[29855]: Connect time 2.6 minutes.
Tue Aug 25 07:19:33 2020 daemon.info pppd[29855]: Sent 462962 bytes, received 577 bytes.
Tue Aug 25 07:19:33 2020 daemon.notice netifd: aaisp6 (30201): Command failed: Permission denied
Tue Aug 25 07:19:33 2020 daemon.info xl2tpd[3192]: control_finish: Connection closed to 90.155.53.19, port 1701 (), Local: 36468, Remote: -1
Tue Aug 25 07:19:33 2020 daemon.info xl2tpd[3192]: Disconnecting from 90.155.53.19, Local: 4091, Remote: 106
Tue Aug 25 07:19:33 2020 daemon.notice netifd: Interface 'aaisp' is now down
Tue Aug 25 07:19:33 2020 daemon.notice netifd: Interface 'aaisp' is setting up now
Tue Aug 25 07:19:33 2020 daemon.info pppd[29855]: Exit.
Tue Aug 25 07:19:33 2020 daemon.notice xl2tpd[3192]: Connecting to host l2tp.aa.net.uk, port 1701
Tue Aug 25 07:19:33 2020 daemon.notice xl2tpd[3192]: Connection established to 90.155.53.19, 1701.  Local: 62179, Remote: 3633 (ref=0/0).
Tue Aug 25 07:19:33 2020 daemon.notice xl2tpd[3192]: Calling on tunnel 62179
Tue Aug 25 07:19:33 2020 daemon.notice xl2tpd[3192]: Call established with 90.155.53.19, Local: 17863, Remote: 49948, Serial: 214 (ref=0/0)
Tue Aug 25 07:19:33 2020 daemon.debug xl2tpd[3192]: start_pppd: I'm running:
Tue Aug 25 07:19:33 2020 daemon.debug xl2tpd[3192]: "/usr/sbin/pppd"
Tue Aug 25 07:19:33 2020 daemon.debug xl2tpd[3192]: "plugin"
Tue Aug 25 07:19:33 2020 daemon.debug xl2tpd[3192]: "pppol2tp.so"
Tue Aug 25 07:19:33 2020 daemon.debug xl2tpd[3192]: "pppol2tp"
Tue Aug 25 07:19:33 2020 daemon.debug xl2tpd[3192]: "8"
Tue Aug 25 07:19:33 2020 daemon.debug xl2tpd[3192]: "passive"
Tue Aug 25 07:19:33 2020 daemon.debug xl2tpd[3192]: "nodetach"
Tue Aug 25 07:19:33 2020 daemon.debug xl2tpd[3192]: ":"
Tue Aug 25 07:19:33 2020 daemon.debug xl2tpd[3192]: "file"
Tue Aug 25 07:19:33 2020 daemon.debug xl2tpd[3192]: "/tmp/l2tp/options.aaisp"
Tue Aug 25 07:19:33 2020 daemon.info pppd[14939]: Plugin pppol2tp.so loaded.
Tue Aug 25 07:19:33 2020 daemon.notice pppd[14939]: pppd 2.4.7 started by root, uid 0
Tue Aug 25 07:19:33 2020 kern.info kernel: [158438.835304] l2tp-aaisp: renamed from ppp0
Tue Aug 25 07:19:33 2020 daemon.info pppd[14939]: Renamed interface ppp0 to l2tp-aaisp
Tue Aug 25 07:19:33 2020 daemon.info pppd[14939]: Using interface l2tp-aaisp
Tue Aug 25 07:19:33 2020 daemon.notice pppd[14939]: Connect: l2tp-aaisp <-->
Tue Aug 25 07:19:33 2020 daemon.info pppd[14939]: CHAP authentication succeeded: IP47985
Tue Aug 25 07:19:33 2020 daemon.notice pppd[14939]: CHAP authentication succeeded
Tue Aug 25 07:19:33 2020 daemon.notice pppd[14939]: local  IP address 81.187.237.242
Tue Aug 25 07:19:33 2020 daemon.notice pppd[14939]: remote IP address 81.187.81.187
Tue Aug 25 07:19:33 2020 daemon.notice pppd[14939]: primary   DNS address 217.169.20.21
Tue Aug 25 07:19:33 2020 daemon.notice pppd[14939]: secondary DNS address 217.169.20.20
Tue Aug 25 07:19:33 2020 daemon.notice netifd: Network device 'l2tp-aaisp' link is up
Tue Aug 25 07:19:33 2020 daemon.notice netifd: Network alias 'l2tp-aaisp' link is up
Tue Aug 25 07:19:33 2020 daemon.notice netifd: Interface 'aaisp6' has link connectivity
Tue Aug 25 07:19:33 2020 daemon.notice pppd[14939]: local  LL address fe80::fd0e:1713:eb40:a209
Tue Aug 25 07:19:33 2020 daemon.notice pppd[14939]: remote LL address fe80::0203:97ff:fe05:4000
Tue Aug 25 07:19:33 2020 daemon.notice netifd: Interface 'aaisp' is now up
Tue Aug 25 07:19:33 2020 user.notice mwan3-hotplug[14866]: Execute ifdown event on interface aaisp (unknown)
Tue Aug 25 07:19:34 2020 daemon.err odhcp6c[30201]: Failed to send DHCPV6 message to ff02::1:2 (Permission denied)
Tue Aug 25 07:19:34 2020 daemon.notice netifd: Interface 'aaisp6' is now down
Tue Aug 25 07:19:34 2020 daemon.notice netifd: Interface 'aaisp6' is disabled
Tue Aug 25 07:19:34 2020 daemon.notice netifd: Interface 'aaisp6' is enabled
Tue Aug 25 07:19:34 2020 daemon.notice netifd: Interface 'aaisp6' is setting up now

I haven't worked with L2TP much before, so I could have a general configuration issue, but mwan3 seems to be doing some weird stuff to the L2TP connection. None of this behaviour seems to happen with mwan3 disabled, so it makes me think the connection is being interfered with. Even with having static routes and forcing the use of the default routing table for the tunnel endpoint, it doesn't seem to help.

Slightly unrelated, but it also appears that the ip monitor route processes aren't removed when stopping mwan3:

 1681     1 root     S     1784   0%   0% ip -6 monitor route
 1678     1 root     S     1784   0%   0% ip -4 monitor route

These are still running even though I stopped mwan3. Is that normal/expected?

aaronjg commented 4 years ago

The permission denied errors really look like the kernel is not able to find the route to 90.155.53.19, but with the explicit route set up and the rule to use the default routing table, I don't see how mwan3 is interfering.

Maybe also try bidirectional rules:

config rule 'aa_lt2p_out'
        option dest_ip '90.155.53.19'
        option family 'ipv4'
        option proto 'any'
        option sticky '0'
        option use_policy 'default'

config rule 'aa_lt2p_in'
        option src_ip '90.155.53.19'
        option family 'ipv4'
        option proto 'any'
        option sticky '0'
        option use_policy 'default'

With mwan3 turned off, could you watch ip -4 monitor route when you bring up the connection and see whether L2TP is trying to do anything to the routing table that mwan3 might be interfering with?

I emailed Andrews and Arnold to see if they can set me up with a trial account so I can debug.

Slightly unrelated but it also appears the ip monitor route processes don't seem to be removed when stopping mwan3.

Thanks for catching that. It looks like an issue with the procd code not properly killing all the rtmon processes. I've just added a fix for that here: ec41fb96951d05b4eac19beaacb08d8ca8e9e69d

jamesmacwhite commented 4 years ago

Yeah it's really strange, tried with the in and out rules but it doesn't seem to improve anything. This is what ip -4 monitor route looks like on bringing up the l2tp-aaisp interface:

Deleted 90.155.53.19 via 82.15.36.1 dev eth1.2 proto static metric 10
90.155.53.19 via 82.15.36.1 dev eth1.2 proto static metric 10
local 81.187.237.242 dev l2tp-aaisp table local proto kernel scope host src 81.187.237.242
Deleted local 81.187.237.242 dev l2tp-aaisp table local proto kernel scope host src 81.187.237.242
local 81.187.237.242 dev l2tp-aaisp table local proto kernel scope host src 81.187.237.242
81.187.81.187 dev l2tp-aaisp proto kernel scope link src 81.187.237.242
default via 81.187.81.187 dev l2tp-aaisp proto static metric 50

@aaronjg I can provide the login to the L2TP tunnel privately if you'd like to test it yourself, in case AA don't get back to you. That said, they are a great ISP and know their stuff!

Thanks for the fix, I'll compile your branch again soon to get a new ipk.

Thanks for your help and guidance on this, it is appreciated!

openwrtdiy commented 4 years ago

@aaronjg @jamesmacwhite

With ec41fb9 the fix works well: the problem with policy assignment after a route change is solved, and routing designated LAN clients over a VPN now works (PPTP, L2TP, WireGuard).

The WAN, WANB, PPTP, L2TP and WireGuard interfaces have no disconnect or redial issues. Failing over from WANB to L2TP works normally, and the network is restored to WANB.

The following are the issues found:

  1. Failing over from WANB to PPTP is abnormal: traffic jumps to the WAN interface. The network is then restored to WANB.
  2. Failing over from WANB to WireGuard is abnormal: traffic jumps to the WAN interface. The network is then restored to WANB.
  3. The MWAN3 status page shows mwan3 as closed. Even after restarting MWAN3 and rebooting the router, it still shows as closed.


openwrtdiy commented 4 years ago

After restarting the router, I re-tested MWAN3 failover. Issues 1 and 2 above are now resolved.

The two test runs gave different results.

aaronjg commented 4 years ago

@jamesmacwhite It appears that xl2tpd is somewhat broken on 19.07.3, and this issue may not be caused by mwan3.

If mwan3 is turned off, and you add a single rule so that the packets get a fwmark like so:

iptables --table mangle -A OUTPUT -d 90.155.53.19 -p UDP --dport 1701 --sport 1701 -j MARK --set-mark 0x1

Then the kernel will try to route them out of l2tp-aaisp. This is despite there being no rules that pertain to the firewall mark and no additional routes that would cause these packets to be assigned to l2tp-aaisp. You can see this by adding logging rules at the end of the OUTPUT chain and the beginning of the POSTROUTING chain.

kern.warn kernel: [576733.222628] main output start IN= OUT=eth0.2 SRC=<wan ip> DST=90.155.53.19 LEN=920 TOS=0x00 PREC=0x00 TTL=64 ID=42237 PROTO=UDP SPT=1701 DPT=1701 LEN=900 MARK=0x1
kern.warn kernel: [576733.237060] postroute start IN= OUT=l2tp-aaisp SRC=<wan ip> DST=90.155.53.19 LEN=920 TOS=0x00 PREC=0x00 TTL=64 ID=42237 PROTO=UDP SPT=1701 DPT=1701 LEN=900 MARK=0x1
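The logging rules themselves are not shown in the thread. A minimal sketch of what they might look like follows. It assumes the iptables LOG target is available on the router and that the rules sit in the mangle table (the comment above does not say which table was used); the prefixes are picked to match the log lines quoted above. The function names and the IPT override are hypothetical, added only so the rules can be switched on and off and dry-run with a stub.

```shell
#!/bin/sh
# Hypothetical helpers to see which device the kernel picks for L2TP
# packets to the tunnel endpoint. Assumes the iptables LOG target exists
# on the router and that the mangle table is the one to watch.
L2TP_SERVER="90.155.53.19"
IPT="${IPT:-iptables}"   # override (e.g. with a stub) for a dry run

# Match outgoing L2TP (UDP 1701) traffic to the endpoint; callers leave
# the result unquoted so it splits into separate arguments.
trace_match() {
	printf '%s\n' "-d $L2TP_SERVER -p udp --dport 1701"
}

l2tp_trace_on() {
	# End of mangle OUTPUT: OUT= is the device from the first route
	# lookup, MARK= is whatever earlier rules in the chain set.
	$IPT -t mangle -A OUTPUT $(trace_match) \
		-j LOG --log-prefix "main output start "
	# Top of mangle POSTROUTING: OUT= is the device actually used, after
	# any re-route triggered by a mark change in mangle OUTPUT.
	$IPT -t mangle -I POSTROUTING 1 $(trace_match) \
		-j LOG --log-prefix "postroute start "
}

l2tp_trace_off() {
	# Deleting by the full rule specification removes the same two rules.
	$IPT -t mangle -D OUTPUT $(trace_match) \
		-j LOG --log-prefix "main output start "
	$IPT -t mangle -D POSTROUTING $(trace_match) \
		-j LOG --log-prefix "postroute start "
}
```

After sourcing the file and running l2tp_trace_on, something like `logread -f | grep -E 'output start|postroute start'` should show one line of each kind per packet; an OUT= that differs between the two lines, as in the output above, would indicate the packet was re-routed after the mangle OUTPUT chain ran. Run l2tp_trace_off when done.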

I believe there is a bug somewhere in the l2tp stack. Either with xl2tpd or with the kernel itself. As a workaround, you can add a rule like this:

iptables --table mangle -I OUTPUT -d 90.155.53.19 -p UDP --dport 1701 --sport 1701 -j RETURN

to the beginning of the OUTPUT chain in the mangle table so that the tunnel packets are not marked by mwan3.
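One way to make that workaround persistent, sketched here rather than taken from the thread, is a small function in /etc/firewall.user. It assumes the file is run through the default fw3 include section in /etc/config/firewall on OpenWrt 19.07; the function name l2tp_bypass and the IPT override are hypothetical, and the check-before-insert pattern is there so that running the file more than once does not stack duplicate rules.

```shell
#!/bin/sh
# Sketch for /etc/firewall.user: exempt L2TP packets to the tunnel endpoint
# from mwan3's marking in mangle OUTPUT. Assumes the file is run through the
# default fw3 include section; l2tp_bypass and IPT are hypothetical names.
L2TP_SERVER="90.155.53.19"
IPT="${IPT:-iptables}"   # override (e.g. with a stub) for a dry run

l2tp_bypass() {
	# Same match and target as the manual RETURN rule above; left
	# unquoted below so it splits into separate iptables arguments.
	match="-d $L2TP_SERVER -p udp --sport 1701 --dport 1701 -j RETURN"
	# -C exits non-zero when the rule is absent, so running this more than
	# once does not stack duplicates; -I ... 1 puts it at the top of OUTPUT.
	$IPT -t mangle -C OUTPUT $match 2>/dev/null ||
		$IPT -t mangle -I OUTPUT 1 $match
}
```

In /etc/firewall.user the function would be followed by a plain `l2tp_bypass` call. Whether position 1 stays ahead of mwan3's own jump depends on the order in which mwan3 and the firewall install their rules, so it is worth checking `iptables -t mangle -S OUTPUT` after a restart.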

jamesmacwhite commented 4 years ago

@aaronjg Thanks for your very detailed investigation into this and for helping me test my L2TP setup. Based on your further findings, I agree with you that this is not a mwan3 problem and is possibly related to something specific to L2TP, either in the kernel or in xl2tpd as you've mentioned. There seems to be abnormal routing behaviour going on.

Thank you for providing a workaround, I will look at implementing it as a custom firewall rule.

I will close this issue, given its original title is misleading, and in the end it's not related to mwan3.

aaronjg commented 4 years ago

Sounds good. FWIW, it looks like this issue is resolved in kernel 5.4.52, so I don't think it's worth building a workaround into mwan3.

jamesmacwhite commented 4 years ago

@aaronjg Good to know. The iptables workaround should be a workable solution.

I've tested it out, and with that rule in place everything just "works". It also explains why I was having difficulty with NAT configuration on the static IPv4. The replies were never going out through the right interface, so despite my firewall rules, nothing worked.

Your debugging efforts should be documented somewhere, though. It is interesting that fwmark creates all sorts of problems here, and I'm surprised nobody else has found it, unless it comes down to the specific configuration of this L2TP tunnel. I'm definitely going to raise it with Andrews and Arnold; at a minimum, their documentation should mention this for anyone else using OpenWrt.

Thank you again for your help!

aaronjg commented 4 years ago

Can we reopen this issue (and maybe change the title to something like "VPN tunnels over PPTP/L2TP/PPPoE interfaces are broken")? @wackejohn just ran into the same problem on a PPPoE connection, and I think it would be good to have an open issue that people who run into this in the future can find.

He was on the 5.4.61 kernel, so it appears that this issue is not resolved even on snapshot.

It is pretty clearly not an issue with mwan3, but perhaps mwan3 should have a way to work around it if many users are having issues.

jamesmacwhite commented 4 years ago

@aaronjg Re-opened and retitled as requested. I assume it's the same fwmark problem I had with L2TP? Did the iptables workaround work on snapshot?

aaronjg commented 4 years ago

Yes, the same firewall mark problem. The setup was slightly different, as PPPoE was the WAN link and WireGuard was the tunnel, but the symptom was the same: the kernel was trying to route WireGuard traffic back through the WireGuard tunnel.

It could be recreated without mwan3 using a simple fwmark rule, and adding a rule outside of mwan3 so that traffic to the VPN endpoint is not marked fixed the issue.
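
For the PPPoE + WireGuard case, the exemption could be derived from the WireGuard interface rather than hard-coded. This sketch assumes `wg show <iface> endpoints` prints one `<public key><TAB><ip>:<port>` line per peer (with `(none)` for peers that have no endpoint yet) and handles IPv4 endpoints only; `exempt_wg_endpoints` and the `IPT` override are hypothetical. It does not check for existing rules, so running it twice adds duplicates.

```shell
#!/bin/sh
# Sketch: exempt every IPv4 WireGuard peer endpoint from fwmarking.
# Reads `wg show <iface> endpoints` output on stdin (assumed format:
# "<public key><TAB><ip>:<port>"). Hypothetical helper; IPT=echo for dry runs.
IPT="${IPT:-iptables}"

exempt_wg_endpoints() {
	while IFS="$(printf '\t')" read -r _pubkey endpoint; do
		case "$endpoint" in
			"(none)"|"["*|"") continue ;;  # no endpoint, or IPv6 "[addr]:port"
		esac
		addr="${endpoint%:*}"    # drop ":port"
		port="${endpoint##*:}"   # keep only the port
		$IPT -t mangle -I OUTPUT 1 -d "$addr" -p udp --dport "$port" -j RETURN
	done
}
```

Usage: `wg show wg0 endpoints | exempt_wg_endpoints`, where `wg0` stands in for the actual WireGuard interface name.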

jamesmacwhite commented 4 years ago

@aaronjg Interesting. It's good to know. I wonder how many more people are potentially hitting this.

Aside from fwmark, it seems to be specific to PPP-based interfaces.

openwrtdiy commented 4 years ago

I am running routers with two firmware builds: mwan3 + pppoe-server, and mwan3 + PPTP + L2TP + WireGuard. My eventual goal is to integrate the two into one solution that assigns the relevant lines to user terminals through PPPoE.

jamesmacwhite commented 4 years ago

That's a very interesting setup! How well is it working currently with the use of L2TP and PPPoE? Do you encounter the same issues with fwmark reported above?

openwrtdiy commented 4 years ago

Neither of the two router firmwares includes IPv6 support. With mwan3 + pppoe-server, I haven't found any problems with mwan3 distributing lines per network segment, but pppoe-server has more problems at present, possibly because the rp-pppoe plugin is not maintained. With mwan3 + VPN I have run into many problems, but my knowledge is limited and my description may not be accurate.

ptpt52 commented 3 years ago

Good news: I am trying to fix this related issue in the WireGuard code:

https://github.com/x-wrt/x-wrt/blob/master/package/network/services/wireguard/patches/100-scrub-skb-sk-before-send-out.patch

aaronjg commented 3 years ago

I tested this patch on a snapshot build running Linux 5.4.72 and can confirm that @zx2c4's patch fixes the issue.

We can leave this open until that patch gets merged into a kernel used by OpenWRT or the patch is backported to mwan3.

zx2c4 commented 3 years ago

Applied to net.git now, so this should be in OpenWRT at some point.

https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=af8afcf1fdd5f365f70e2386c2d8c7a1abd853d7
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=46d6c5ae953cc0be38efd0e469284df7c4328cf8

jamesmacwhite commented 3 years ago

Closing this, as 19.07.5 and above should fix the problem now that the patch has been merged. I no longer need iptables --table mangle -I OUTPUT -d 90.155.53.19 -p UDP --dport 1701 --sport 1701 -j RETURN in my firewall.
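
Once on a release with the fix, the old rule can be dropped. A minimal sketch, assuming the rule was added exactly as above; `remove_l2tp_exemption` and the `IPT` override are hypothetical. Because `iptables -D` removes one matching rule per call, it loops until `iptables -C` no longer finds a match, which also clears duplicates left by repeated reloads. If the rule also lives in a startup script such as `/etc/firewall.user`, it needs deleting there as well, or it will come back on the next firewall reload.

```shell
#!/bin/sh
# Sketch: remove every copy of the old L2TP RETURN workaround from mangle OUTPUT.
# Hypothetical helper; IPT override is for dry runs/tests.
IPT="${IPT:-iptables}"

remove_l2tp_exemption() {
	rule="-d 90.155.53.19 -p udp --dport 1701 --sport 1701 -j RETURN"
	# -C succeeds while a matching rule exists; -D deletes one match per call.
	while $IPT -t mangle -C OUTPUT $rule 2>/dev/null; do
		$IPT -t mangle -D OUTPUT $rule
	done
}
```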