Some Kernels have broken SO_REUSEPORT handling

kaechele commented 4 years ago

This bug is meant for future reference.

Newer versions of the Tunneldigger broker use SO_REUSEPORT to process multiple tunnels on one single port. In the past Tunneldigger used a NAT-based workaround to make this work. To simplify the code and remove unnecessary dependencies this workaround was removed. Unfortunately there are several kernel bugs that prevent SO_REUSEPORT for UDP sockets from working properly, that are only fixed in fairly recent kernels. This means that the change in conjunction with the bug has some peculiar implications for which Kernel versions can be used for brokers. (Tunneldigger clients are unaffected by all of this.)
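For readers unfamiliar with the pattern, here is a minimal sketch of what "multiple tunnels on one single port" can look like with SO_REUSEPORT on Linux. It assumes one UDP socket per tunnel, each bound to the shared port and then connected to its client. The helper name `open_tunnel_socket` is hypothetical and this is not the broker's actual code.

```python
import socket


def open_tunnel_socket(local_addr, peer_addr):
    """Hypothetical helper: one UDP socket per tunnel, all sharing local_addr."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # SO_REUSEPORT has to be set before bind() so that several sockets
    # owned by the same user can share one local address and port.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind(local_addr)
    # connect() on a UDP socket only records the remote peer. A kernel with
    # the fixes is expected to deliver that peer's datagrams to this socket
    # rather than to another socket in the same reuseport group.
    sock.connect(peer_addr)
    return sock
```

On an affected kernel, the selection among the sockets sharing the port is the part that goes wrong, which is why the sketch itself can run fine and the tunnels still misbehave.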

Kernel versions 5.10.152 and newer exhibit the correct behaviour and should work.

You have probably landed here because you still use an older Linux distribution or haven't updated to a working Kernel version. If you are experiencing this issue you have two options:

Update your kernel to a supported version or upgrade your distribution to one that has a supported kernel version. In particular, Fedora 35 and newer as well as Debian 11 (Bullseye) and newer with the latest updates applied should work.
If you cannot upgrade the kernel, switch to the legacy branch that still carries the NAT hacks.

Kernel fixes

For the curious among you, the fixes that are needed are:

net: udp: prefer listeners bound to an address (landed in 5.0)
udp: correct reuseport selection with connected sockets (landed in 5.4, backported to: 5.3.1, 5.2.17, 4.19.75, introduced a bug fixed by commit 69421bf98482d089e50799f45e48b25ce4a8d154 below)
udp: Update reuse->has_conns under reuseport_lock. (landed in 6.1, backported to 6.0.6, 5.15.76, 5.10.152, not backported to 5.3.1, 5.2.17, 4.19.75)
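The version list above can be turned into a rough check. The following is a sketch only: the function name `kernel_has_reuseport_fixes` and the table `FIXED_STABLE` are made up for illustration, and the logic is deliberately conservative. Any stable series without a listed backport of the last fix is reported as not fixed, even though a distribution kernel may carry its own backports.

```python
import re

# Minimum patch level per stable series that received the last fix,
# copied from the list above. Series not listed here are treated as unfixed.
FIXED_STABLE = {(6, 0): 6, (5, 15): 76, (5, 10): 152}


def kernel_has_reuseport_fixes(release):
    """Return True if a kernel release string looks new enough (best guess)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        return False  # unparseable release string: assume not fixed
    major, minor = int(match.group(1)), int(match.group(2))
    patch = int(match.group(3) or 0)
    if (major, minor) >= (6, 1):
        return True  # the last fix landed in 6.1
    minimum = FIXED_STABLE.get((major, minor))
    return minimum is not None and patch >= minimum
```

It could be fed the running kernel's release string, for example `kernel_has_reuseport_fixes(platform.release())` after `import platform`.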

RalfJung commented 4 years ago

Debian oldstable (stretch) currently only offers kernels that are affected by this bug; I submitted a bug report about updating the backport kernels (it's already a 4.19 kernel, but not new enough).

The Debian stable (buster) kernel is fine (4.19.98). On Ubuntu 18.04 (bionic), install linux-generic-hwe-18.04 (at least this changelog says that should suffice).

PolynomialDivision commented 4 years ago

Could you say what happens with old kernels? I updated the version to 19.07.4 with a 4.14.180 or 4.19.X kernel. I don't see any obvious issues with that kernel and the version while running tunneldigger. I will investigate further.

RalfJung commented 4 years ago

Clients were unable to reliably connect with broken kernels, so we saw many connection timeouts or other disconnects in the logs. Also see https://github.com/wlanslovenija/tunneldigger/issues/129. You should also see warnings specifically pointing out that the kernel is likely buggy.

Maybe Ubuntu backported the problematic patches, who knows. (I assume you are using Ubuntu? You didn't state the distro you are using.)

PolynomialDivision commented 4 years ago

> Maybe Ubuntu backported the problematic patches, who knows. (I assume you are using Ubuntu? You didn't state the distro you are using.)

OpenWrt. Thanks, I will have a look at the log.

RalfJung commented 4 years ago

This issue is about the tunneldigger broker. Are you really running that server-side component on OpenWrt?

PolynomialDivision commented 4 years ago

> This issue is about the tunneldigger broker. Are you really running that server-side component on OpenWrt?

No. :O Sorry, then everything is fine. :D

neocturne commented 3 years ago

Would using SO_REUSEADDR instead of SO_REUSEPORT be an option? At least using a short test program, kernel 4.19 doesn't seem to show this bug with SO_REUSEADDR (I have not checked older kernels).

While implementing L2TP support for fastd (still work in progress), I noticed another advantage of SO_REUSEADDR: It can be set after bind() of the first socket, while SO_REUSEPORT needs to be set before bind(), which may accidentally allow two processes of the same user to bind to the same port.

With SO_REUSEADDR this can be prevented: Let a process bind its first socket without SO_REUSEADDR; this will fail if the port is already bound by another process. Then set SO_REUSEADDR on the first socket. On subsequent sockets, set SO_REUSEADDR before bind(), so they are allowed to use the same port as the first socket.
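The sequence described above can be sketched as follows. This is only an illustration for UDP sockets on Linux; the helper names `bind_first` and `bind_additional` are hypothetical, and this is neither fastd's nor Tunneldigger's actual code.

```python
import socket


def bind_first(addr):
    """Bind the process's first socket; fails if the port is already in use."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # No SO_REUSEADDR yet, so bind() raises OSError if any other
        # socket already holds this address and port.
        sock.bind(addr)
    except OSError:
        sock.close()
        raise
    # Only now allow later sockets to share the port with this one.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    return sock


def bind_additional(addr):
    """Bind a further socket to the address already held by the first one."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # For subsequent sockets the option is set before bind().
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    try:
        sock.bind(addr)
    except OSError:
        sock.close()
        raise
    return sock
```

A second process that starts with `bind_first` on the same port gets an error instead of silently sharing it, which is the protection against user error described above.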

RalfJung commented 3 years ago

> Would using SO_REUSEADDR instead of SO_REUSEPORT be an option?

I have to admit I am out of my league here; the differences between these flags are beyond my experience in this space. @kaechele did the implementation with SO_REUSEPORT, he might be able to comment. Other than that, if someone writes a PR that switches to SO_REUSEADDR, I'd be willing to test that on our servers and merge it if it works.

kaechele commented 3 years ago

I initially implemented this using SO_REUSEPORT as my research suggested this to be best practice from a security standpoint. Your way of utilizing SO_REUSEADDR looks like a smart way to avoid double-binding a port already in use by the same user but for a different application. Correct me if I'm wrong here but it looks like you trade the same-user bind protection for protection of user error in this case. It seems like an edge case scenario that some other (malicious) user on the same machine would try to abuse a reused port to intercept or alter traffic. Given that L2TPv3 is not encrypted or authenticated anyway, this is a sensible trade-off in my eyes.

I don't know if I have an immediate need to switch the current implementation over to SO_REUSEADDR but I'm sure it would be a quick thing to do anyway. In any case I'm looking forward to playing with fastd's implementation as I love the idea of flexibility in selecting L2TP as an option if I require speed over security.

neocturne commented 3 years ago

> Correct me if I'm wrong here but it looks like you trade the same-user bind protection for protection of user error in this case. It seems like an edge case scenario that some other (malicious) user on the same machine would try to abuse a reused port to intercept or alter traffic.

This is correct. If running on the same machine as untrusted users, only using low ports for L2TP would mitigate the issue.

pmelange commented 1 year ago

So, we (Freifunk Berlin) have been trying to use the NAT-removed version and have run into some strange issues. It seems that if there is an in-between router that is also doing NAT (perhaps with an older kernel), the post-NAT-removal version doesn't work and the tunnels time out. It's strange because it works for some people and not for others, and the only difference we can find is the router in between. For example, it works just fine with a recent OpenWrt image, but with a Fritz 7590 with firmware 7.50 it doesn't.

We have reverted to 7c467e68021526b8631e8a53a9022aa223

kaechele commented 1 year ago

Sounds like an issue unrelated to this Kernel bug, possibly in the NAT implementation of the faulty routers. In any case, it would probably be best to open a separate issue and attach some debugging information so the issue can be looked into. Good debugging info would be excerpts of the conntrack table from affected routers or maybe even packet captures.

wlanslovenija / tunneldigger

Some Kernels have broken SO_REUSEPORT handling #126
