Yggdrasil breaks NAT64 connectivity

bjtftw commented 4 months ago

An interesting phenomenon involving RFC 6724, 300::/8 and NAT64 (64:ff9b::/96)

Let me clarify something at the beginning: this is not related to the code of Yggdrasil, it's just a bad decision to use 200::/7 as the prefix for this project.

Quick definition: GUA = just a regular global IPv6 address (not a special type or a local one)

THE PROBLEM

When your main network interface (the one you use to connect to the Internet, usually eth0 or wlan0, not a tun or a second ethX) has multiple IPv6 GUAs from different networks (prefixes), for example one from 2000::/3 for Internet connectivity and one from 300::/8 for Yggdrasil, then according to RFC 6724 your OS may use any(!) of those GUAs as the source IP for a new Internet connection (2000::/3 and 64:ff9b::/96 NAT64 destinations). In some situations Linux will pick our IP from the 300::/8 range as the source and send the connection through the default(!) gateway, which of course cannot work, as the default gateway cannot route from/to the 300::/8 subnet properly. What's interesting is that this is completely legit and described in the RFC, so probably every OS will do this, but let's stick to Linux for now.

Many conditions (https://www.rfc-editor.org/rfc/rfc6724.html#section-5) decide whether we hit this problem (we need to get as far as Rule 8), but for most users the earlier rules won't show any preference for which IP to use as the source of a new outgoing connection. Only these two conditions need to be fulfilled to get into trouble:

You have multiple IPv6 GUAs from different networks (different prefixes) on the same main network interface (usually eth0 or wlan0), for example an IPv6 address from your ISP (something from 2000::/3) and an IP address from the Yggdrasil network (something from 300::/8).
Both addresses are of the same type: static, dynamic or temporary.

In this situation, depending on the destination IPv6 address of the new connection (its IPv6 prefix), the OS (in my case Linux) may use the IP from the 300::/8 subnet as the source IP for that connection due to RFC 6724 "Rule 8: Use longest matching prefix", which applies when all seven previous rules aren't decisive.

When do we get the problem?

With NAT64 64:ff9b::... destinations, because the Yggdrasil prefix 0300:... shares a leading "0" with 0064:..., while our main/Internet IPv6 usually starts with 2... (2000::/3). Bit by bit, 0300 and 0064 have 6 leading bits in common, while anything in 2000::/3 has only 2 in common with 0064, so the Yggdrasil address is the longer match. We need to remember that from the OS standpoint 300::/8 addresses are normal global IPv6 addresses (not a special type that would limit their use for global connectivity), which is why the OS takes them into account when deciding which IP to use as the source for an Internet (global) connection.
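
The prefix comparison can be checked numerically. Here is a minimal Python sketch of the CommonPrefixLen comparison that Rule 8 relies on, assuming /64 source prefixes; the function name is my own and not taken from any library:

```python
import ipaddress

def common_prefix_len(src: str, dst: str, src_prefix_len: int = 64) -> int:
    """Leading bits shared by src and dst, capped at the source prefix length."""
    a = int(ipaddress.IPv6Address(src))
    b = int(ipaddress.IPv6Address(dst))
    # XOR leaves its highest 1 bit at the first position where the addresses differ.
    matching = 128 - (a ^ b).bit_length()
    return min(matching, src_prefix_len)

dst = "64:ff9b::"
for src in ("2000::b", "300::b"):
    print(src, common_prefix_len(src, dst))
```

This prints 2 for 2000::b and 6 for 300::b, so when the earlier rules tie, the longest-match rule picks the Yggdrasil address.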

OK, here is a test if you have a spare machine (a virtual machine, lxc/lxd or a live CD will do). We don't need true Internet connectivity: even if you don't use NAT64 in your network you will see the bad results, because they are a consequence of routing logic, so we don't even need to send any data.

TESTING PROCEDURE:

We will use these:

eth0 - main network interface
2000::b/64 - our main IPv6 address (Internet), set manually on eth0
2000::a - our default gateway (Internet)
300::b/64 - Yggdrasil IP, set manually on eth0

1. Take a machine without any IP addresses configured, so we have a clean situation.
2. Assign the Internet address: ip addr add 2000::b/64 dev eth0
3. Bring up the interface (if it is down you won't be able to add the default gateway in step 4): ip link set eth0 up
4. Add the default gateway: ip route add default via 2000::a
5. Check the OS routing logic for NAT64 destinations: ip route get 64:ff9b::
6. Analyze the result: 64:ff9b:: from :: via 2000::a dev eth0 src 2000::b metric 1024 pref medium

As you see, the OS connects to 64:ff9b:: via 2000::a (default gateway) with source IP 2000::b, which is all OK and will work.

7. Now add our Yggdrasil 300::b/64 to this interface: ip addr add 300::b/64 dev eth0
8. Check the NAT64 destination routing logic again: ip route get 64:ff9b::
9. Analyze the result: 64:ff9b:: from :: via 2000::a dev eth0 src 300::b metric 1024 pref medium

And now we have a problem: we go to 64:ff9b:: via 2000::a (default gateway), but the OS took 300::b as the source IP for this connection. This of course won't work.
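
The same check can be done from a program instead of ip route get. This is a minimal Python sketch, assuming that connecting a UDP socket only triggers the kernel's route lookup and source address selection without sending a packet; the function name and the port number are arbitrary choices of mine:

```python
import socket
from typing import Optional

def kernel_source_for(dst: str, port: int = 9) -> Optional[str]:
    """Return the source address the kernel picks for dst, or None if there is none."""
    try:
        with socket.socket(socket.AF_INET6, socket.SOCK_DGRAM) as s:
            # connect() on a UDP socket sets the peer and selects a source
            # address; nothing is transmitted.
            s.connect((dst, port))
            return s.getsockname()[0]
    except OSError:
        # No IPv6 support, or no route to the destination.
        return None

print(kernel_source_for("64:ff9b::"))
```

On a machine set up as in the test procedure I would expect this to report the same source address as the src field of ip route get, but I have not verified that on every kernel.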

That's the whole story. As it is an RFC thing, there is probably not much we can do about it, but in some situations there are workarounds.

WORKAROUNDS

For those who use a fully static IP config (2000::/3 and 300::/8) on their interfaces there is a simple solution: when you add the default gateway, specify that you always want a specific IPv6 address as the source for connections through this gateway. It looks like this: ip route add default via 2000::a dev eth0 src 2000::b. Now ip route get 64:ff9b:: gives 64:ff9b:: from :: via 2000::a dev eth0 src 2000::b metric 1024 pref medium, which is correct. This way the OS will always use 2000::b as the source, even though "Rule 8" from the RFC still prefers the 300::/8 one. Just remember you need to do this on every machine.

For those who use SLAAC with privacy extensions (default on Linux) for their IPv6 config, things get complicated, as our Internet IPv6 address changes regularly (due to the preferred_lft setting, and because the ISP subnet usually also changes with every network restart or every couple of days). That makes it hard to script an update of ip route change default via ... src ..., and you would need to do this on every(!) machine in the LAN. Some of those devices run Android or other OSes (TVs, watches, audio equipment, etc.) where you cannot add custom routing rules, scripts, etc.
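
To give an idea of what such an update script would involve on a Linux box, here is a rough Python sketch. It is not a tested solution. It assumes the JSON output of ip -j -6 addr show exposes the fields "family", "scope", "local", "deprecated" and "tentative", and all function names are hypothetical. It simply takes the first usable address outside 200::/7 and does not try to choose between stable and temporary SLAAC addresses.

```python
import ipaddress
import json
import subprocess
from typing import List, Optional

YGGDRASIL = ipaddress.IPv6Network("200::/7")

def current_addr_info(dev: str) -> List[dict]:
    """Read the IPv6 addresses of dev via iproute2's JSON output."""
    out = subprocess.run(["ip", "-j", "-6", "addr", "show", "dev", dev],
                         check=True, capture_output=True, text=True).stdout
    links = json.loads(out)
    return links[0].get("addr_info", []) if links else []

def pick_internet_source(addr_info: List[dict]) -> Optional[str]:
    """First global, usable IPv6 address that is not inside 200::/7."""
    for entry in addr_info:
        if entry.get("family") != "inet6" or entry.get("scope") != "global":
            continue
        if entry.get("deprecated") or entry.get("tentative"):
            continue
        if ipaddress.IPv6Address(entry["local"]) in YGGDRASIL:
            continue
        return entry["local"]
    return None

def route_command(gateway: str, dev: str, src: str) -> List[str]:
    """Build the 'ip route change default via ... src ...' invocation."""
    return ["ip", "-6", "route", "change", "default",
            "via", gateway, "dev", dev, "src", src]
```

A caller would run something like subprocess.run(route_command(gw, "eth0", pick_internet_source(current_addr_info("eth0")))) as root whenever the address changes, which is exactly the per-machine maintenance burden described above.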

CONCLUSIONS

It's a real problem, as Yggdrasil may kill IPv4 Internet connectivity for those who use NAT64, and NAT64 with the 64:ff9b:: prefix is the de facto standard in cell networks, so this already concerns millions of users. Of course the question is how many of them are using Yggdrasil, but what's dangerous here is that SLAAC for Internet plus a Yggdrasil 300::/8 configuration is a typical setup for them, as this type of Internet connectivity is made to be "auto" config.

How big is the problem? Now that I think of it, it may be pretty big, but it depends on how the NAT64 functionality is implemented. If the ISP uses 464XLAT without DNS64, then the problem may never be seen. But when the ISP uses DNS64 (which is a common scenario), users may still try to connect directly to the NAT64 64:ff9b::/96 network even if 464XLAT is also used (for example, Firefox is capable of identifying NAT64 and then internally creates NAT64 "AAAA" equivalents of "A" DNS records).

It looks like the only solution is to switch the project to another prefix ("above" 2000::/3).

What's even worse, due to Rule 7 of RFC 6724 it should be possible to get the same bad results in another setup: a non-temporary (for example manually set) Internet IPv6 (2000::/3) and a temporary Yggdrasil address (300::/8) obtained via SLAAC (radvd) with privacy extensions enabled (which is the default on Linux). But I haven't checked that.

neilalexander commented 4 months ago

So we've had a lot of discussions about IP space in the past and believe me when I say there's no range that makes everyone happy. In the past we had used fc00::/7 but that trampled on many organisational and ULA-prefixed networks. We lose even more public key bytes if we try to fold into a smaller subnet, we can't really overlap with any global unicast space and, at this point, changing the subnet and therefore everyone's addresses on the network is painful for users too.

I'd love for there to be a truly good solution for this, but in the end, IPv6 compatibility is a way for us to test the routing scheme works whilst allowing people to use applications they already know and like, most importantly, without modification. 0200::/7 was a compromise: it is a deprecated range and therefore nothing else should be using it. But yes, source address selection sucks.

peigongdsd commented 4 months ago

I prefer the yggstack implementation a lot. Not yggstack itself, but rather the pk.ygg DNS system. There's a well-known strategy called fakeip, in which a DNS server returns a fake, randomly generated IP that routes to a certain tun; the packets arriving on the tun for that IP are then handled by forwarding the traffic to the actual target. With pk.ygg using this fake-ip strategy, maybe we can get rid of the session layer of Yggdrasil and use public keys directly as addresses (of course with the pk.ygg fake-ip DNS properly set up), which may contribute a lot to performance, since such a strategy omits the process of pubkey discovery.

peigongdsd commented 4 months ago

Fakeip is a technology widely used in tools tailored for Chinese users to transparently route all traffic to a proxy server in order to bypass network censorship. There are some explanations of it: fake-ip
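
To illustrate the fake-ip idea, here is a minimal Python sketch of the mapping a fake-ip DNS server keeps. Everything in it is hypothetical: the class name, the 198.18.0.0/15 pool and the example name are placeholders, and it hands out addresses sequentially instead of randomly to keep the code short. It is not how yggstack or any specific tool implements it.

```python
import ipaddress
from typing import Dict, Optional

class FakeIPPool:
    """Hands out one fake address per name and remembers the reverse mapping."""

    def __init__(self, pool: str = "198.18.0.0/15") -> None:
        self._free = iter(ipaddress.ip_network(pool).hosts())
        self._by_name: Dict[str, str] = {}
        self._by_ip: Dict[str, str] = {}

    def resolve(self, name: str) -> str:
        """DNS side: answer a query with a fake address, reusing earlier answers."""
        key = name.lower().rstrip(".")
        if key not in self._by_name:
            ip = str(next(self._free))
            self._by_name[key] = ip
            self._by_ip[ip] = key
        return self._by_name[key]

    def target_for(self, ip: str) -> Optional[str]:
        """tun side: find the real target name for a packet sent to a fake address."""
        return self._by_ip.get(ip)

pool = FakeIPPool()
fake = pool.resolve("examplekey.pk.ygg")
print(fake, "->", pool.target_for(fake))
```

The tun handler would look up the destination of each incoming packet with target_for and forward the traffic to the real target behind that name.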

yggdrasil-network / yggdrasil-go

Yggdrasil breaks NAT64 connectivity #1155