zerotier / ZeroTierOne

A Smart Ethernet Switch for Earth
https://zerotier.com

6RD IPv6 breaks ZeroTier #2141

Open darkain opened 1 year ago

darkain commented 1 year ago


Backstory: as many of the ZT staff are aware, I've been running an edge router network for several years, with ZeroTier on the routers and OSPF on top of that to manage LAN route delegation between routers. Every router is OPNsense based. I've had on-and-off connectivity issues the entire time with this setup, mostly centering around one particular node's connectivity. Some nodes would maintain stable connectivity with it; others would work after a "service zerotier restart", then fail after a few minutes to a few hours, sometimes coming back again after a few minutes or hours, flapping back and forth over time.

After years of grueling trial and error, I think I've finally pinpointed the configuration that breaks things!

The two nodes that would always stop talking to each other are both dual-stacked on WAN, while every other node in the mesh is IPv4 only. One node is native dual-stack; the other uses CenturyLink's 6RD IPv6 implementation. I upgraded a 3rd node to native IPv6 connectivity, and the two nodes with native IPv6 communicated with each other without an issue. However, as soon as the active connection between the 6RD node and this new IPv6 node switched from v4 to v6, it instantly broke, just like the previous connection.

Node A: IPv4 + 6RD
Node B: IPv4 + IPv6 Native
Node C: IPv4 + IPv6 Native

A <> B (IPv4) works
A <> B (IPv6) breaks instantly

A <> C (IPv4) works
A <> C (IPv6) breaks instantly

B <> C (IPv4) works
B <> C (IPv6) works

ZeroTier makes multiple UDP connections between each pair of hosts. I monitored the connection to see which one was active (one of the IPv4 or IPv6 connections) using zerotier-cli -j peers. Whenever a 6RD-enabled host swapped to one of the IPv6 connections, no data would flow through the ZeroTier tunnel anymore.
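For anyone wanting to repeat that check, here is a minimal sketch. It assumes the JSON peer listing is an array of peers, each with an `address` and a `paths` list whose entries carry `address` (as `ip/port`), `active`, and `preferred` fields. Those field names are my assumption about the output, not something stated in this thread, so verify them against your ZeroTier version. The helper names are hypothetical.

```python
import ipaddress
import json
import subprocess


def preferred_path_families(peers):
    """Map each peer's ZeroTier address to the IP family of its preferred path.

    `peers` is the parsed JSON peer list. Field names ("paths", "active",
    "preferred", "address") are assumptions about the CLI output.
    """
    result = {}
    for peer in peers:
        family = None
        for path in peer.get("paths", []):
            if not (path.get("active") and path.get("preferred")):
                continue
            # Path addresses are assumed to look like "203.0.113.7/9993".
            ip_text = path["address"].rsplit("/", 1)[0]
            family = "IPv%d" % ipaddress.ip_address(ip_text).version
            break
        result[peer["address"]] = family
    return result


def read_peers():
    """Run the CLI and parse its JSON output (needs a running ZeroTier service)."""
    out = subprocess.run(
        ["zerotier-cli", "-j", "peers"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return json.loads(out)
```

Calling `preferred_path_families(read_peers())` on a 6RD node and watching for a peer flipping to `IPv6` right before traffic stops would match the behaviour described above.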

The 6RD feature on OPNsense/FreeBSD creates a separate interface in FreeBSD, so I have re0 for WAN and wan_stf for the 6RD interface. So they are two separate interfaces, rather than a single interface with dual-stack IPs. The mtu on this interface is only 1280, so that may possibly be the issue?
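To put a number on the MTU question: 1280 is the minimum link MTU that IPv6 allows, and an IPv6 header (40 bytes) plus a UDP header (8 bytes) leaves 1232 bytes of UDP payload per unfragmented packet. The sketch below is only that arithmetic. The 1400-byte datagram is a hypothetical example size, not a confirmed ZeroTier value.

```python
IPV6_HEADER = 40  # fixed IPv6 header size in bytes
UDP_HEADER = 8    # UDP header size in bytes


def max_udp_payload_ipv6(link_mtu):
    """Largest UDP payload that fits in one unfragmented IPv6 packet."""
    return link_mtu - IPV6_HEADER - UDP_HEADER


def fits_without_fragmentation(payload_len, link_mtu):
    """True if a UDP datagram of payload_len bytes fits within link_mtu."""
    return payload_len <= max_udp_payload_ipv6(link_mtu)


print(max_udp_payload_ipv6(1280))              # 1232
print(max_udp_payload_ipv6(1500))              # 1452
print(fits_without_fragmentation(1400, 1280))  # False (1400 is a hypothetical size)
```

Any tunnel datagram larger than 1232 bytes would have to be fragmented, or dropped, when sent over wan_stf, while the same datagram fits on a native 1500-byte link.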

I'm now running with a local config on these 6RD enabled nodes of "settings": {"interfacePrefixBlacklist": ["wan_stf"]} to prevent ZT from binding to this interface entirely. Having ZT work exclusively on IPv4 on these boxes is working for the time being.
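For others applying the same workaround, here is a rough sketch of merging that setting into an existing local.conf without clobbering other keys. The function names are hypothetical, and the file location is an assumption (I believe it lives in the ZeroTier home directory, /var/db/zerotier-one on FreeBSD-based systems), so check your install.

```python
import json
from pathlib import Path


def blacklist_interface_prefix(config, prefix):
    """Return a copy of a local.conf dict with prefix in interfacePrefixBlacklist."""
    merged = dict(config)
    settings = dict(merged.get("settings", {}))
    prefixes = list(settings.get("interfacePrefixBlacklist", []))
    if prefix not in prefixes:
        prefixes.append(prefix)
    settings["interfacePrefixBlacklist"] = prefixes
    merged["settings"] = settings
    return merged


def update_local_conf(path, prefix):
    """Read the file if it exists, merge the prefix in, and write it back."""
    path = Path(path)
    config = json.loads(path.read_text()) if path.exists() else {}
    merged = blacklist_interface_prefix(config, prefix)
    path.write_text(json.dumps(merged, indent=2) + "\n")
```

Something like `update_local_conf("/var/db/zerotier-one/local.conf", "wan_stf")` followed by a service restart should reproduce the config above, assuming that path is right for your system.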

It would however be nice to have dual stack working again.

(really, I blame CenturyLink for this mostly, offering a crap half-assed IPv6 implementation instead of true native dual-stack)

laduke commented 1 year ago

Glad you found it. I wonder if anyone else has experienced this.

Is it just CenturyLink's 6RD, 6RD in general, the MTU... or what combination?

> The mtu on this interface is only 1280, so that may possibly be the issue?

maybe!?

It'd be nice to know exactly where it's breaking, but that's hard to figure out with the current debugging tools, and you need access to 6RD.

We might add "settings": {"interfacePrefixBlacklist": ["wan_stf"]} to the list of recommendations in the OPNsense docs for now.

darkain commented 1 year ago

Sadly, this one CenturyLink ISP connection is the only 6RD access I have. I do however have a block of IPs with a couple sitting idle, so if need be, I could spin up a dedicated test instance if ZT staff wants to poke around a bit.

Also, the wan_stf recommendation would equally apply to FreeBSD, OPNsense, and pfSense if 6RD is in use.

I also plan to test IPv6 via nodes on the LAN, to see if it's just on the edge node with the wan_stf interface, or if this issue also happens on everything on the network. If other things on the network behave properly, then that means the reduced MTU is fine.

Part of me has been wondering if it's because of the double encapsulation of the packet: maybe there is a ZT or FreeBSD bug in handing the packet from ZT's encapsulator over to the kernel's 6RD encapsulator instead of sending it directly out an interface.