zerotier / ZeroTierOne

A Smart Ethernet Switch for Earth
https://zerotier.com

Dream-Machine-SE wireguard: wgsts1000: possible loop detected, dropping skb of size 65216 #2178

Open MrLimoAK opened 10 months ago

MrLimoAK commented 10 months ago

I’m assuming this is an incompatibility with Ubiquiti’s new Site Magic. ZT used to work just fine on my SE.

Dream-Machine-SE wireguard: wgsts1000: possible loop detected, dropping skb of size 65216 repeats on the console...

I just removed ZT and all its configuration and started over, with the currently available ZT package from: https://download.zerotier.com/dist/ubiquiti/zerotier-one_arm64.deb

Dream Machine SE running: UniFi OS v3.1.16 Network v7.5.187 Protect 2.8.35

I sent a request back in September when this issue first appeared and did not receive even a single hint/suggestion or solution from the discussion forum.

So I’m posting it here in hopes of finding a solution.

laduke commented 10 months ago

Hello, we're not familiar with that error. It's coming from some WireGuard service. Are there any WireGuard settings you can fiddle with?

MrLimoAK commented 10 months ago

It only happens with the latest OS update and ZT. Remove/stop ZT and the error messages go away.

MrLimoAK commented 10 months ago

root@Dream-Machine-SE:/config/zerotier-one# sudo systemctl status zerotier-one.service
● zerotier-one.service - ZeroTier One
     Loaded: loaded (/lib/systemd/system/zerotier-one.service; enabled; vendor preset: enabled)
     Active: failed (Result: core-dump) since Fri 2023-11-17 10:31:55 AKST; 2min 35s ago
    Process: 39499 ExecStart=/usr/sbin/zerotier-one (code=dumped, signal=ABRT)
   Main PID: 39499 (code=dumped, signal=ABRT)
        CPU: 108ms

Nov 17 10:31:46 Dream-Machine-SE systemd[1]: Started ZeroTier One.
Nov 17 10:31:54 Dream-Machine-SE systemd[1]: Stopping ZeroTier One...
Nov 17 10:31:54 Dream-Machine-SE zerotier-one[39499]: terminate called after throwing an instance of 'std::system_error'
Nov 17 10:31:54 Dream-Machine-SE zerotier-one[39499]: what(): Unknown error 8367144
Nov 17 10:31:55 Dream-Machine-SE systemd[1]: zerotier-one.service: Main process exited, code=dumped, status=6/ABRT
Nov 17 10:31:55 Dream-Machine-SE systemd[1]: zerotier-one.service: Failed with result 'core-dump'.
Nov 17 10:31:55 Dream-Machine-SE systemd[1]: Stopped ZeroTier One.
root@Dream-Machine-SE:/config/zerotier-one#

cepm-nate commented 10 months ago

Just throwing a comment in here, seeing the same thing on a UDM-SE here when running the service.

wireguard: wgsts1000: possible loop detected, dropping skb of size 65216

Would love to see a solution, or at least an acknowledgement from the ZT team, as running this inside a router would be much preferred to running it on some other device that needs to be monitored.

glimberg commented 10 months ago

This is not an error or warning message that is coming from ZeroTier, so I don't see much that can be done from this end.

rcoder commented 10 months ago

So the likely source of this error appears to be related to the custom kernel patches that Ubiquiti ships on the UDM firmware: https://github.com/tusc/wireguard-kmod/issues/80. Beyond that issue, I've found quite a few Ubiquiti community discussions describing issues configuring normal Wireguard alongside the UDM "Teleport" feature (which also appears to use WG under the hood).

Given that, I'm curious whether you have a generic WireGuard network, the Teleport feature, or both enabled on your device(s)? Generally speaking, running multiple VPNs on a single device requires some pretty careful management of system and network configuration, and "one-click" tools like Teleport don't often give you the needed knobs and gauges to do that.

Regardless, @glimberg is right that our ability to debug issues specific to Ubiquiti's custom Linux kernel is somewhat limited. We have some personal access to Ubiquiti hardware, but not any kind of direct line to their kernel team.

cepm-nate commented 10 months ago

Thanks for the acknowledgement and digging!

Yes, I have SiteMagic enabled which, I believe, also uses WireGuard under the hood. I also have WireGuard enabled as a VPN Server on the controller which is used for a couple laptops.

I'll try opening a case with Unifi, and will update the thread here as it progresses.

cepm-nate commented 10 months ago

Their first reply:

Hi,

Thanks for contacting Ubiquiti Support! I am (name removed) from the UniFi Security and VPN team. I will be assisting you with your ticket.

Thank you for reaching out regarding the configuration issue you are experiencing with WireGuard and ZeroTier. I'm here to help you address this matter.

WireGuard Configuration:

Network Application: You should be able to configure all the WireGuard VPNs using the UniFi network application. Could you please provide more details on the specific issue you encounter when configuring WireGuard using the network application of ZeroTier?

CLI Configuration: I noticed that you mentioned configuring the UniFi gateway using the CLI. Please be aware that making changes using the CLI is outside of our recommended support scope. For a more stable and supported configuration, we strongly recommend using the network application for any changes.

Best,

UI Support Ubiquiti Inc.

and my response:

Thank you for taking the time to reply to my support request.

Other router manufacturers support ZeroTier natively, and it would be really great to see Unifi routers also do so (not just the Edge series). That would open up a whole new world of global connectivity!

You mentioned "the network application of ZeroTier". Does Unifi have some other supported method of installing ZeroTier onto a UDM-SE?

I understand that CLI changes are outside of your recommendation. However, your team is the best shot we have at making this work, since the code that creates the error was added as a patch into the kernel by your team https://github.com/tusc/wireguard-kmod/blob/main/src/bases/udm-2.4/patches/wireguard-linux-compat/041-ubnt-protection-from-routing-loops.patch, which you can see referenced here https://github.com/tusc/wireguard-kmod/issues/80#issuecomment-1453316614.

That patch persists up into v3 of UnifiOS.

Could you please pass this to someone responsible for WireGuard integration, and see if they have any suggestions?

Just to recap, the error shows up instantly after installing ZeroTier, before ZeroTier is joined to any networks or any ZeroTier routing is added.

Thank you, and may you have a wonderful day!

cepm-nate commented 10 months ago

And their reply, for what it's worth:

Hi,

Thank you for the update. As mentioned earlier, we do not recommend making any changes via CLI. For advanced configurations using CLI, you can refer to our UI community and engage with other users who might have similar configurations. Please ensure to select the proper category when posting in the UI community.

We recommend making any changes using the network application. Could you please confirm if you were able to upload the VPN configuration file in the network application, configure the proper credentials, and check if it was working? If it wasn't working through the network application, please provide additional information, including any error messages you encountered during the configuration, along with the VPN file and credentials. This will help us investigate at our end and forward the information to our internal team.

Anyone here know how to get through to a developer, or someone who knows more than scripted replies at Unifi?

They replied with generic info about where to configure the WireGuard VPN in the GUI, and I replied with:

Thank you. I know what the "Network Application" is, and when ZeroTier is not running, my WireGuard VPN (and SiteMagic) both work fine. From some of your comments, it sounded as if you were referring to some kind of ZeroTier Network Application area.

The issue here is that Unifi's WireGuard implementation includes a patch which prohibits ZeroTier from operating.

Because this is a Unifi product and special implementation of WireGuard, is there any chance you could find it in your heart to forward or escalate this ticket to someone with knowledge on the specific implementation of WireGuard, and see if they have any insights on how to bypass the patch which Unifi included?

I realize this is not the run-of-the-mill support request. Please understand that as you are the manufacturer of this device, you (Unifi) are our best hope of getting this issue resolved.

The specific code patch that detects a possible loop (and also detects and fails when ZeroTier is running) reads like this:

======================================
if (unlikely(skb_end_offset(skb) > 33000)) {
    ret = -ENETDOWN;
    net_crit_ratelimited("%s: possible loop detected, dropping skb of size %u\n",
                         dev->name, skb_end_offset(skb));
    goto err;
}

Thank you for sticking with this support ticket so far.

I don't have high hopes for their support on this. But I DO hope that someone from Unifi will stumble across this and realize what an incredible opportunity awaits Unifi if they were to natively support ZeroTier (or at the very least, remove the patch that mistakenly detects ZeroTier prior to any routes or network joins).
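For anyone following along, here is a minimal sketch of the size check from the patch quoted above, modeled in Python purely for illustration. The 33000 threshold and the 65216 size come from this thread; the function name `would_drop` and the other sample sizes are made up for the example, and the real check runs in kernel C against `skb_end_offset(skb)`.

```python
# Illustrative model only: the real check lives in Ubiquiti's kernel patch.
# Threshold as it appears in the patch quoted in this thread.
LOOP_SIZE_THRESHOLD = 33000


def would_drop(skb_end_offset: int) -> bool:
    """Return True when the patched check would treat the buffer as a loop.

    Mirrors `skb_end_offset(skb) > 33000` (strictly greater than).
    """
    return skb_end_offset > LOOP_SIZE_THRESHOLD


if __name__ == "__main__":
    # 65216 is the size reported in the console message; the other
    # values are hypothetical sizes chosen for comparison.
    for size in (1500, 33000, 65216):
        verdict = "dropped" if would_drop(size) else "passed"
        print(f"skb_end_offset={size}: {verdict}")
```

The check keys only on buffer size, so any buffer of the reported size would trip it whether or not a real routing loop exists.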

cepm-nate commented 10 months ago

Interestingly enough, I get the exact same "possible loop detected" message if I run a command like

ping -I wgsts1000 8.8.8.8

I think wgsts1000 is the internal interface for the default WireGuard VPN connection.

laduke commented 10 months ago

I hope it somehow gets resolved. Can you firewall those interfaces from talking to each other? Might help a little

cepm-nate commented 10 months ago

Great idea!

I don't know what interfaces ZT makes, because I disable it instantly after creating it so WireGuard does not throw errors. I'll give that a shot soon.

cepm-nate commented 8 months ago

While it does look like wgsts1000 is the WireGuard (Site Magic) interface, what OTHER interface would I put in the firewall rule? ZeroTier's interfaces follow the format of zt____, where ___ depends on the network being joined. Since the error message shows up PRIOR to any ZT network being joined, it stands to reason that there would not yet be any ZeroTier interface.

I can still give it a shot after hours AFTER joining a ZT network, just to see if it cuts out the errors, but it seems illogical because the errors show up PRIOR to joining a ZT network.
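One way to check whether any ZeroTier interface exists at a given moment is to list the system's interfaces and filter on the zt prefix mentioned above. This is a rough sketch that assumes Python 3 is available on the device (not confirmed in this thread); the helper names and the sample interface name `ztabc123` are hypothetical.

```python
import socket


def zerotier_interfaces(names):
    """Return the interface names that start with the 'zt' prefix, sorted."""
    return sorted(name for name in names if name.startswith("zt"))


def current_interface_names():
    """Return the names of all network interfaces known to the OS."""
    # socket.if_nameindex() yields (index, name) pairs on Unix-like systems.
    return [name for _index, name in socket.if_nameindex()]


if __name__ == "__main__":
    found = zerotier_interfaces(current_interface_names())
    if found:
        print("ZeroTier-style interfaces present:", ", ".join(found))
    else:
        print("No interface starting with 'zt' found.")
```

Running it once before and once after joining a network would show whether there is anything for a firewall rule to match when the errors first appear.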