zerotier / ZeroTierOne

A Smart Ethernet Switch for Earth
https://zerotier.com

Intermittent Network Latency/Outages on Windows Devices that Route (Forward) Traffic #1428

Open JonathonFS opened 3 years ago

JonathonFS commented 3 years ago

EDIT: 8/28/2021 - Updated section 1.2 to reference comment below for instructions on replicating the issue with Win 10.

1. Overview

I believe there's an issue in the Windows ZeroTier NDIS 6.1 Miniport driver that can cause intermittent latency and dropped packets. This unwanted behavior is particularly noticeable when attempting to use ZeroTier on a Windows device that's configured as a router. I'll cover how my environment is set up, what is triggering the unwanted behavior, why I think it's happening, some crude work-arounds, and recommendations to fix what I suspect is the root issue.

1.1 Environment

This is an example environment that reflects the pertinent details of my network. If you want to look at my particular network, feel free to reference support ticket ZT-3762, which should contain links to my account and subscription.

ZeroTier Central:

Physical Office Network:

Remote Client:

Windows Router:

1.2 Steps to Reproduce the error

  1. Configure an environment relatively similar to the one stated above. The Windows Router must be a Windows device. The client can be any OS.
  2. Install the Routing and Remote Access Server (RRAS) role on the Windows Server.
    1. See my comment below for instructions on how to replicate this issue with a Windows 10 Pro or Enterprise workstation.
  3. Identify an unused IP on the physical network connected to the Windows Router. For example: 192.168.0.214
  4. Attempt to ping the unused IP from the Windows Router.
  5. From the Windows Router command line, run arp -a and verify the unused IP is not listed in the ARP cache.
  6. From the Windows Router, set up a recurring ping to the Remote Client. For example: ping -t 10.10.0.31
  7. From the Remote Client, set up a recurring ping to the unused IP. For example: ping -t 192.168.0.214
  8. Note that every time the Remote Client pings, the recurring ping on the Windows Router will incur a ~2000 ms delay.

Ping Latency Example
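To spot the delay without watching the console, the output of the recurring ping can be scanned for slow replies. The following is a minimal sketch, assuming the English-language Windows ping reply format; the function name, the 1000 ms threshold, and the sample lines are illustrative assumptions, not captures from the affected network.

```python
import re

# Matches the round-trip time in a Windows ping reply, e.g. "time=2004ms" or "time<1ms".
_TIME_RE = re.compile(r"time[=<](\d+)ms")


def find_latency_spikes(ping_lines, threshold_ms=1000):
    """Return (line_index, rtt_ms) for every reply at or above threshold_ms."""
    spikes = []
    for index, line in enumerate(ping_lines):
        match = _TIME_RE.search(line)
        if match and int(match.group(1)) >= threshold_ms:
            spikes.append((index, int(match.group(1))))
    return spikes


# Made-up sample lines for illustration only.
sample_ping_output = [
    "Reply from 10.10.0.31: bytes=32 time=28ms TTL=64",
    "Reply from 10.10.0.31: bytes=32 time=2004ms TTL=64",
    "Reply from 10.10.0.31: bytes=32 time=31ms TTL=64",
    "Request timed out.",
]
print(find_latency_spikes(sample_ping_output))
```

Lines without a reply time (such as timeouts) are skipped, so packet loss would need to be counted separately.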

1.3 Desired Behavior

When a ZeroTier device initiates a connection to an internal IP that ARP cannot resolve, the RRAS server's ZeroTier NIC should continue to pass traffic.

1.4 Observed Behavior

When a ZeroTier device initiates a connection to an internal office IP that ARP cannot resolve, the RRAS server's ZeroTier NIC becomes unresponsive for ~2 seconds. If this scenario occurs repeatedly, the latency builds up to the point that the RRAS server's ZeroTier NIC becomes unable to pass any traffic. Since the RRAS server's ZeroTier NIC IP is also the next hop for ZeroTier devices to access my physical/office network, the office network becomes unreachable.

This unwanted behavior isn't just caused by ICMP (ping) traffic. It can be generated by any type of IPv4 traffic (IPv6 doesn't use ARP for neighbor discovery) destined to an IP that doesn't exist. In an office/enterprise network, there are many reasons why a workstation would try to contact an IP that doesn't exist: a share drive gets moved and people don't re-map it, a printer goes offline (temporarily or permanently), or third-party software has an old/noisy configuration. When enough of this traffic occurs in succession, the ZeroTier NIC will become completely unavailable and all traffic will stop passing.

1.4.1 Why is ARP such a Problem?

I suspect that ARP isn't the only protocol that can suffer from this issue, but it is the one I can most easily demonstrate the problem with. In some situations, Windows is particularly noisy when it comes to ARP neighbor discovery. Windows will wait 1 second for an ARP response before sending another ARP request. It will send up to 3 ARP requests. This means every packet destined to an unreachable IP will yield 3 ARP requests, 1 second apart.

image

For some reason (which I'll theorize on next) the ZeroTier NIC doesn't seem to pass any traffic for those 2 seconds while Windows is performing the ARP requests. Obviously, Windows needs to know the destination MAC address before it can forward the packet. So it's as if the ZeroTier NIC is unable to process more traffic, until after Windows forwards (or drops) the current packet.
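The arithmetic behind the ~2 second figure can be written out as a small sketch. It assumes the values observed in this post (3 requests, 1 second apart) and that the stall lasts from the first request to the last; whether Windows waits for one further timeout after the final request is not established here.

```python
def arp_request_times(attempts=3, interval_s=1.0):
    """Offsets (in seconds) at which each ARP request is sent for one unresolved IP."""
    return [attempt * interval_s for attempt in range(attempts)]


request_times = arp_request_times()
# Assumption: the packet is held from the first request until the last one goes out.
stall_window_s = request_times[-1] - request_times[0]
print(request_times, stall_window_s)
```

With the observed defaults this gives requests at 0, 1 and 2 seconds, which matches the ~2000 ms delay seen on the recurring ping.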

2. Root Cause (Theoretical)

I did some reading up on Windows NDIS miniport drivers and took a look through the ZeroTier Windows TapDriver6 source code. I'm familiar with C/C++/C#, but have zero experience writing NDIS drivers, so the following is just a theory based on the information I could find.

2.1 Initial Impressions Point to the TAP Driver

A user on the OpenVPN forum is having the same issues as me with their RRAS server. The ZeroTier Windows TAP driver is based on the OpenVPN driver. This makes me think the problem is not in the ZeroTier One client code, but rather in the TAP driver.

This problem may have surfaced in 2015 when ZeroTier transitioned from an NDIS 5 TAP driver to an NDIS 6 TAP driver. NDIS 6 was a big change, and brought about new requirements. Of note, NDIS 5 miniport drivers could be serialized or deserialized. Serialized drivers rely on NDIS to manage their send (and receive?) queues. Deserialized drivers have to manage everything themselves. With the transition to NDIS 6, Microsoft dropped support for serialized drivers, so now everyone had to start managing their own send (and receive?) queues. If OpenVPN's NDIS 5 driver was serialized, the move to NDIS 6 may have introduced this issue. Source: Miniport drivers

2.2 Intro to NDIS Receive Buffers

If this is a driver issue, the problem would lie somewhere in how the miniport driver delivers packet data to the Windows router. NDIS miniport drivers call the function NdisMIndicateReceiveNetBufferLists to present a structure of packet data called a NET_BUFFER_LIST (or just NBL for short) to higher level NDIS drivers. In our scenario the higher level NDIS driver would be a Windows RRAS driver, used to re-write packet headers and forward them. Miniport drivers usually have a limited number of NBLs available, as they reside in privileged memory, which is expensive real estate. The Intel NIC drivers on my server and laptop are both limited to 256 NBLs:

image

When NdisMIndicateReceiveNetBufferLists sends an NBL, ZeroTier relinquishes ownership of the NBL to the higher level driver. The NdisMIndicateReceiveNetBufferLists function takes a ReceiveFlags parameter, which can be configured with the NDIS_RECEIVE_FLAGS_RESOURCES flag. If the NDIS_RECEIVE_FLAGS_RESOURCES flag is present, the higher level driver will copy the NBL data to its own buffers, and immediately return ownership of the NBL to the miniport driver. If the flag is not present, the higher level driver will wait to return the NBL until after it's done with it, by calling MiniportReturnNetBufferLists. Source: Indicating Received Data from a Miniport Driver

Microsoft goes into further detail in another article by stating _"Setting the NDIS_RECEIVE_FLAGS_RESOURCES flag in the ReceiveFlags parameter forces the protocol drivers to copy the network data and release the NET_BUFFER_LIST structures to the miniport driver. Driver writers should design their miniport drivers with enough preallocated NET_BUFFER_LIST structures to avoid unnecessary copying."_ This last sentence is particularly interesting. It implies that the NDIS_RECEIVE_FLAGS_RESOURCES flag will reduce performance, and should only be used when absolutely necessary. The most obvious scenario would be to only use it when you have 1 spare NBL left. Things may run a little slower, but at least the miniport driver will be able to continue indicating NBLs to higher level drivers. Source: NdisMIndicateReceiveNetBufferLists function (ndis.h)
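The ownership rules described above can be modeled in a few lines. This is a toy user-space model, not driver code: the NblPool class, its method names, and the numeric flag value are hypothetical stand-ins chosen for illustration, and only the ownership behavior is taken from the text.

```python
from dataclasses import dataclass

# Stand-in for NDIS_RECEIVE_FLAGS_RESOURCES; the numeric value here is arbitrary.
RECEIVE_FLAGS_RESOURCES = 0x1


@dataclass
class NblPool:
    """Toy model of a miniport's preallocated NBLs (hypothetical, for illustration)."""
    free: int
    held_by_upper_layer: int = 0

    def indicate(self, flags=0):
        """Indicate one NBL upward; return False if the pool is empty."""
        if self.free == 0:
            return False  # no NBL available, so the packet cannot be indicated
        if flags & RECEIVE_FLAGS_RESOURCES:
            # Upper layer copies the data, so the NBL is usable again immediately.
            return True
        self.free -= 1
        self.held_by_upper_layer += 1
        return True

    def return_nbl(self):
        """Models the upper layer handing an NBL back (MiniportReturnNetBufferLists)."""
        if self.held_by_upper_layer > 0:
            self.held_by_upper_layer -= 1
            self.free += 1


demo_pool = NblPool(free=2)
print(demo_pool.indicate(), demo_pool.indicate(), demo_pool.indicate())
```

In this model the third indication fails because both NBLs are still held by the upper layer, which is the situation the rest of this post is concerned with.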

2.3 ZeroTier NDIS Driver Analysis

Looking at constants.h we see 3 variables that could represent the number of NBLs allocated by the ZeroTier TAP driver. The first is set to 64, and the other two are set to 16. However, I couldn't find these variables used in any meaningful way inside the other .c code files. They mostly appeared in comments. So I'm unsure whether these are implemented or not.

The NdisMIndicateReceiveNetBufferLists function is called three times in rxpath.c (lines 149, 496, and 606). The first time it's called, there's a possibility the NDIS_RECEIVE_FLAGS_DISPATCH_LEVEL flag could be applied, but I couldn't find a code path that ever leads to NDIS_RECEIVE_FLAGS_RESOURCES being set. The last two times it's called, the ReceiveFlags parameter is set directly to 0.

Therefore, it looks like the ZeroTier TAP driver will always rely on the higher level NDIS driver to return NBLs in a timely fashion, with no fall-back mode of operation for cases where it runs out of NBLs. Also, I was unable to confirm (mainly due to my ineptitude with NDIS drivers) that the TAP driver is actually honoring the number of concurrent NBLs listed in constants.h.

2.4 Theory of Malfunction

The following diagram depicts why I think unreachable ARPs are causing issues. Please read it from top to bottom. Based on the ZeroTier driver source code, and what I've read from Microsoft, I'm somewhat confident the first 2 "NDIS" columns are correct. I'm very confident the last column (Windows Kernel/ARP Logic) is correct. Windows is closed source, so the "RRAS Driver" column is an educated guess, based on my observations and networking knowledge.

image

If this is correct, then any delay in any higher level NDIS drivers could cause ZeroTier NIC latency/outages. In that case, this issue would not just be limited to ARPs on an RRAS server. It could surface anywhere a higher level NDIS driver is taking its sweet time with something.
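If the theory holds, a rough capacity estimate follows from Little's law: the average number of NBLs held upstream equals the arrival rate of stalling packets multiplied by how long each one is held. The sketch below uses hypothetical pool sizes and packet rates; the number of NBLs the driver can actually have in flight is not confirmed anywhere in this post.

```python
def average_nbls_in_flight(stalled_packets_per_s, hold_time_s):
    """Little's law: average NBLs held upstream = arrival rate * hold time."""
    return stalled_packets_per_s * hold_time_s


def pool_can_keep_up(pool_size, stalled_packets_per_s, hold_time_s):
    """True if, on average, fewer NBLs are held upstream than the pool contains."""
    return average_nbls_in_flight(stalled_packets_per_s, hold_time_s) < pool_size


# Hypothetical numbers: each unresolved packet holds its NBL for about 2 seconds.
print(pool_can_keep_up(pool_size=1, stalled_packets_per_s=0.25, hold_time_s=2.0))
print(pool_can_keep_up(pool_size=1, stalled_packets_per_s=1.0, hold_time_s=2.0))
print(pool_can_keep_up(pool_size=16, stalled_packets_per_s=8.0, hold_time_s=2.0))
```

This is an average-case bound only. It ignores bursts, and it assumes every stalling packet holds its NBL for the full hold time.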

3. Fix Recommendations

I'm proposing two courses of action, with the goal of improving ZeroTier network resiliency in situations when NBLs are not handled immediately.

  1. Recommend validating the logic that allows the ZeroTier NDIS driver to send concurrent NBLs to higher-level NDIS drivers. Some comments in the source code tend to refer to this as "in-flight NBLs." As seen in the Intel NIC driver pictured above, exposing this setting as a configurable option in the device's Advanced tab could be a good way to keep the number low for average users, while allowing the number to be cranked up for high-performance computing situations.
  2. Recommend adding in logic to detect when there is only 1 spare NBL left. When the last spare NBL is indicated, add NDIS_RECEIVE_FLAGS_RESOURCES to the ReceiveFlags parameter. This will force Windows to copy the NBL's contents, and return it immediately, so the ZeroTier driver can continue to pass traffic.
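The second recommendation can be sketched as a user-space simulation. This is not driver code and not the ZeroTier implementation: choose_receive_flags, simulate_deliveries, the low-watermark parameter, and the flag value are all hypothetical, and the upper layer is assumed to never return an NBL (the worst case).

```python
# Stand-in for NDIS_RECEIVE_FLAGS_RESOURCES; the numeric value here is arbitrary.
RECEIVE_FLAGS_RESOURCES = 0x1


def choose_receive_flags(free_nbls, low_watermark=1):
    """Request copy-and-return semantics once the pool is down to its last spare NBL."""
    return RECEIVE_FLAGS_RESOURCES if free_nbls <= low_watermark else 0


def simulate_deliveries(pool_size, packets, use_fallback):
    """Count packets indicated when the upper layer never returns an NBL."""
    free = pool_size
    delivered = 0
    for _ in range(packets):
        if free == 0:
            continue  # pool exhausted: this packet is dropped
        flags = choose_receive_flags(free) if use_fallback else 0
        if not flags & RECEIVE_FLAGS_RESOURCES:
            free -= 1  # ownership passes upward and, in this worst case, never comes back
        delivered += 1
    return delivered


print(simulate_deliveries(pool_size=4, packets=10, use_fallback=False))
print(simulate_deliveries(pool_size=4, packets=10, use_fallback=True))
```

Without the fallback, delivery stops once the 4 NBLs are handed off. With it, the last NBL is reused for every remaining packet, at the cost of a copy per packet.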

4. Work-Arounds

Until a true fix can be implemented, there are a few ways we can try to circumvent the underlying issue:

4.1 Add another hop between RRAS and the Internal/Physical network

The ZeroTier NIC driver is unable to handle the long-running ARP requests performed by Windows. But Windows only needs to perform ARPs for IPs residing on networks that it's directly connected to. So if you insert a /30 or /31 routing network between Windows and your Internal/Office network, then Windows will have no need to perform an ARP for the destination, because none of the destination IPs are local. This solution will be more effective than the work-around listed in 4.2, but it has the drawback of requiring you to change the RRAS server's internal/office IP. You'll also need to make some physical cabling changes to your network, and re-work some of your static routes.

Current Network: image

Work-Around Network: image
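The reason the extra hop helps can be shown with the standard on-link test: a host ARPs for the destination itself only when the destination falls inside a directly connected subnet, and otherwise ARPs for the next hop. The sketch below uses Python's ipaddress module; the 10.255.255.0/30 transit network and its next hop are hypothetical values picked for illustration.

```python
import ipaddress


def arp_target(destination, connected_networks, next_hop):
    """Return the IP the router must resolve via ARP before forwarding a packet."""
    dest = ipaddress.ip_address(destination)
    for network in connected_networks:
        if dest in ipaddress.ip_network(network):
            return destination  # on-link: ARP for the destination itself
    return next_hop  # off-link: ARP only for the next-hop router


# Current network: the office subnet is directly connected to the Windows Router.
print(arp_target("192.168.0.214", ["192.168.0.0/24"], next_hop="192.168.0.1"))

# Work-around network: only a small transit subnet (hypothetical) is directly connected.
print(arp_target("192.168.0.214", ["10.255.255.0/30"], next_hop="10.255.255.2"))
```

In the second case the only address Windows has to resolve is the next hop, which is a live router that answers ARP, so the unresolvable lookup moves to the downstream router.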

4.2 Get Windows to reduce the number of ARPs it does

While the solution above in Section 4.1 is going to be more stable, it's not always an option for some people. For me, I'm unable to change the server's internal IP address, and making physical cabling changes isn't easy. This leaves us trying to get Windows to not be so noisy when performing ARPs on our internal network. I tried all sorts of NIC and registry settings (ARP Offload, ArpRetryCount, EnableDeadGWDetect, etc.), restarting along the way, but nothing had any effect. However, I noticed that when pinging from internal to an unreachable ZeroTier IP, Windows would only issue 1 ARP request every 61 seconds. After much trial and error, I determined that setting a static route on the ZeroTier NIC (apparently the default gateway/route isn't good enough) will get Windows to chill out on its frantic ARP requests. Now a normal person would think that to change ARP behavior on the Internal NIC, you would need to set the static route on the Internal NIC... But no, it only works when you set it on the ZeroTier NIC. All I can say is "That's Windows for ya!"

After further testing, I came up with a list of rules for the static route, in order to trigger the desired behavior:

I didn't want to set up another device on ZeroTier for the Next Hop IP, because if it fails for some reason, then my network will go down within a matter of seconds. Instead, since the ZeroTier default gateway is already in the Windows ARP cache, we can use that as the next hop IP. So here's the workaround I'm currently using:

4.2.1 Add this static route to the Windows Router device

I do this through RRAS, but you could also do it on the command line with a persistent route. The nice thing about doing it with RRAS, is the route only exists if the RRAS service is running.

Applying this workaround is super simple, but it comes at the cost of not being as effective. Windows is still going to perform ARPs, and my testing has shown that ARPs for UDP traffic don't follow the 61 second cool-down rule that ARPs for ICMP and TCP traffic do. Here's a table showing what kind of improvements you can expect from this workaround:

image

In my case, I still have a lot of SNMP traffic to deal with from long forgotten printers. At times, all that unreachable UDP traffic can still overwhelm the ZeroTier network driver, to the point where the network will drop for a few minutes. Until I can get around to removing those stale printer entries, I've had to block SNMP traffic using the Flow Rules in ZeroTier Central:

image

5. Scope of Impact

Thought it might be worth mentioning that several other people are having similar issues (...and as a cheap plug to try and garner support for this issue, haha!).

5.1 ZeroTier network issues with Windows RRAS:

5.2 Intermittent ZeroTier network issues with SSDP/UPnP:

I can't 100% confirm these situations are related, but I also saw SSDP (port 1900) and WS-Discovery traffic on my network that correlated with some ZeroTier network degradation. After implementing the workaround I listed in Section 4.2 above, I removed the UDP 1900 block from my flow control rules, and haven't had any problems since. Could be coincidence, but I suspect it's because the UPnP Device Host service is getting hung up with the non-standard routing scenarios, and is taking too long to respond.

joseph-henry commented 3 years ago

@JonathonFS this is the most beautiful ticket we've ever seen. Thank you for putting this together. Please give us some time to digest and test this.

JonathonFS commented 3 years ago

@joseph-henry you're most welcome! Really love this product, and what you guys are doing with it.

Replicating the Issue on Windows 10 instead of Server

My initial post showed how to replicate the issue on a Windows Server OS, but I realize this may be hard to come by for some people. The following guide shows how to set up routing on Windows 10 Pro or Enterprise, and replicate the same issue.

1 Prerequisites

Identify a device to be used as the Windows ZT Router:

Identify a device to be used as the Remote ZT Client:

2 ZeroTier Network Setup

  1. Create a new ZeroTier network
    1. Under IPv4 Auto-Assign, enable Auto-Assign from Range and select Easy
    2. Select a /24 network. This demo will use 10.147.17.*
  2. Download and install ZeroTier on the Windows ZT Router. This demo is using version 1.6.5.
    1. Click the Windows Start button, type in "zero" and launch the ZeroTier One tray app.
    2. Right click the ZeroTier One tray icon, and select Join Network...
    3. Enter the network ID of the ZeroTier network created above.
  3. Download and install ZeroTier on the Remote ZT Client.
    1. This demo uses version 1.6.5, but any version that supports managed routes should work.
    2. Join the ZeroTier network created above.
  4. Authorize the new members to the ZeroTier network.
    1. Go back to the ZeroTier Central console
    2. Scroll down to the Members section, and authorize the Windows ZT Router and Remote ZT Client
    3. Wait for them to join the network, and ensure an IP appears in the Managed IPs column (could take 5 - 30 sec).

2.1 Verify ZeroTier Connectivity

  1. From the Windows ZT Router, verify that you can ping the Remote ZT Client (10.147.17.79).
  2. From the Remote ZT Client, verify that you can ping the Windows ZT Router (10.147.17.31).
  3. If a ping fails, check the host firewall on the device you are unable to ping.
    1. In my demo environment, the Windows firewall on the Windows ZT Router had to be modified (or disabled).
      1. Open Windows Firewall (wf.msc)
      2. Click Inbound Rules and add a new rule
        1. Rule Type: Custom
        2. Program: All Programs
        3. Protocol Type: ICMPv4
        4. Remote IP Address: 10.147.17.0/24
        5. Action: Allow
        6. Profiles: All
        7. Name: Allow Ping from ZT Network

3 Enable Routing

This section shows how to route (forward) traffic between the physical (192.168.200.0/24) and ZeroTier (10.147.17.0/24) networks.

3.1 Windows Router Config

  1. On the Windows ZT Router, consider disabling Third-Party host firewalls.
    1. I was running Bitdefender, and having the firewall enabled would prevent packets from routing. This kind of makes sense, since packet routing is more of a Windows Server feature, and the Bitdefender firewall isn't supported on Windows Server.
    2. If you disable a third-party firewall, the Windows firewall will usually turn on automatically. To ensure the Windows firewall is working correctly, you should perform the steps in the Verify ZeroTier Connectivity section above again.
  2. On the Windows ZT Router, open Services (services.msc)
    1. Find and edit the Routing and Remote Access service.
    2. Change the startup type to Automatic (Delayed Start), and start the service. A restart was not necessary in my testing.

      NOTE: Many articles point to the use of the IPEnableRouter registry key to enable Windows packet forwarding. This may be true on older versions of Windows, but in my testing on Windows 10 Pro 20H2, this registry key had no effect.

3.2 ZeroTier Central Routing Config

  1. In the ZeroTier Central console, locate the Managed Routes section and add a new route:
    1. Destination: Set this to the physical subnet of the Windows ZT Router.
      1. This demo uses 192.168.200.0/24
    2. (Via): Set this to the ZeroTier Managed IP of the Windows ZT Router. This IP is listed in the Members section below.
      1. This demo uses 10.147.17.31

3.3 (Optional) Physical Network Routing Config

This step is optional, and is not required to replicate the issue below. If you want other devices on the Windows ZT Router's physical network (192.168.200.0/24) to be able to communicate with devices on the ZeroTier network (10.147.17.0/24), then a static route must be configured on the physical network's default gateway. This results in a more "real-world" demo environment.

  1. Log in to the device acting as the network's default Gateway. In a home environment, this is probably your Wireless router.
  2. Create a static route:
    1. Set the destination network to the ZeroTier network. This demo would use 10.147.17.0/24
    2. Set the target/gateway/next hop IP to the Windows ZT Router's physical IP. This demo would use 192.168.200.173

3.4 Verify ZeroTier Routing

  1. From the Windows ZT Router's physical interface, verify that you can ping the Remote ZT Client.
    1. ping -S 192.168.200.173 10.147.17.79
  2. From the Remote ZT Client, verify that you can ping the Windows ZT Router's physical IP.
    1. ping 192.168.200.173
  3. If both pings work, then you're ready to begin testing.

4 Replicating the Issue

The first thing we need to do is find an unused IP on the physical network connected to the Windows ZT Router. This demo will use 192.168.200.214.

  1. Attempt to ping the unused IP from the Windows Router.
  2. Verify the ping fails.
  3. From the Windows ZT Router command line, run arp -a and verify the unused IP is NOT listed in the ARP cache.
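The ARP cache check in the last step can be automated by scanning the text printed by arp -a. This is a minimal sketch that assumes the usual three-column Windows layout (Internet Address, Physical Address, Type); the function name and the sample output are made up for illustration.

```python
def arp_cache_contains(arp_output, ip):
    """Return True if ip appears as an entry row in the text printed by arp -a."""
    for line in arp_output.splitlines():
        fields = line.split()
        # Entry rows have exactly three columns: address, MAC, type.
        if len(fields) == 3 and fields[0] == ip:
            return True
    return False


# Made-up sample output for illustration only.
sample_arp_output = """
Interface: 192.168.200.173 --- 0x5
  Internet Address      Physical Address      Type
  192.168.200.1         00-11-22-33-44-55     dynamic
  192.168.200.255       ff-ff-ff-ff-ff-ff     static
"""
print(arp_cache_contains(sample_arp_output, "192.168.200.214"))
```

The interface header line has four whitespace-separated fields, so the router's own IP on that line is not mistaken for a cache entry.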

4.1 Reproducing NIC Latency

  1. From the Windows ZT Router, set up a recurring ping to the Remote ZT Client. For example: ping -t 10.147.17.79
    1. This ping lets us monitor how well the ZeroTier NIC is passing traffic on the Windows ZT Router.
  2. From the Remote ZT Client, set up a recurring ping to the unused IP. For example: ping -t 192.168.200.214
  3. Note that every time the Remote ZT Client pings, the recurring ping on the Windows ZT Router will incur a ~2000 ms delay. (Image: Win 10 Ping Latency)

4.2 Reproducing NIC Packet Loss

  1. From the Windows ZT Router, set up a recurring ping to the Remote ZT Client. For example: ping -t 10.147.17.79
  2. From the Remote ZT Client, set up 4 concurrent recurring pings to the unused IP.
    1. On Windows, this can be done by opening 4 command prompts, and running the ping -t 192.168.200.214 command in each of them.
  3. Note that the recurring pings on the Windows ZT Router will start failing completely. (Image: Win 10 Ping Failure)
  4. If the 4 concurrent pings to the unreachable IP are cancelled, then the ping traffic on the Windows ZT Router will recover after a minute or so. (Image: Win 10 Ping Recovery)