zerotier / ZeroTierOne

A Smart Ethernet Switch for Earth
https://zerotier.com

Poor Multipath TCP throughput #1734

Open grapexy opened 2 years ago

grapexy commented 2 years ago

I have many-to-one multipath configured with 2 WAN links on Site A and 1 WAN link on Site B.

Both of Site A's links are symmetric 100 Mbps. Site B's link is 1 Gbps.

Site B is configured as the default gateway for all Site A outbound connections, and the only NAT is ZT > WAN. There is no NAT for the tunnel.

Latency from Site A to Site B is ~40ms and both are using ZT version 1.10.1.

With balance-xor or balance-aware on both sides, I'm able to get 200 Mbps with iperf3 UDP:

However, iperf3 TCP (10 parallel connections) with the same settings is extremely slow:

Testing a single-stream wget download is better, but it still doesn't reach even a single link's throughput:

Site A local.conf:

{
    "physical": {
        "192.168.0.0/16": {
            "blacklist": true
        }
    },
    "settings": {
        "allowSecondaryPort": false,
        "interfacePrefixBlacklist": [
            "wg",
            "zt"
        ],
        "allowTcpFallbackRelay": false,
        "portMappingEnabled": false,
        "defaultBondingPolicy": "custom-balance-xor",
        "policies": {
            "custom-balance-aware": {
                "basePolicy": "balance-aware",
                "rebalanceStrategy": "aggressive",
                "balancePolicy": "flow-dynamic",
                "links": {
                    "igb1": {},
                    "igb2": {}
                }
            },
            "custom-balance-xor": {
                "basePolicy": "balance-xor",
                "rebalanceStrategy": "aggressive",
                "links": {
                    "igb1": {},
                    "igb2": {}
                }
            }
        }
    }
}

Site B local.conf:

{
    "physical": {},
    "settings": {
        "allowSecondaryPort": false,
        "interfacePrefixBlacklist": [
            "wg"
        ],
        "portMappingEnabled": false,
        "allowTcpFallbackRelay": false,
        "defaultBondingPolicy": "custom-balance-xor",
        "policies": {
            "custom-balance-aware": {
                "basePolicy": "balance-aware",
                "balancePolicy": "flow-dynamic",
                "rebalanceStrategy": "aggressive"
            },
            "custom-balance-xor": {
                "basePolicy": "balance-xor",
                "rebalanceStrategy": "aggressive"
            },
            "custom-active-backup": {
                "failoverInterval": 1000,
                "basePolicy": "active-backup"
            },
            "custom-broadcast": {
                "dedup": true,
                "basePolicy": "broadcast"
            }
        }
    }
}

Both ends run OPNsense; however, I've also tried OpenWrt on Site A and got the same results.

I've also tried adding two more links with dynamic speed (LTE) and still got the exact same results.

The bond show command displays 2 links in use, port 9993 is open on both ends, and trace logs show that all links are being used.

Just as a note, I was previously using MPTCP (openmptcprouter) and was able to achieve 200 Mbps with the default configuration, and up to 400 Mbps with 2 additional dynamic links after some tinkering. I was hoping ZT would be able to do roughly the same, since it allows for far more advanced configurations. Are these results expected?

grapexy commented 2 years ago

Noticed this comment by @joseph-henry on a similar use case:

My use case involves having the edge router on a mobile platform, which means the signal strength (and bandwidth on my WAN interfaces at the edge) will vary depending on the location. Do you still recommend using balance-xor for this 2-to-1 setup (where the edge has 2 physical WAN interfaces and the server has 1 physical WAN + 1 sub-interface)?

Yes I'd use the balance-xor for this. Our balance-aware mode needs some work before it would be useful in this case.

Is balance-aware supposed to work properly for many-to-one, asymmetric, dynamic links now, or does it still need more work? I'm also curious what kind of work was/is required.

grapexy commented 2 years ago

After a few days of testing, throughput got better, though I'm not sure what changed. Now I'm getting around 100-120 Mbps of aggregate bandwidth, but that's only marginally more than the speed of a single link.

I've also noticed that multiple paths are used for single-stream TCP connections (e.g. rsync or iperf3 without parallel connections). This is the case for balance-xor, and for balance-aware with flow-dynamic or flow-static. tcpdump shows that ZeroTier is using both links with a roughly 50/50 split regardless of the settings. I was under the impression that hashing src_port ^ dst_port ^ proto would result in only a single path being used in this case.
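
To illustrate what I expected: a minimal, self-contained sketch (my own illustration, not ZeroTier's actual code; the port numbers and link names are made up) of how a src_port ^ dst_port ^ proto hash should pin every packet of a flow to one link:

// Minimal sketch (not ZeroTier's actual code) of how a 3-tuple XOR hash
// is expected to pin a flow to exactly one link.
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

struct Flow {
    uint16_t srcPort;
    uint16_t dstPort;
    uint8_t  proto; // e.g. 6 = TCP, 17 = UDP
};

// Same inputs always produce the same hash, so a given flow should
// always map to the same link index.
static size_t linkForFlow(const Flow& f, size_t numLinks) {
    uint32_t h = static_cast<uint32_t>(f.srcPort) ^ f.dstPort ^ f.proto;
    return h % numLinks;
}

int main() {
    std::vector<std::string> links = { "igb1", "igb2" };
    Flow iperfTcp  = { 52311, 5201, 6 };  // hypothetical single-stream iperf3 flow
    Flow rsyncFlow = { 49822,  873, 6 };  // hypothetical rsync flow

    // Every packet of the same flow lands on the same link...
    for (int i = 0; i < 3; ++i)
        std::printf("iperf3 flow -> %s\n", links[linkForFlow(iperfTcp, links.size())].c_str());
    // ...while a different flow may or may not hash to the other link.
    std::printf("rsync flow  -> %s\n", links[linkForFlow(rsyncFlow, links.size())].c_str());
    return 0;
}

If that's how balance-xor/balance-aware assign flows, a single-stream TCP transfer should only ever show up on one link in tcpdump.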

Another observation is that CPU usage jumps from around 5-10% to about 60% as soon as any bonding policy is enabled.

grapexy commented 2 years ago

I've also noticed that multiple paths are used for single-stream TCP connections (e.g. rsync or iperf3 without parallel connections). This is the case for balance-xor, and for balance-aware with flow-dynamic or flow-static. tcpdump shows that ZeroTier is using both links with a roughly 50/50 split regardless of the settings. I was under the impression that hashing src_port ^ dst_port ^ proto would result in only a single path being used in this case.

Found the culprit. If any custom policy (i.e. one with a custom name) is used, even with default settings, flow assignment does not happen (there are no assign in-flow debug messages in the logs with custom policies). Changing defaultBondingPolicy to a predefined policy properly assigns traffic to a single flow. This means custom policies are effectively broken.

For example, the following fails to produce any assign in-flow (etc.) debug messages, flow hashing does not happen, and same-hash flows end up going in and out through multiple links:

    "defaultBondingPolicy": "custom-balance-aware",
    "policies": {
        "custom-balance-aware": {
            "basePolicy": "balance-aware"
        }
    }

This, however, works (it produces the logs, and same-hash flows stay on a single path):

    "defaultBondingPolicy": "balance-aware"

And since using standard policy names for custom policies is an error condition (error: custom policy (balance-aware) will be ignored, cannot use standard policy names for custom policies), any custom policy configuration is effectively impossible. The docs should probably be updated so their examples don't trigger this error.
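
To spell out the catch-22 in code form (a rough sketch of my understanding, not the actual OneService.cpp logic; the list of standard names here is assumed):

// Rough sketch of why the two options conflict: custom names skip
// flow-hashing setup (previous comment), while standard names are rejected.
#include <cstdio>
#include <set>
#include <string>

int main() {
    // Assumed set of standard policy names for illustration.
    const std::set<std::string> standardPolicies = {
        "active-backup", "broadcast", "balance-rr", "balance-xor", "balance-aware"
    };
    std::string customPolicyName = "balance-aware"; // reusing a standard name...

    if (standardPolicies.count(customPolicyName)) {
        // ...gets the custom policy ignored, matching the error quoted above.
        std::printf("error: custom policy (%s) will be ignored, cannot use "
                    "standard policy names for custom policies\n",
                    customPolicyName.c_str());
    }
    return 0;
}

So either the custom name silently breaks flow assignment, or the standard name gets the whole custom policy ignored.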

joseph-henry commented 2 years ago

Thanks for reporting this. I'll take a peek today. Can you tell me which branch you're using? I often use a custom policy and don't see this issue.

I think there was a similar issue a while back but that was fixed.

grapexy commented 2 years ago

@joseph-henry both peers are on 1.10.1

grapexy commented 2 years ago

I saw the commit related to link selection and can confirm that this still persists on dev.

I'm guessing that flows don't go through the standard path selection in the bonding layer and instead come in and go out on arbitrary paths, hence the missing "assign out-flow" / "assign in-flow" debug messages whenever a custom policy is activated. I don't know much C, so I couldn't really figure out why.

grapexy commented 2 years ago

@joseph-henry I believe the issue here is that flow hashing is never enabled for custom policies that need it.

When bonds are initialized, _defaultPolicy is set to 0 for custom policies:

https://github.com/zerotier/ZeroTierOne/blob/04d1862e3ae0d916f78779a9fc0f058b25fd469d/node/Bond.hpp#L335-L353
https://github.com/zerotier/ZeroTierOne/blob/04d1862e3ae0d916f78779a9fc0f058b25fd469d/service/OneService.cpp#L2047-L2048

So when setBondParameters is called, _defaultPolicy and _policy both evaluate to 0:

https://github.com/zerotier/ZeroTierOne/blob/04d1862e3ae0d916f78779a9fc0f058b25fd469d/node/Bond.cpp#L1695-L1700

And when the lines responsible for allowing flow hashing are reached, the defaults are evaluated for ZT_BOND_POLICY_NONE (0) rather than for the custom policy's base policy:

https://github.com/zerotier/ZeroTierOne/blob/04d1862e3ae0d916f78779a9fc0f058b25fd469d/node/Bond.cpp#L1758-L1780

The real policy is only assigned further down, which does nothing to enable flow hashing:
https://github.com/zerotier/ZeroTierOne/blob/04d1862e3ae0d916f78779a9fc0f058b25fd469d/node/Bond.cpp#L1791-L1794
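
To condense my reading of the above into one place, here's a simplified, self-contained sketch (a pseudo-version of the ordering problem, not the real setBondParameters() code; names, signatures, and values are illustrative):

// Simplified sketch of the suspected ordering problem -- not the real
// ZeroTier code, just the logic as I understand it.
#include <cstdio>

// Stand-ins for the ZT_BOND_POLICY_* constants; only NONE = 0 matters here.
enum Policy { POLICY_NONE = 0, POLICY_BALANCE_XOR, POLICY_BALANCE_AWARE };

struct BondSketch {
    Policy _defaultPolicy    = POLICY_NONE; // stays 0 ("custom") for user-named policies
    Policy _policy           = POLICY_NONE;
    bool   _allowFlowHashing = false;

    void setBondParameters(bool isCustomPolicy, Policy basePolicy) {
        // Step 1: for custom policies the numeric default is 0, so _policy
        // is still POLICY_NONE at this point.
        _policy = isCustomPolicy ? _defaultPolicy : basePolicy;

        // Step 2: the flow-hashing default is derived from _policy here,
        // i.e. from POLICY_NONE instead of the custom policy's base policy.
        _allowFlowHashing = (_policy == POLICY_BALANCE_XOR || _policy == POLICY_BALANCE_AWARE);

        // Step 3: the real policy is only assigned afterwards, too late to
        // affect the flow-hashing decision made in step 2.
        if (isCustomPolicy)
            _policy = basePolicy;
    }
};

int main() {
    BondSketch builtin, custom;
    builtin.setBondParameters(false, POLICY_BALANCE_AWARE);
    custom.setBondParameters(true,  POLICY_BALANCE_AWARE);
    std::printf("built-in balance-aware: flow hashing = %d\n", builtin._allowFlowHashing); // 1
    std::printf("custom balance-aware:   flow hashing = %d\n", custom._allowFlowHashing);  // 0
    return 0;
}

If that reading is right, resolving the custom policy's base policy before the flow-hashing defaults are evaluated (or evaluating those defaults against the base policy) should fix flow assignment for custom policies.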