quinn-rs / quinn

Async-friendly QUIC implementation in Rust

Degraded performance with elevated RTT - especially on Windows #1409

Open Matthias247 opened 2 years ago

Matthias247 commented 2 years ago

I've been playing around a bit with latency injection and measuring throughput. The setup is probably slightly broken and needs some more tuning, but it already showed some surprising results.

Here's the measured throughput in the bulk benchmark for downloading 100MB of data when a given delay is injected in both directions (the total RTT is twice that delay).

| Delay | Windows | Linux |
|-------|---------|-------|
| 0ms | 117MB/s | 454MB/s |
| 1ms | 4MB/s | 55MB/s |
| 2ms | 3.7MB/s | 35MB/s |
| 10ms | 5.8MB/s | 30MB/s |
| 50ms | 3.3MB/s | 10MB/s |
| 200ms | 2.13MB/s | 2.62MB/s |
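
(For reference, the numbers are just bytes received over wall-clock time on the client side. A rough sketch of that measurement - not the actual bulk benchmark code, just the shape of it - assuming an already established `quinn::Connection` and a server that streams the payload on an incoming bi-directional stream:)

```rust
use std::time::Instant;

// Sketch only: measure how fast the peer delivers `expected` bytes over a
// single bi-directional stream. `measure_bulk_download` is a made-up helper,
// not part of the benchmark.
async fn measure_bulk_download(
    connection: &quinn::Connection,
    expected: usize,
) -> anyhow::Result<f64> {
    let (mut send, recv) = connection.open_bi().await?;
    send.write_all(b"bulk").await?; // tiny request so the peer sees the stream
    send.finish().await?;           // finish() is async in the quinn versions of this era

    let start = Instant::now();
    let data = recv.read_to_end(expected).await?;
    let elapsed = start.elapsed();

    Ok(data.len() as f64 / 1e6 / elapsed.as_secs_f64()) // MB/s
}
```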

The variants with extra latency are not CPU bound - the library simply doesn't want to send data faster. If I run them for longer, the average throughput actually increases, which indicates the congestion controller is still raising the congestion window. This is also confirmed by stats.

E.g. for a 10ms delay

| Delay | Windows 100MB | Windows 200MB | Linux 100MB | Linux 200MB |
|-------|---------------|---------------|-------------|-------------|
| 10ms | 3.9MB/s | 5.18MB/s | 31MB/s | 30MB/s |

Changing the congestion controller to BBR makes it ramp up faster and gets better numbers, but it still isn't great.
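
(For anyone reproducing this, the congestion controller is swapped via the transport config. A sketch only - the exact `congestion_controller_factory` signature and re-export paths differ a bit between quinn releases:)

```rust
use std::sync::Arc;
use quinn::{congestion, ClientConfig, TransportConfig};

// Sketch: use BBR instead of the default (Cubic) congestion controller.
// Recent quinn versions take an Arc'd config as shown here; older ones take
// the config value directly, so adjust to the version in use.
fn with_bbr(mut config: ClientConfig) -> ClientConfig {
    let mut transport = TransportConfig::default();
    transport.congestion_controller_factory(Arc::new(congestion::BbrConfig::default()));
    config.transport_config(Arc::new(transport));
    config
}
```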

I'm not fully sure what causes the degradation on Linux, given that it isn't CPU bound, but on Windows I noticed the following:

When 1ms latency (2ms RTT) is injected, the stats show a much higher RTT:

path: PathStats {
        rtt: 31.6284ms,
        cwnd: 945959,
        congestion_events: 11,
        lost_packets: 77,
        lost_bytes: 92400,
        sent_packets: 89928,
    },

So besides the 2ms of latency we wanted, we actually get about 30ms of extra latency.

For comparison, on Linux:

path: PathStats {
        rtt: 4.368618ms,
        cwnd: 281805,
        congestion_events: 12,
        lost_packets: 275,
        lost_bytes: 330000,
        sent_packets: 90114,
    },

There's only around 2ms of extra latency.
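
(The PathStats dumps above are part of what the connection stats expose; roughly how they can be polled while the transfer runs:)

```rust
use std::time::Duration;

// Sketch: periodically dump the path-level stats (RTT, congestion window,
// losses) of a live connection, which is where the PathStats output above
// comes from.
async fn log_path_stats(connection: quinn::Connection) {
    loop {
        let path = connection.stats().path;
        println!(
            "rtt={:?} cwnd={} congestion_events={} lost_packets={}",
            path.rtt, path.cwnd, path.congestion_events, path.lost_packets
        );
        tokio::time::sleep(Duration::from_secs(1)).await;
    }
}
```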

A bit more digging showed that the extra latency is introduced by tokio's timer precision (https://github.com/tokio-rs/tokio/issues/5021). That causes the network simulation to forward packets later than intended - which by itself would be a simulation-only issue. However, the library should still compensate for the higher RTT by increasing the congestion window even further. It seems it won't do that due to pacing: with pacing, the full congestion window isn't used at once - instead packets are sent out in 2ms intervals, spaced out by timers. When the associated timer turns that 2ms into 16ms, most of the congestion window isn't used. And it might not even be increased, due to being deemed app-limited (not sure).
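
(A crude back-of-envelope for that explanation, using the Windows numbers above: the congestion controller allows roughly cwnd bytes per RTT, and if the pacer only gets to send during 2ms out of every 16ms, only about 1/8 of that actually goes out. It's a rough model, but it lands close to the measured ~4MB/s:)

```rust
// Rough model: throughput ≈ (cwnd / rtt) * (intended pacing interval / actual timer interval).
fn paced_throughput_estimate(cwnd_bytes: f64, rtt_s: f64, intended_s: f64, actual_s: f64) -> f64 {
    (cwnd_bytes / rtt_s) * (intended_s / actual_s).min(1.0)
}

fn main() {
    // cwnd ~946KB and rtt ~31.6ms from the Windows PathStats above;
    // 2ms pacing interval stretched to ~16ms by the Windows timer resolution.
    let estimate = paced_throughput_estimate(945_959.0, 0.0316, 0.002, 0.016);
    println!("~{:.1} MB/s", estimate / 1e6); // prints "~3.7 MB/s"
}
```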

I tried disabling pacing, and it indeed increases throughput:

| Delay | Windows | Linux (default socket buffers) | Linux (2MB socket buffers) |
|-------|---------|--------------------------------|----------------------------|
| 10ms | 30MB/s | 5.8MB/s | 48.5MB/s |
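
(The 2MB socket buffer variant boils down to enlarging the UDP send/receive buffers before the socket is handed to the endpoint. One way to do that with the socket2 crate - a sketch, not necessarily how the numbers above were produced:)

```rust
use std::net::{SocketAddr, UdpSocket};
use socket2::{Domain, Protocol, Socket, Type};

// Sketch: build a UDP socket with 2MB send/receive buffers; the resulting std
// socket can then be passed to quinn's endpoint constructor (whose exact
// signature depends on the quinn version in use).
// Note: Linux caps these values at net.core.rmem_max / wmem_max unless those
// sysctls are raised as well.
fn udp_socket_with_large_buffers(addr: SocketAddr) -> std::io::Result<UdpSocket> {
    let socket = Socket::new(Domain::for_address(addr), Type::DGRAM, Some(Protocol::UDP))?;
    socket.set_send_buffer_size(2 * 1024 * 1024)?;
    socket.set_recv_buffer_size(2 * 1024 * 1024)?;
    socket.bind(&addr.into())?;
    Ok(socket.into())
}
```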

So the lack of timer precision in combination with pacing indeed limits throughput. However, since the simulation itself is also affected by timer precision, it would be nice to verify this in a real deployment.

I assume that in a real-world deployment, where the peer paces well and acknowledges packets more often, the difference would be less pronounced, since the endpoint is also woken up by packets from the peer rather than just by timers.

Matthias247 commented 2 years ago

I hacked up a higher precision timer for the network simulation (using a background thread and https://crates.io/crates/spin_sleep/1.1.1). This gets the Windows version from 5MB/s to 50MB/s at 10ms delay - on par with the Linux version. Both with pacing enabled.
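
(Roughly the shape of that hack - a simplified sketch, not the actual simulation code; the channel-based forwarding is just for illustration:)

```rust
use std::sync::mpsc;
use std::time::Instant;

// Sketch: a dedicated thread that spin-sleeps until each packet's forwarding
// deadline instead of relying on the coarse tokio timer, then hands the packet
// back to the async side.
fn spawn_precise_forwarder(
    out: tokio::sync::mpsc::UnboundedSender<Vec<u8>>,
) -> mpsc::Sender<(Instant, Vec<u8>)> {
    let (tx, rx) = mpsc::channel::<(Instant, Vec<u8>)>();
    std::thread::spawn(move || {
        // Sleep natively until ~100µs before the deadline, spin for the rest.
        let sleeper = spin_sleep::SpinSleeper::new(100_000);
        for (deadline, packet) in rx {
            let now = Instant::now();
            if deadline > now {
                sleeper.sleep(deadline - now);
            }
            let _ = out.send(packet);
        }
    });
    tx
}
```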

Unfortunately the better simulation wakes up the endpoint often enough that pacing accidentally also runs with higher precision. So this setup doesn't yet show what the impact of missed pacing timers on end users would be. But I assume it would be around the same degradation - down towards 5MB/s, and less if less data is transmitted.