google / gvisor

Application Kernel for Containers
https://gvisor.dev
Apache License 2.0

netstack: performance w/TCP-RACK on Windows #9778

Open jwhited opened 1 year ago

jwhited commented 1 year ago

Description

Our usage of netstack within tailscale performs poorly on Windows when TCP-RACK loss detection is enabled.

Using Stack.AddTCPProbe() to print the congestion window (in packets) shows it being held below 10 packets during a throughput test:

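    // Log each endpoint's congestion window (in packets), at most once per second overall.
    var (
        mu        sync.Mutex // the probe may be invoked concurrently for different endpoints
        lastDebug time.Time
    )
    ipstack.AddTCPProbe(func(s *stack.TCPEndpointState) {
        mu.Lock()
        defer mu.Unlock()
        now := time.Now()
        if now.After(lastDebug.Add(time.Second)) {
            logf("%s:%d => %s:%d cwnd in packets: %d", s.ID.LocalAddress.String(), s.ID.LocalPort,
                s.ID.RemoteAddress.String(), s.ID.RemotePort, s.Sender.SndCwnd)
            lastDebug = now
        }
    })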

Output:

    2023/11/29 18:58:24 100.78.224.154:80 => 100.90.1.8:64349 cwnd in packets: 7
    2023/11/29 18:58:24 100.78.224.154:80 => 100.90.1.8:64348 cwnd in packets: 9
    2023/11/29 18:58:24 100.78.224.154:80 => 100.90.1.8:64347 cwnd in packets: 5
    2023/11/29 18:58:25 100.78.224.154:80 => 100.90.1.8:64349 cwnd in packets: 5
    2023/11/29 18:58:26 100.78.224.154:80 => 100.90.1.8:64351 cwnd in packets: 8
    2023/11/29 18:58:26 100.78.224.154:80 => 100.90.1.8:64349 cwnd in packets: 7
    2023/11/29 18:58:27 100.78.224.154:80 => 100.90.1.8:64350 cwnd in packets: 9
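
To check the "held below 10 packets" observation mechanically across a longer log, a small helper can collect the largest cwnd per connection from the probe output. This is a hypothetical sketch that assumes the exact log format printed by the probe above; `maxCwnd` and `sample` are illustrative names, not part of gVisor or tailscale.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"sort"
	"strconv"
	"strings"
)

// cwndRe assumes the probe's format: "... <local> => <remote> cwnd in packets: <n>".
var cwndRe = regexp.MustCompile(`=> (\S+) cwnd in packets: (\d+)$`)

// maxCwnd returns the largest cwnd seen, keyed by the remote address:port after "=>".
func maxCwnd(log string) map[string]int {
	out := make(map[string]int)
	sc := bufio.NewScanner(strings.NewReader(log))
	for sc.Scan() {
		m := cwndRe.FindStringSubmatch(strings.TrimSpace(sc.Text()))
		if m == nil {
			continue // not a probe line
		}
		n, err := strconv.Atoi(m[2])
		if err != nil {
			continue
		}
		if n > out[m[1]] {
			out[m[1]] = n
		}
	}
	return out
}

// sample is the probe output shown above.
const sample = `2023/11/29 18:58:24 100.78.224.154:80 => 100.90.1.8:64349 cwnd in packets: 7
2023/11/29 18:58:24 100.78.224.154:80 => 100.90.1.8:64348 cwnd in packets: 9
2023/11/29 18:58:24 100.78.224.154:80 => 100.90.1.8:64347 cwnd in packets: 5
2023/11/29 18:58:25 100.78.224.154:80 => 100.90.1.8:64349 cwnd in packets: 5
2023/11/29 18:58:26 100.78.224.154:80 => 100.90.1.8:64351 cwnd in packets: 8
2023/11/29 18:58:26 100.78.224.154:80 => 100.90.1.8:64349 cwnd in packets: 7
2023/11/29 18:58:27 100.78.224.154:80 => 100.90.1.8:64350 cwnd in packets: 9`

func main() {
	m := maxCwnd(sample)
	remotes := make([]string, 0, len(m))
	for r := range m {
		remotes = append(remotes, r)
	}
	sort.Strings(remotes) // stable output order
	for _, r := range remotes {
		fmt.Printf("%s max cwnd %d\n", r, m[r])
	}
}
```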

Throughput is poor (~8 Mb/s). Setting TCP loss recovery to 0 (disabling TCP-RACK) improves throughput roughly tenfold (~8 Mb/s to ~80 Mb/s), and the congestion window behaves more as expected. The path under test is not particularly lossy.

Linux does not exhibit this behavior; the issue appears to be Windows-specific. It has been reproduced by multiple users in multiple environments on Windows 11 and Windows Server 2022.

Originally reported via https://github.com/tailscale/tailscale/issues/9707

Steps to reproduce

https://github.com/tailscale/tailscale/issues/9707#issuecomment-1752175564 describes steps to reproduce using tailscale. We have since changed loss recovery on Windows as a workaround via https://github.com/tailscale/tailscale/commit/5e861c38718ffcde3ded6d2922ca464886e41321.

Reproduced at both gVisor HEAD (4b4191b8cad1f5f1a99be76d8dae59b713e58ff5) and the version tailscale currently uses (4fe30062272c).


kevinGC commented 1 year ago

Adding @nybidari, who knows more about RACK.

Interesting that this is Windows-only, as I wouldn't expect that to matter. Maybe something to do with timers (since RACK is time-based) is OS-dependent?

jwhited commented 1 year ago

> Adding @nybidari, who knows more about RACK.
>
> Interesting that this is Windows-only, as I wouldn't expect that to matter. Maybe something to do with timers (since RACK is time-based) is OS-dependent?

FWIW, I have tested with higher-resolution timing but found no difference in the results:

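    // Request 1 ms system timer resolution (golang.org/x/sys/windows).
    if err := windows.TimeBeginPeriod(1); err != nil {
        panic(err)
    }
    // Restore the previous resolution when the enclosing function returns.
    defer windows.TimeEndPeriod(1)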

I just realized that tcpip.TCPRACKStaticReoWnd and tcpip.TCPRACKNoDupTh are meant to be masked on top of tcpip.TCPRACK, and they are unused anyway. So when I was using those values on their own, it was the same as running without RACK. I've removed that part from the description.
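
The pitfall above is the usual bit-flag one: modifier bits set without the base bit leave the feature off. The sketch below uses local, hypothetical stand-ins (`recovery`, `rackLossDetection`, and so on) that mirror the shape of a flag option like gVisor's `tcpip.TCPRecovery`; the real constant names and values should be checked in `pkg/tcpip` rather than taken from here.

```go
package main

import "fmt"

// recovery is a hypothetical stand-in for a bit-flag option such as
// tcpip.TCPRecovery; names and values are illustrative, not imported from gVisor.
type recovery int32

const (
	rackLossDetection recovery = 1 << iota // base flag: turns RACK on
	rackStaticReoWnd                        // modifier: only meaningful with the base flag
	rackNoDupTh                             // modifier: only meaningful with the base flag
)

// rackEnabled reports whether the base RACK flag is set.
func rackEnabled(r recovery) bool { return r&rackLossDetection != 0 }

func main() {
	for _, r := range []recovery{0, rackStaticReoWnd, rackNoDupTh, rackLossDetection | rackStaticReoWnd} {
		fmt.Printf("%03b rack=%v\n", r, rackEnabled(r))
	}
}
```

Only the last value enables RACK; the two modifier-only values behave exactly like 0.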

nybidari commented 1 year ago

I don't think RACK does anything different on Windows compared to other operating systems. From my understanding, RACK can perform worse than other loss recovery algorithms in these cases:

  1. Packets are reordered and RACK adjusts the reordering window. Let's say RACK detects a large reordering window. If packets sent after RACK has raised the reordering window are then actually lost, RACK waits until the reordering window times out before detecting the loss. To bring the reordering window back to its initial value, RACK waits for 16 loss recoveries. Other loss recovery algorithms do not consider reordering at all, so in this case they would falsely enter loss recovery only once.
  2. RTOs: I don't know how, but maybe there are more RTOs with RACK on Windows.

These are just my speculations; the root cause could be something else entirely! To debug further, would it be possible to collect these TCP stats with and without RACK on Windows: https://github.com/google/gvisor/blob/master/pkg/tcpip/tcpip.go#L2123-L2146 ?

jwhited commented 1 year ago

> To debug further, would it be possible to get these TCP stats for with and without RACK on windows: https://github.com/google/gvisor/blob/master/pkg/tcpip/tcpip.go#L2123-L2146 ?

Results from a 30-second throughput test:

Windows Server 2022, no TCP-RACK (~80 Mb/s):

    2023/11/30 00:57:00 Retransmits: 3299 FastRecovery: 0 SACKRecovery: 52 TLPRecovery: 0 SlowStartRetransmits: 1653 FastRetransmit: 52 Timeouts: 10

Windows Server 2022, TCP-RACK (~8 Mb/s):

    2023/11/30 00:59:40 Retransmits: 1430 FastRecovery: 0 SACKRecovery: 690 TLPRecovery: 0 SlowStartRetransmits: 4 FastRetransmit: 687 Timeouts: 4

Ubuntu 22.04, no TCP-RACK (~90 Mb/s):

    2023/11/30 01:05:31 Retransmits: 4251 FastRecovery: 0 SACKRecovery: 66 TLPRecovery: 0 SlowStartRetransmits: 2690 FastRetransmit: 66 Timeouts: 15

Ubuntu 22.04, TCP-RACK (~80 Mb/s):

    2023/11/30 01:03:07 Retransmits: 2220 FastRecovery: 0 SACKRecovery: 64 TLPRecovery: 0 SlowStartRetransmits: 3 FastRetransmit: 64 Timeouts: 1

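
To line runs up side by side (for example, SACKRecovery is 52 without RACK and 690 with RACK on Windows, versus 66 and 64 on Ubuntu), the stat lines can be parsed into maps. This is a hypothetical convenience sketch that assumes the whitespace-separated `Name: value` format shown above; it only reorganizes the reported numbers and does not explain the difference.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseStats turns a line such as "... Retransmits: 1430 FastRecovery: 0 ..."
// into a name->value map. It assumes whitespace-separated "Name: value" pairs;
// the leading date and time fields are skipped because they do not end in ":".
func parseStats(line string) map[string]int {
	out := make(map[string]int)
	fields := strings.Fields(line)
	for i := 0; i+1 < len(fields); i++ {
		if !strings.HasSuffix(fields[i], ":") {
			continue
		}
		name := strings.TrimSuffix(fields[i], ":")
		if v, err := strconv.Atoi(fields[i+1]); err == nil {
			out[name] = v
		}
	}
	return out
}

func main() {
	noRACK := parseStats("2023/11/30 00:57:00 Retransmits: 3299 FastRecovery: 0 SACKRecovery: 52 TLPRecovery: 0 SlowStartRetransmits: 1653 FastRetransmit: 52 Timeouts: 10")
	rack := parseStats("2023/11/30 00:59:40 Retransmits: 1430 FastRecovery: 0 SACKRecovery: 690 TLPRecovery: 0 SlowStartRetransmits: 4 FastRetransmit: 687 Timeouts: 4")
	fmt.Println("Windows Server 2022")
	for _, k := range []string{"Retransmits", "SACKRecovery", "SlowStartRetransmits", "FastRetransmit", "Timeouts"} {
		fmt.Printf("%-22s no RACK %5d   RACK %5d\n", k, noRACK[k], rack[k])
	}
}
```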
github-actions[bot] commented 8 months ago

A friendly reminder that this issue had no activity for 120 days.

jwhited commented 6 months ago

@nybidari any findings? Is there any reason to believe a more recent release would improve RACK on Windows?

kevinGC commented 5 months ago

No findings, and no features have targeted this specifically. Wish we had more bandwidth to investigate.

github-actions[bot] commented 1 month ago

A friendly reminder that this issue had no activity for 120 days.