Closed puellanivis closed 1 week ago
Compared to current dev-v2
:
goos: linux
goarch: amd64
pkg: github.com/pkg/sftp/v2/localfs
cpu: 13th Gen Intel(R) Core(TM) i9-13900KF
│ old.log │ new.log │
│ sec/op │ sec/op vs base │
WriteTo1KiB-32 123.3µ ± 5% 118.0µ ± 2% -4.25% (p=0.000 n=10)
WriteTo1MiB-32 333.6µ ± 1% 327.1µ ± 2% -1.96% (p=0.011 n=10)
WriteTo4MiB-32 941.8µ ± 7% 881.2µ ± 1% -6.44% (p=0.043 n=10)
WriteTo10MiB-32 2.592m ± 5% 2.466m ± 4% -4.86% (p=0.002 n=10)
WriteTo64MiB-32 17.66m ± 2% 17.23m ± 4% -2.46% (p=0.023 n=10)
ReadFrom1KiB-32 205.3µ ± 26% 185.9µ ± 17% ~ (p=0.631 n=10)
ReadFrom1MiB-32 968.1µ ± 3% 949.0µ ± 2% ~ (p=0.063 n=10)
ReadFrom4MiB-32 3.302m ± 2% 3.280m ± 1% ~ (p=0.105 n=10)
ReadFrom10MiB-32 8.066m ± 5% 7.989m ± 3% ~ (p=0.123 n=10)
ReadFrom64MiB-32 42.75m ± 2% 41.73m ± 3% ~ (p=0.063 n=10)
geomean 1.821m 1.756m -3.57%
│ old.log │ new.log │
│ B/s │ B/s vs base │
WriteTo1KiB-32 7.920Mi ± 5% 8.273Mi ± 2% +4.46% (p=0.000 n=10)
WriteTo1MiB-32 2.927Gi ± 1% 2.986Gi ± 2% +2.00% (p=0.011 n=10)
WriteTo4MiB-32 4.148Gi ± 8% 4.433Gi ± 1% +6.87% (p=0.043 n=10)
WriteTo10MiB-32 3.769Gi ± 5% 3.960Gi ± 4% +5.07% (p=0.002 n=10)
WriteTo64MiB-32 3.539Gi ± 2% 3.628Gi ± 4% +2.52% (p=0.023 n=10)
ReadFrom1KiB-32 4.759Mi ± 35% 5.260Mi ± 17% ~ (p=0.617 n=10)
ReadFrom1MiB-32 1.009Gi ± 3% 1.029Gi ± 2% ~ (p=0.063 n=10)
ReadFrom4MiB-32 1.183Gi ± 2% 1.191Gi ± 2% ~ (p=0.105 n=10)
ReadFrom10MiB-32 1.211Gi ± 5% 1.222Gi ± 3% ~ (p=0.123 n=10)
ReadFrom64MiB-32 1.462Gi ± 2% 1.498Gi ± 3% ~ (p=0.063 n=10)
geomean 659.6Mi 684.1Mi +3.72%
│ old.log │ new.log │
│ B/op │ B/op vs base │
WriteTo1KiB-32 8.157Ki ± 1% 8.283Ki ± 1% +1.54% (p=0.005 n=10)
WriteTo1MiB-32 42.36Ki ± 0% 42.68Ki ± 1% +0.75% (p=0.009 n=10)
WriteTo4MiB-32 43.73Ki ± 2% 43.66Ki ± 2% ~ (p=0.631 n=10)
WriteTo10MiB-32 46.59Ki ± 3% 46.18Ki ± 5% ~ (p=0.971 n=10)
WriteTo64MiB-32 55.25Ki ± 31% 62.49Ki ± 17% ~ (p=0.218 n=10)
ReadFrom1KiB-32 3.578Ki ± 0% 3.734Ki ± 0% +4.37% (p=0.000 n=10)
ReadFrom1MiB-32 3.821Ki ± 0% 3.977Ki ± 0% +4.06% (p=0.000 n=10)
ReadFrom4MiB-32 30.95Ki ± 4% 30.34Ki ± 5% ~ (p=0.393 n=10)
ReadFrom10MiB-32 46.45Ki ± 4% 47.24Ki ± 2% ~ (p=0.256 n=10)
ReadFrom64MiB-32 65.09Ki ± 2% 65.27Ki ± 2% ~ (p=0.123 n=10)
geomean 23.43Ki 23.94Ki +2.20%
│ old.log │ new.log │
│ allocs/op │ allocs/op vs base │
WriteTo1KiB-32 122.0 ± 1% 125.0 ± 0% +2.46% (p=0.000 n=10)
WriteTo1MiB-32 134.0 ± 1% 135.0 ± 1% +0.75% (p=0.000 n=10)
WriteTo4MiB-32 230.0 ± 0% 231.5 ± 0% +0.65% (p=0.000 n=10)
WriteTo10MiB-32 428.5 ± 0% 430.0 ± 0% +0.35% (p=0.000 n=10)
WriteTo64MiB-32 2.162k ± 0% 2.164k ± 0% +0.09% (p=0.000 n=10)
ReadFrom1KiB-32 39.00 ± 0% 41.00 ± 0% +5.13% (p=0.000 n=10)
ReadFrom1MiB-32 70.00 ± 0% 72.00 ± 0% +2.86% (p=0.000 n=10)
ReadFrom4MiB-32 215.5 ± 0% 217.0 ± 0% +0.70% (p=0.001 n=10)
ReadFrom10MiB-32 423.5 ± 1% 426.0 ± 0% +0.59% (p=0.004 n=10)
ReadFrom64MiB-32 2.168k ± 0% 2.171k ± 0% +0.14% (p=0.000 n=10)
geomean 267.6 271.2 +1.36%
Windows shows similar improvements:
goos: windows
goarch: amd64
pkg: github.com/pkg/sftp/v2/localfs
cpu: 13th Gen Intel(R) Core(TM) i9-13900KF
│ old.log │ new.log │
│ sec/op │ sec/op vs base │
WriteTo1KiB-32 284.2µ ± 3% 277.3µ ± 1% -2.42% (p=0.015 n=10)
WriteTo1MiB-32 964.0µ ± 1% 936.2µ ± 1% -2.88% (p=0.000 n=10)
WriteTo4MiB-32 2.954m ± 6% 2.876m ± 2% -2.62% (p=0.000 n=10)
WriteTo10MiB-32 7.163m ± 2% 6.954m ± 2% -2.92% (p=0.000 n=10)
WriteTo64MiB-32 45.01m ± 1% 43.22m ± 1% -3.97% (p=0.000 n=10)
ReadFrom1KiB-32 262.9µ ± 3% 254.6µ ± 3% -3.15% (p=0.015 n=10)
ReadFrom1MiB-32 4.815m ± 7% 3.946m ± 6% -18.04% (p=0.000 n=10)
ReadFrom4MiB-32 3.755m ± 7% 3.634m ± 10% ~ (p=0.105 n=10)
ReadFrom10MiB-32 7.562m ± 5% 7.401m ± 9% ~ (p=0.089 n=10)
ReadFrom64MiB-32 47.42m ± 8% 48.01m ± 8% ~ (p=0.739 n=10)
geomean 3.671m 3.519m -4.14%
│ old.log │ new.log │
│ B/s │ B/s vs base │
WriteTo1KiB-32 3.438Mi ± 3% 3.519Mi ± 1% +2.36% (p=0.014 n=10)
WriteTo1MiB-32 1.013Gi ± 1% 1.043Gi ± 1% +2.97% (p=0.000 n=10)
WriteTo4MiB-32 1.323Gi ± 5% 1.358Gi ± 2% +2.69% (p=0.000 n=10)
WriteTo10MiB-32 1.363Gi ± 2% 1.404Gi ± 2% +3.00% (p=0.000 n=10)
WriteTo64MiB-32 1.389Gi ± 1% 1.446Gi ± 1% +4.14% (p=0.000 n=10)
ReadFrom1KiB-32 3.715Mi ± 3% 3.834Mi ± 2% +3.21% (p=0.011 n=10)
ReadFrom1MiB-32 207.7Mi ± 7% 253.4Mi ± 6% +22.01% (p=0.000 n=10)
ReadFrom4MiB-32 1.041Gi ± 7% 1.075Gi ± 11% ~ (p=0.105 n=10)
ReadFrom10MiB-32 1.291Gi ± 5% 1.319Gi ± 10% ~ (p=0.089 n=10)
ReadFrom64MiB-32 1.318Gi ± 8% 1.302Gi ± 7% ~ (p=0.739 n=10)
geomean 327.2Mi 341.3Mi +4.30%
│ old.log │ new.log │
│ B/op │ B/op vs base │
WriteTo1KiB-32 15.75Ki ± 2% 16.02Ki ± 2% ~ (p=0.065 n=10)
WriteTo1MiB-32 41.66Ki ± 0% 42.15Ki ± 1% +1.17% (p=0.015 n=10)
WriteTo4MiB-32 42.76Ki ± 3% 43.31Ki ± 3% ~ (p=0.353 n=10)
WriteTo10MiB-32 44.68Ki ± 4% 44.45Ki ± 5% ~ (p=0.971 n=10)
WriteTo64MiB-32 55.40Ki ± 24% 53.43Ki ± 17% ~ (p=0.853 n=10)
ReadFrom1KiB-32 4.375Ki ± 0% 4.531Ki ± 0% +3.57% (p=0.000 n=10)
ReadFrom1MiB-32 4.694Ki ± 0% 4.826Ki ± 0% +2.80% (p=0.000 n=10)
ReadFrom4MiB-32 7.826Ki ± 12% 9.029Ki ± 12% +15.37% (p=0.001 n=10)
ReadFrom10MiB-32 15.30Ki ± 19% 19.10Ki ± 12% +24.87% (p=0.000 n=10)
ReadFrom64MiB-32 52.15Ki ± 10% 61.67Ki ± 8% +18.25% (p=0.000 n=10)
geomean 19.72Ki 20.93Ki +6.14%
│ old.log │ new.log │
│ allocs/op │ allocs/op vs base │
WriteTo1KiB-32 94.00 ± 1% 95.00 ± 0% +1.06% (p=0.001 n=10)
WriteTo1MiB-32 142.0 ± 0% 144.0 ± 0% +1.41% (p=0.000 n=10)
WriteTo4MiB-32 238.0 ± 0% 241.0 ± 0% +1.26% (p=0.000 n=10)
WriteTo10MiB-32 431.0 ± 0% 433.5 ± 0% +0.58% (p=0.000 n=10)
WriteTo64MiB-32 2.167k ± 0% 2.170k ± 0% +0.12% (p=0.000 n=10)
ReadFrom1KiB-32 41.00 ± 0% 43.00 ± 0% +4.88% (p=0.000 n=10)
ReadFrom1MiB-32 73.00 ± 1% 74.00 ± 0% +1.37% (p=0.000 n=10)
ReadFrom4MiB-32 177.0 ± 1% 181.0 ± 1% +2.26% (p=0.000 n=10)
ReadFrom10MiB-32 383.5 ± 1% 391.0 ± 1% +1.96% (p=0.000 n=10)
ReadFrom64MiB-32 2.157k ± 0% 2.165k ± 0% +0.37% (p=0.000 n=10)
geomean 257.9 261.8 +1.52%
(This is the same machine WSL vs native windows running from WSL… and the tempdir for windows is actually on a windows drive, so it’s not like it’s having to translate from a Windows //WSL$/...
path to access the test file… and, queue the intro to flame war about how bad Windows performance is.)
Performance on macOS ARM is pretty much equivalent:
goos: darwin
goarch: arm64
pkg: github.com/pkg/sftp/v2/localfs
cpu: Apple M1 Pro
│ old.log │ new.log │
│ sec/op │ sec/op vs base │
WriteTo1KiB-10 390.4µ ± 2% 394.3µ ± 3% ~ (p=0.280 n=10)
WriteTo1MiB-10 773.7µ ± 3% 780.7µ ± 2% ~ (p=0.315 n=10)
WriteTo4MiB-10 1.760m ± 2% 1.790m ± 3% ~ (p=0.190 n=10)
WriteTo10MiB-10 3.770m ± 1% 3.831m ± 3% +1.60% (p=0.001 n=10)
WriteTo64MiB-10 21.19m ± 3% 21.00m ± 2% ~ (p=0.218 n=10)
ReadFrom1KiB-10 141.0µ ± 2% 142.9µ ± 2% ~ (p=0.353 n=10)
ReadFrom1MiB-10 715.1µ ± 1% 709.6µ ± 2% ~ (p=0.315 n=10)
ReadFrom4MiB-10 2.474m ± 1% 2.495m ± 1% ~ (p=0.165 n=10)
ReadFrom10MiB-10 5.110m ± 1% 5.131m ± 2% ~ (p=0.739 n=10)
ReadFrom64MiB-10 33.35m ± 5% 32.78m ± 4% ~ (p=0.315 n=10)
geomean 2.117m 2.126m +0.43%
│ old.log │ new.log │
│ B/s │ B/s vs base │
WriteTo1KiB-10 2.503Mi ± 2% 2.480Mi ± 3% ~ (p=0.224 n=10)
WriteTo1MiB-10 1.262Gi ± 3% 1.251Gi ± 2% ~ (p=0.315 n=10)
WriteTo4MiB-10 2.219Gi ± 2% 2.183Gi ± 3% ~ (p=0.190 n=10)
WriteTo10MiB-10 2.590Gi ± 1% 2.549Gi ± 3% -1.58% (p=0.001 n=10)
WriteTo64MiB-10 2.949Gi ± 3% 2.976Gi ± 2% ~ (p=0.218 n=10)
ReadFrom1KiB-10 6.928Mi ± 2% 6.838Mi ± 2% ~ (p=0.323 n=10)
ReadFrom1MiB-10 1.366Gi ± 1% 1.376Gi ± 2% ~ (p=0.315 n=10)
ReadFrom4MiB-10 1.579Gi ± 1% 1.566Gi ± 1% ~ (p=0.165 n=10)
ReadFrom10MiB-10 1.911Gi ± 1% 1.903Gi ± 3% ~ (p=0.739 n=10)
ReadFrom64MiB-10 1.874Gi ± 5% 1.907Gi ± 4% ~ (p=0.315 n=10)
geomean 567.5Mi 565.1Mi -0.42%
│ old.log │ new.log │
│ B/op │ B/op vs base │
WriteTo1KiB-10 12.00Ki ± 4% 12.52Ki ± 2% +4.35% (p=0.004 n=10)
WriteTo1MiB-10 41.48Ki ± 1% 42.02Ki ± 2% ~ (p=0.063 n=10)
WriteTo4MiB-10 42.31Ki ± 3% 42.38Ki ± 2% ~ (p=0.971 n=10)
WriteTo10MiB-10 43.62Ki ± 7% 43.88Ki ± 6% ~ (p=0.529 n=10)
WriteTo64MiB-10 56.01Ki ± 13% 56.65Ki ± 24% ~ (p=0.481 n=10)
ReadFrom1KiB-10 3.672Ki ± 0% 3.828Ki ± 0% +4.26% (p=0.000 n=10)
ReadFrom1MiB-10 3.926Ki ± 0% 4.070Ki ± 0% +3.67% (p=0.000 n=10)
ReadFrom4MiB-10 5.093Ki ± 6% 5.019Ki ± 5% ~ (p=0.724 n=10)
ReadFrom10MiB-10 7.941Ki ± 7% 8.606Ki ± 7% +8.37% (p=0.015 n=10)
ReadFrom64MiB-10 27.58Ki ± 15% 28.09Ki ± 8% ~ (p=0.436 n=10)
geomean 15.55Ki 15.92Ki +2.39%
│ old.log │ new.log │
│ allocs/op │ allocs/op vs base │
WriteTo1KiB-10 108.0 ± 1% 109.5 ± 0% +1.39% (p=0.000 n=10)
WriteTo1MiB-10 139.0 ± 0% 141.0 ± 0% +1.44% (p=0.000 n=10)
WriteTo4MiB-10 235.0 ± 0% 237.5 ± 0% +1.06% (p=0.000 n=10)
WriteTo10MiB-10 429.0 ± 0% 431.0 ± 0% +0.47% (p=0.000 n=10)
WriteTo64MiB-10 2.161k ± 0% 2.163k ± 0% +0.09% (p=0.000 n=10)
ReadFrom1KiB-10 39.00 ± 0% 41.00 ± 0% +5.13% (p=0.000 n=10)
ReadFrom1MiB-10 70.00 ± 0% 72.00 ± 0% +2.86% (p=0.000 n=10)
ReadFrom4MiB-10 171.0 ± 2% 169.5 ± 3% ~ (p=0.992 n=10)
ReadFrom10MiB-10 372.0 ± 0% 375.0 ± 1% +0.81% (p=0.000 n=10)
ReadFrom64MiB-10 2.123k ± 0% 2.123k ± 0% ~ (p=0.641 n=10)
geomean 255.9 259.1 +1.22%
There are a few fixes mixed in with solely addressing the deadlock. I still need to test this, but I needed a branch to test.
The change has to plumb a channel through to
clientConn
where it it performs the actual writing of the request to the outbound writer. As a result, I’m unsure how feasible it would be to get the same sort of logic rolled out on v1. I know where I would start, and that’s the new timebox for after when I test this one. If it becomes a giant mess of tangles, I’ll back out, and we’ll accept that the previous behavior “just happened to work, but was never guaranteed to work.”fixes: #603