Closed puellanivis closed 3 years ago
The allocations go from ~2390 to ~2290. Looks like sync.Pool
has a number of internal allocations that happen… might make sense for a longer-lived set of buffers, but for us, dumb is better.
Hi,
I am very busy for the next few weeks and don't have time to test, however this buffer pool could also replace the server side allocator
Impressive:
name old time/op new time/op delta
WriteTo1k-8 12.7ms ± 2% 12.0ms ± 3% -5.21% (p=0.000 n=10+9)
WriteTo16k-8 12.6ms ± 3% 12.0ms ± 3% -4.41% (p=0.000 n=10+10)
WriteTo32k-8 12.6ms ± 3% 12.1ms ± 4% -4.19% (p=0.000 n=10+10)
WriteTo128k-8 12.5ms ± 3% 12.1ms ± 2% -2.96% (p=0.003 n=10+10)
WriteTo512k-8 12.6ms ± 2% 12.1ms ± 1% -4.35% (p=0.000 n=10+8)
WriteTo1MiB-8 12.6ms ± 3% 12.1ms ± 3% -3.85% (p=0.000 n=10+10)
WriteTo4MiB-8 12.6ms ± 4% 12.1ms ± 3% -4.25% (p=0.000 n=10+9)
WriteTo4MiBDelay10Msec-8 109ms ± 1% 109ms ± 1% ~ (p=0.739 n=10+10)
WriteTo4MiBDelay50Msec-8 508ms ± 0% 509ms ± 0% ~ (p=0.133 n=9+10)
WriteTo4MiBDelay150Msec-8 1.51s ± 0% 1.51s ± 0% ~ (p=0.113 n=9+10)
name old speed new speed delta
WriteTo1k-8 828MB/s ± 2% 870MB/s ± 2% +5.10% (p=0.000 n=10+8)
WriteTo16k-8 833MB/s ± 3% 872MB/s ± 3% +4.61% (p=0.000 n=10+10)
WriteTo32k-8 828MB/s ± 2% 868MB/s ± 4% +4.80% (p=0.000 n=9+10)
WriteTo128k-8 839MB/s ± 3% 865MB/s ± 2% +3.05% (p=0.003 n=10+10)
WriteTo512k-8 831MB/s ± 2% 869MB/s ± 1% +4.53% (p=0.000 n=10+8)
WriteTo1MiB-8 832MB/s ± 3% 865MB/s ± 3% +4.00% (p=0.000 n=10+10)
WriteTo4MiB-8 830MB/s ± 4% 867MB/s ± 3% +4.42% (p=0.000 n=10+9)
WriteTo4MiBDelay10Msec-8 96.5MB/s ± 1% 96.5MB/s ± 1% ~ (p=0.984 n=10+9)
WriteTo4MiBDelay50Msec-8 20.6MB/s ± 0% 20.6MB/s ± 0% ~ (p=0.118 n=9+10)
WriteTo4MiBDelay150Msec-8 6.94MB/s ± 0% 6.94MB/s ± 0% ~ (all equal)
name old alloc/op new alloc/op delta
WriteTo1k-8 15.8MB ± 0% 15.7MB ± 0% -1.16% (p=0.000 n=10+10)
WriteTo16k-8 15.9MB ± 0% 15.7MB ± 0% -1.16% (p=0.000 n=10+6)
WriteTo32k-8 15.8MB ± 0% 15.7MB ± 0% -1.18% (p=0.000 n=10+10)
WriteTo128k-8 15.8MB ± 0% 15.7MB ± 0% -1.13% (p=0.000 n=10+10)
WriteTo512k-8 15.8MB ± 0% 15.7MB ± 0% -1.16% (p=0.000 n=8+10)
WriteTo1MiB-8 15.8MB ± 0% 15.7MB ± 0% -1.17% (p=0.000 n=10+8)
WriteTo4MiB-8 15.9MB ± 0% 15.7MB ± 0% -1.17% (p=0.000 n=9+10)
WriteTo4MiBDelay10Msec-8 18.8MB ± 0% 18.7MB ± 0% -0.97% (p=0.000 n=10+10)
WriteTo4MiBDelay50Msec-8 32.1MB ± 0% 31.9MB ± 0% -0.56% (p=0.000 n=10+10)
WriteTo4MiBDelay150Msec-8 48.8MB ± 0% 48.5MB ± 0% -0.50% (p=0.000 n=7+10)
name old allocs/op new allocs/op delta
WriteTo1k-8 2.71k ± 0% 2.29k ± 0% -15.49% (p=0.000 n=10+10)
WriteTo16k-8 2.71k ± 0% 2.29k ± 0% -15.48% (p=0.000 n=10+10)
WriteTo32k-8 2.71k ± 0% 2.29k ± 0% -15.44% (p=0.000 n=10+10)
WriteTo128k-8 2.71k ± 0% 2.29k ± 0% -15.52% (p=0.000 n=10+10)
WriteTo512k-8 2.71k ± 0% 2.29k ± 0% -15.49% (p=0.000 n=10+9)
WriteTo1MiB-8 2.71k ± 0% 2.29k ± 0% -15.50% (p=0.000 n=10+10)
WriteTo4MiB-8 2.71k ± 0% 2.29k ± 0% -15.45% (p=0.000 n=9+10)
WriteTo4MiBDelay10Msec-8 3.11k ± 0% 2.68k ± 0% -13.67% (p=0.000 n=9+10)
WriteTo4MiBDelay50Msec-8 3.13k ± 1% 2.71k ± 1% -13.56% (p=0.000 n=9+10)
WriteTo4MiBDelay150Msec-8 3.14k ± 0% 2.71k ± 1% -13.64% (p=0.000 n=8+10)
That's compared to master.
Hi,
I am very busy for the next few weeks and don't have time to test, however this buffer pool could also replace the server side allocator
Yeah, I took a look for more sync.Pool
but I didn’t find any right off the bat. But we can look into bringing it into the allocator as a separate PR.
Building on https://github.com/pkg/sftp/pull/415 I did a bit of poking. I found:
sync.Pool
of[]byte
buffers was causing more allocations than just allocating all new buffers every time.sync.Pool
at all, especially since this is not long-lived.bufPool
means very little code change necessary to theclient.go
code.readFromConcurrent
can easily be fixed to use the less-overhead version as well.