Closed gaby closed 2 months ago
Thanks for making the suggestion and running the benchmarks! It indeed looks better to just switch to sync.Pool
. It makes sense that is faster now as it contains a bunch of runtime optimizations that ByteBufferPool
will never be able to do.
I'm open to a pull request to completely swap ByteBufferPool
with sync.Pool
.
@erikdubbelboer Awesome, I'm currently testing the changes on Fiber. Will report back and see how much work it requires to do the swap in Fasthttp.
In any version of Golang, bytebufferpool is slower than directly using sync.Pool, and the performance difference is almost consistent. Because bytebufferpool is designed to optimize memory usage to the greatest extent. Its goal is to prevent the pool from being filled with large memory slices due to a few high-memory requests, while the majority of memory requests don't require such large allocations. The performance difference between Go 1.18 and Go 1.23 is almost the same because both rely on sync.Pool. However, bytebufferpool performs bookkeeping during the recycling of byte slices and recalculates the threshold when a certain point is reached.
❯ ~/go/go1.18/bin/go test -bench .
goos: darwin
goarch: amd64
pkg: github.com/gaby/go-benchmark/buffer/main
cpu: 13th Gen Intel(R) Core(TM) i9-13900KS
BenchmarkGenericBuf-32 4131448 270.7 ns/op
BenchmarkGenericStackBuf-32 2675854 451.5 ns/op
BenchmarkAllocBuf-32 4409329 295.9 ns/op
BenchmarkSyncPoolBuf-32 305353983 3.810 ns/op
BenchmarkBpoolPoolBuf-32 4470063 266.0 ns/op
BenchmarkByteBufferPoolBuf-32 53832303 21.90 ns/op
BenchmarkEasyJsonBuffer-32 7854277 157.2 ns/op
BenchmarkEasyJsonBuffer_OptimizedConfig-32 124866505 10.04 ns/op
PASS
ok github.com/gaby/go-benchmark/buffer/main 13.609s
❯ ~/go/go1.23.0/bin/go test -bench .
goos: darwin
goarch: amd64
pkg: github.com/gaby/go-benchmark/buffer/main
cpu: 13th Gen Intel(R) Core(TM) i9-13900KS
BenchmarkGenericBuf-32 3460868 375.0 ns/op
BenchmarkGenericStackBuf-32 3203481 371.2 ns/op
BenchmarkAllocBuf-32 3103330 363.0 ns/op
BenchmarkSyncPoolBuf-32 436120023 2.616 ns/op
BenchmarkBpoolPoolBuf-32 4311795 278.1 ns/op
BenchmarkByteBufferPoolBuf-32 55834230 20.91 ns/op
BenchmarkEasyJsonBuffer-32 7717189 162.2 ns/op
BenchmarkEasyJsonBuffer_OptimizedConfig-32 88038954 13.53 ns/op
PASS
ok github.com/gaby/go-benchmark/buffer/main 11.460s
After removing the bookkeeping and validation code from bytebufferpool
, its speed is consistent with that of sync.Pool
.
// Put releases byte buffer obtained via Get to the pool.
//
// The buffer mustn't be accessed after returning to the pool.
func (p *Pool) Put(b *ByteBuffer) {
/*
idx := index(len(b.B))
if atomic.AddUint64(&p.calls[idx], 1) > calibrateCallsThreshold {
p.calibrate()
}
maxSize := int(atomic.LoadUint64(&p.maxSize))
if maxSize == 0 || cap(b.B) <= maxSize {
b.Reset()
p.pool.Put(b)
}
*/
b.Reset()
p.pool.Put(b)
}
❯ ~/go/go1.18/bin/go test -bench .
goos: darwin
goarch: amd64
pkg: github.com/gaby/go-benchmark/buffer/main
cpu: 13th Gen Intel(R) Core(TM) i9-13900KS
BenchmarkGenericBuf-32 4051310 266.1 ns/op
BenchmarkGenericStackBuf-32 4599678 252.7 ns/op
BenchmarkAllocBuf-32 4741383 255.2 ns/op
BenchmarkSyncPoolBuf-32 531053491 2.262 ns/op
BenchmarkBpoolPoolBuf-32 4293218 276.6 ns/op
BenchmarkByteBufferPoolBuf-32 314608893 3.550 ns/op
BenchmarkEasyJsonBuffer-32 7363118 166.0 ns/op
BenchmarkEasyJsonBuffer_OptimizedConfig-32 100000000 10.33 ns/op
PASS
ok github.com/gaby/go-benchmark/buffer/main 11.147s
❯ ~/go/go1.23.0/bin/go test -bench .
goos: darwin
goarch: amd64
pkg: github.com/gaby/go-benchmark/buffer/main
cpu: 13th Gen Intel(R) Core(TM) i9-13900KS
BenchmarkGenericBuf-32 3512586 366.1 ns/op
BenchmarkGenericStackBuf-32 3442252 357.3 ns/op
BenchmarkAllocBuf-32 3388725 369.6 ns/op
BenchmarkSyncPoolBuf-32 333597666 3.660 ns/op
BenchmarkBpoolPoolBuf-32 4197034 285.4 ns/op
BenchmarkByteBufferPoolBuf-32 689101736 1.845 ns/op
BenchmarkEasyJsonBuffer-32 7685049 156.6 ns/op
BenchmarkEasyJsonBuffer_OptimizedConfig-32 88073781 13.11 ns/op
PASS
ok github.com/gaby/go-benchmark/buffer/main 11.897s
This benchmark does not reflect real-world usage scenarios. We can consider optimizing the algorithm or providing users with a configuration switch, allowing them to configure the pool implementation themselves, similar to how it's done in the client implementation. We shouldn't hastily replace it with a simple sync.Pool
. After all, in the case of HTTP request and response handling, the size of the response body and request body can vary greatly across different scenarios. A rushed replacement might go against the original intent of the fasthttp
author.
I think this level of optimization isn't ideal for user configuration—it’s something framework maintainers should handle. Perhaps @newacorn could help create a more comprehensive benchmark suite to better reflect real-world use cases and guide the decision?
I think this level of optimization isn't ideal for user configuration—it’s something framework maintainers should handle. Perhaps @newacorn could help create a more comprehensive benchmark suite to better reflect real-world use cases and guide the decision?
Whether or not to provide user configuration is optional. I also agree with your point that such optimizations should be provided by the maintainers. Because these are not important; they are just suggestions and no one has put them into practice. I just feel that using sync.Pool simply does not align with the semantics of the original implementation. Since sync.Pool is consistently faster than bytebufferpool in any version of Go, and the performance difference is consistent, why didn’t the author use sync.Pool when designing fasthttp in the first place? @erikdubbelboer supports this proposal based on the belief that Go has optimized sync.Pool in recent versions. However, this change is not related to Go's runtime, as both implementations are based on sync.Pool. The original implementation was aimed at conserving resources. When we optimize, should we not consider the original intention of the implementation?
Sorry, I don't intend to provide such a benchmark because this issue is fundamentally different from the original implementation. Given that the goals of the implementations are different, what is the point of providing a benchmark?
@newacorn The point is to discuss whether it makes sense to swap ByteBufferPool
with sync.Pool
. There's no pull request or any changes done.
I'm currently testing changing some of Fiber
to use sync.Pool
and I am getting 0 improvements.
@newacorn The point is to discuss whether it makes sense to swap
ByteBufferPool
withsync.Pool
. There's no pull request or any changes done.I'm currently testing changing some of
Fiber
to usesync.Pool
and I am getting 0 improvements.
Sorry, I looked at the wrong column and confused the issue with the pull request.
@gaby Actually, your proposal can indeed improve performance, but it might increase memory usage in certain special scenarios. My intention was to suggest that if you implement this pull request, it would be even better to provide a configuration option to maintain compatibility with the original semantics, as different users might have different use cases. I’m not negating your proposal.
@newacorn I tried swapping in 3 places of GoFiber the ByteBufferPool
to sync.Pool
and I get same performance or worst.
This is probably related to what you mentioned before of buffers having different sizes
The other option I can see is merging ByteBufferPool
into fasthttp
and doing fixes/updates here. Given that the upstream ByteBufferPool
doesn't get any updates. Thoughts ? @erikdubbelboer
@erikdubbelboer I was curious about the performance of
ByteBufferPool
now days, I decided to re-run the benchmarks it mentions on the README and to my surprisesync.Pool
is more performant these days. This also probably explains whyVictoriaMetrics
doesn't useByteBufferPool
anymore as seen here: https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/lib/bytesutil/bytebuffer.go#L128I created a workflow with benchmarks using go1.20 through 1.23 for both Ubuntu Latest and MacOS with M1 CPU. You can see the results here: https://github.com/gaby/go-benchmark/actions/runs/10721391990
TLDR:
Ubuntu Latest - Golang v1.20.x
Ubuntu Latest - Golang v1.23.x
MacOS Latest (Apple M1) - Golang v1.23.x