Thanks for participating, but unfortunately I cannot merge these changes due to the problems described above.
It's a pity that you didn't even measure the proposed changes and their impact. I am sharing not just my "thoughts" but experience, and you talk about it as if it were "premature optimization". Yes, it is premature optimization if you never plan to reach 100000 rps or more. But at that rate it is profile-guided optimization, because I did profile at that rate.
I just benchmarked a blocking select against a non-blocking select and was impressed by the numbers:
package bench

import "testing"

// makeChans returns a blocking channel (never ready) and a non-blocking
// one (closed, so receives always succeed immediately).
func makeChans() (bCh chan struct{}, nbCh chan struct{}) {
    nbCh = make(chan struct{})
    close(nbCh)
    bCh = make(chan struct{})
    return
}

func BenchmarkBlockingSelect(b *testing.B) {
    bCh, nbCh := makeChans()
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        select {
        case <-nbCh: // query queue emulation
        case <-bCh: // timer emulation
        }
    }
}

func BenchmarkNonBlockingSelect(b *testing.B) {
    bCh, nbCh := makeChans()
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        select {
        case <-nbCh: // fast path: non-blocking receive
        default:
            b.Fatalf("unexpected code path")
            // Fallback kept for symmetry with the real code; unreachable
            // here because nbCh is closed.
            select {
            case <-nbCh:
            case <-bCh:
            }
        }
    }
}
Results:
BenchmarkBlockingSelect       5000000    256  ns/op
BenchmarkNonBlockingSelect   30000000     43.2 ns/op
This shows that your select change speeds it up by nearly 6x! The change has been adopted in the code; see https://github.com/valyala/gorpc/commit/ec896817a62b8d935de640ecceae7db2560c9b75 .
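Here is a minimal sketch of the pattern that commit applies, with illustrative names (request, handle, dispatchLoop and stopCh are assumptions, not gorpc's actual identifiers): try a non-blocking receive on the hot path, and fall back to a blocking select only when the queue is empty.

package sketch

type request struct{} // payload fields elided

func handle(r *request) {} // stand-in for real request processing

func dispatchLoop(requestQueue <-chan *request, stopCh <-chan struct{}) {
    for {
        select {
        case r := <-requestQueue: // fast path: queue already has work
            handle(r)
        default:
            // Slow path: block until a request or a stop signal arrives.
            select {
            case r := <-requestQueue:
                handle(r)
            case <-stopCh:
                return
            }
        }
    }
}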
FYI, I slightly updated the benchmark code to be closer to reality:
package bench

import "testing"

// makeChans returns a blocking channel and a buffered channel with room
// for n sends, so sends into nbCh never block during the benchmark.
func makeChans(n int) (bCh chan struct{}, nbCh chan struct{}) {
    nbCh = make(chan struct{}, n)
    bCh = make(chan struct{})
    return
}

func BenchmarkBlockingSelect(b *testing.B) {
    bCh, nbCh := makeChans(b.N)
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        select {
        case nbCh <- struct{}{}: // query queue emulation
        case <-bCh: // timer emulation
        }
    }
}

func BenchmarkNonBlockingSelect(b *testing.B) {
    bCh, nbCh := makeChans(b.N)
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        select {
        case nbCh <- struct{}{}: // fast path: non-blocking send
        default:
            b.Fatalf("unexpected code path")
            // Fallback kept for symmetry; unreachable while nbCh has
            // free capacity.
            select {
            case nbCh <- struct{}{}:
            case <-bCh:
            }
        }
    }
}
But this didn't change the benchmark results.
As for the time.After() -> time.NewTimer() change, benchmark results show this was a worthless optimization:
package bench

import (
    "testing"
    "time"
)

func BenchmarkTimerStop(b *testing.B) {
    for i := 0; i < b.N; i++ {
        t := time.NewTimer(time.Millisecond)
        select {
        case <-t.C:
        default:
            t.Stop() // stop the timer explicitly if it hasn't fired yet
        }
    }
}

func BenchmarkTimerNoStop(b *testing.B) {
    for i := 0; i < b.N; i++ {
        tc := time.After(time.Millisecond)
        select {
        case <-tc:
        default: // leave the timer to expire on its own
        }
    }
}
Results:
BenchmarkTimerStop-4     1000000   1483 ns/op   192 B/op   3 allocs/op
BenchmarkTimerNoStop-4   2000000    854 ns/op   192 B/op   3 allocs/op
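For context: the usual argument for time.NewTimer over time.After in a loop is timer lifetime rather than per-iteration speed. The time package docs note that the Timer underlying time.After is not recovered by the garbage collector until it fires, while an explicit Timer can be stopped early. A minimal sketch of the stop-early pattern (recvWithTimeout is an illustrative name):

package sketch

import "time"

// recvWithTimeout waits for a value or a timeout. Stopping the timer on
// the fast path releases it promptly; with time.After it would linger
// until it fires. Illustrative sketch, not gorpc code.
func recvWithTimeout(ch <-chan int, d time.Duration) (int, bool) {
    t := time.NewTimer(d)
    defer t.Stop()
    select {
    case v := <-ch:
        return v, true
    case <-t.C:
        return 0, false
    }
}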
Yeah, you are right about timers: I benchmarked this back when Go was 1.1, when each timer expiration created a goroutine. When I asked for a way to call a function within the timer goroutine, they refused. And now they do call timer functions in the timer goroutine... just as I had asked, after they refused...
FYI, I added the ability to disable message buffering on both the client and the server for those who need minimal RPC latency. But benchmark results say that gorpc has better throughput with message buffering enabled:
10K workers, default 5ms delay for message buffering:
BenchmarkEchoInt10000Workers-4                 100000   10849 ns/op
BenchmarkEchoIntNocompress10000Workers-4       200000   10088 ns/op
BenchmarkEchoString10000Workers-4              100000   11528 ns/op
BenchmarkEchoStringNocompress10000Workers-4    100000   11218 ns/op
BenchmarkEchoStruct10000Workers-4              100000   14669 ns/op
BenchmarkEchoStructNocompress10000Workers-4    100000   13533 ns/op
10K workers, disabled message buffering:
BenchmarkEchoInt10000Workers-4                  50000   27190 ns/op
BenchmarkEchoIntNocompress10000Workers-4       100000   14499 ns/op
BenchmarkEchoString10000Workers-4               50000   29251 ns/op
BenchmarkEchoStringNocompress10000Workers-4    100000   16329 ns/op
BenchmarkEchoStruct10000Workers-4               50000   32244 ns/op
BenchmarkEchoStructNocompress10000Workers-4    100000   19387 ns/op
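For context, "message buffering" here means batching writes through a buffered writer and flushing on a delay. A rough sketch of such a writer loop, with illustrative names; gorpc's real implementation differs in detail:

package sketch

import (
    "bufio"
    "time"
)

// writerLoop batches messages and flushes at most once per flushDelay,
// trading latency for fewer syscalls. Illustrative, not gorpc's code.
func writerLoop(msgs <-chan []byte, w *bufio.Writer, flushDelay time.Duration) {
    var flushCh <-chan time.Time // nil until a flush is pending
    for {
        select {
        case m := <-msgs:
            w.Write(m) // error handling elided for brevity
            if flushCh == nil {
                flushCh = time.After(flushDelay) // schedule a delayed flush
            }
        case <-flushCh:
            w.Flush()
            flushCh = nil
        }
    }
}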
Your code is not equivalent to my proposal; I'll try to fix it and post benchmarks soon. My proposal didn't disable buffering, it just makes the buffering low-latency.
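A minimal sketch of that low-latency buffering idea, under the assumption that it means "flush as soon as the queue drains instead of waiting out the flush delay" (illustrative names, not the actual patch):

package sketch

import "bufio"

// lowLatencyWriterLoop keeps the batching but drops the fixed delay:
// it drains whatever is already queued, then flushes immediately.
func lowLatencyWriterLoop(msgs <-chan []byte, w *bufio.Writer) {
    for m := range msgs {
        w.Write(m) // first message of the batch; errors elided
    drain:
        for {
            select {
            case m = <-msgs:
                w.Write(m) // more messages already queued: keep batching
            default:
                break drain // queue is empty
            }
        }
        w.Flush() // flush as soon as the queue drains, no timer delay
    }
}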
Made a separate pull request #2 with benchmarks.