jonhoo / volley

Volley is a benchmarking tool for measuring the performance of server networking stacks.
MIT License

Benchmark go server using go tip #3

Open jonhoo opened 9 years ago

jonhoo commented 9 years ago

The Go server scales relatively poorly compared to the Rust and C servers. Before reporting this as an upstream bug, we should investigate how the Go server performs when built with Go tip.

jonhoo commented 9 years ago

go-tip

Alas, it seems as though the problem still arises on Go tip.

jonhoo commented 9 years ago

Posted to golang-nuts.

jonhoo commented 9 years ago

@jbardin points out in this reply on golang-nuts that the performance drop is probably caused by the overhead introduced by doing (e)polling instead of blocking socket reads. Continuing the discussion in #4 and #5.

diegobernardes commented 9 years ago

Jon, you could run the test again with Go tip. It has better goroutine performance (http://talks.golang.org/2015/state-of-go-may.slide#8), and now that we know the problem was in the async I/O, we can expect even better performance from Go.

I do wonder whether Go can achieve Rust/C performance or not.

jonhoo commented 9 years ago

Has go tip changed significantly in the past two days?

jonhoo commented 9 years ago

Ah, you mean test go-blocking with go tip? Sure, I'll do that now.

diegobernardes commented 9 years ago

yep :]

jonhoo commented 9 years ago

Done. See https://raw.githubusercontent.com/jonhoo/volley/1d9555441a2d5fa44a712a777fd95dae1503247a/benchmark/perf.png

Performance for go-blocking improves drastically for Go tip, almost to the point where it's as fast as the C and Rust implementations! Cool.

peterhellberg commented 9 years ago

@jonhoo This is great, thank you for doing these benchmarks. Nice to see that Go tip is catching up. :+1:

xekoukou commented 9 years ago

It would be nice to see latency variance as well.

jonhoo commented 9 years ago

@xekoukou pushed to https://github.com/jonhoo/volley/blob/master/benchmark/plot.dat

diegobernardes commented 9 years ago

@jonhoo I was a bit surprised by Go's latency in the plot.dat file. It is very fast now, but the latency, omg..

But I think I know what the problem is. One thing went unnoticed: Rust and C create one thread per connection, while Go creates one thread per CPU core.

Looking at the plot.dat file, the only go-blocking-tip entry with low latency is the one where the number of connections equals the number of CPU cores (threads):

go-blocking-tip 40 40 39us 5.89us 1000000
rust            40 40 41us 6.68us 1000000
c-threaded      40 40 40us 7.91us 1000000

I don't know if there is any way to configure Go to create one real thread per goroutine.

jonhoo commented 9 years ago

Well, I could increase GOMAXPROCS, but that comes with its own set of problems unfortunately. It also shouldn't really matter; to quote the Go runtime docs:

> There is no limit to the number of threads that can be blocked in system calls on behalf of Go code; those do not count against the GOMAXPROCS limit.

An interesting benchmark to see would be a C implementation using a pool of workers instead of spawning a new thread for each request. That should give us more of an apples-to-apples comparison.

diegobernardes commented 9 years ago

> An interesting benchmark to see would be a C implementation using a pool of workers instead of spawning a new thread for each request. That should give us more of an apples-to-apples comparison.

Yes, it's better to do this.

> Well, I could increase GOMAXPROCS, but that comes with its own set of problems unfortunately. It also shouldn't really matter; to quote the Go runtime docs:

> > There is no limit to the number of threads that can be blocked in system calls on behalf of Go code; those do not count against the GOMAXPROCS limit.

> The GOMAXPROCS variable limits the number of operating system threads that can execute user-level Go code simultaneously. There is no limit to the number of threads that can be blocked in system calls on behalf of Go code; those do not count against the GOMAXPROCS limit. This package's GOMAXPROCS function queries and changes the limit.

Setting GOMAXPROCS to the number of CPUs only makes sense when requests go through Go's non-blocking I/O, so that when anything blocks, the thread picks up another goroutine to execute. In 'go-blocking', however, we are blocking the thread itself, so the number of threads matters. The go-blocking app should accept an extra argument with the number of connections; only then will the test be fair.

Well, at least this is what I think. I haven't tested it, so I can't confirm.

I would make a PR, but for some reason I can't compile the C code to run the tests :[

jonhoo commented 9 years ago

It's not entirely clear how to interpret that statement from the docs. While it is true that we're blocking a user-level goroutine, we are also blocking on a system call, so it might be that Go is smart enough to then allow another goroutine to run. I'm not sure about this though.

Can you open another ticket with the C compilation error you're getting?