romshark / webwire-go

A transport independent asynchronous duplex messaging library for Go
MIT License

Performance Benchmarking #17

Open KernelPryanic opened 6 years ago

KernelPryanic commented 6 years ago

Are there already any performance benchmarking results available?

romshark commented 6 years ago

I just published a small webwire benchmarking tool.

  1. Start the test server: go run test-server.go
  2. Run the benchmark: go run benchmark.go

Following parameters are available:

Here's an example of a 60-second benchmark with 1,000 concurrent connections, each sending requests with a 1 KiB payload at intervals of 10 to 30 milliseconds:

go run benchmark.go -clients 1000 -min-req-itv 10 -max-req-itv 30 -min-pld-sz 1024 -max-pld-sz 1024 -req-timeo 60000 -bench-dur 60

And here are the results of the above benchmark:

2018/04/02 21:20:19   Benchmark finished (60s)

  Requests performed:  1892900
  Requests timed out:  0

  Data sent:           1.81 GiB (1938329600 bytes)
  Data received:       1.81 GiB (1938329600 bytes)
  Avg payload size:    1.00 KiB

  Avg req itv:         19.955008ms
  Max req itv:         29ms
  Min req itv:         10ms

  Avg req time:        9.420078ms
  Max req time:        832.1403ms
  Min req time:        1.0004ms

  Req/s:               31548
  Bytes/s:             32305493
  Throughput:          30.81 MiB/s

System: Intel i7 3930K hexa-core @ 3.8 GHz; 64 GB DDR3 RAM @ 1833 MHz

As you can see, I was able to achieve around 31.5k requests per second with an average reply time of about 9 milliseconds at 1k concurrent clients.
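The per-second figures in the report follow directly from the raw totals and the 60-second duration. A minimal sketch of that arithmetic is shown here; the function name `rates` is hypothetical and is not part of the benchmarking tool:

```go
package main

import "fmt"

// rates derives per-second figures from raw benchmark totals.
// Hypothetical helper for illustration; not taken from the webwire benchmark tool.
func rates(requests, bytes uint64, seconds float64) (reqPerSec, bytesPerSec, mibPerSec float64) {
	reqPerSec = float64(requests) / seconds
	bytesPerSec = float64(bytes) / seconds
	mibPerSec = bytesPerSec / (1024 * 1024) // 1 MiB = 1024 * 1024 bytes
	return
}

func main() {
	// Totals copied from the report above: requests performed, bytes sent, duration.
	r, b, m := rates(1892900, 1938329600, 60)
	fmt.Printf("Req/s:      %.0f\n", r)
	fmt.Printf("Bytes/s:    %.0f\n", b)
	fmt.Printf("Throughput: %.2f MiB/s\n", m)
}
```

With the totals from the report this reproduces the reported 31548 req/s, 32305493 bytes/s and 30.81 MiB/s. The byte total is also exactly 1,892,900 requests times 1,024 bytes, which matches the fixed 1 KiB payload.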

romshark commented 6 years ago

Beware

The benchmark runs amok on Windows 10 when there are many concurrent connections.

Windows 10

It seems that TCP/IP connection establishment is very slow on Windows, which causes huge problems when creating many concurrent connections (> 1000). That many connections invoke a ridiculous number of syscalls on Windows. Because so many goroutines are blocked in syscalls, the Go runtime spawns thousands of OS threads, and the machine becomes unresponsive once it reaches 10k threads.

trace_benchmark_windows10

In the above screenshot, the trace shows the ridiculous number of syscalls, the slowly degrading performance and the ever-growing number of spawned OS threads.

macOS High Sierra

I've also tested the same configuration on macOS High Sierra and got very different results:

trace_benchmark_macos_highsierra

The Mac performed just fine with only 27 OS threads. No degrading performance, no syscall spam.

Conclusion

It looks more like a Windows-related problem than a WebWire server/client problem.

romshark commented 6 years ago

I performed a load test using the latest revision and got the following results:

Results

  Concurrent Connections:  10,000
  Request Payload:         1 - 64 KiB
  Requests Performed:      5,919,046
  Timeout Rate:            0.00%
  Sent:                    183.44 GiB
  Received:                183.44 GiB
  Throughput:              313.07 MiB/s
  Requests per Second:     9,865 rps
  Average Latency:         1 millisecond
  Maximum Latency:         4.23 seconds

Test System

Intel i7 3930K (12 threads @ 3.8 GHz, reached full load at 72 °C); 64 GB DDR3 @ 1833 MHz (around 4.75 GB were used during the benchmark)

Note that both the benchmark and the server ran on this same machine, which distorts the results. The numbers could potentially be higher if the two were run on separate servers.