jgaskins / grpc

Pure-Crystal implementation of gRPC

Failures when running ghz benchmark #4

Closed by LesnyRumcajs 4 years ago

LesnyRumcajs commented 4 years ago

Hello! I'm trying to use this library for a small gRPC benchmark I have here: https://github.com/LesnyRumcajs/grpc_bench

Unfortunately, I noticed some issues. I'm not sure whether they come from the library itself, from one of its dependencies, or from it not playing nicely with the ghz defaults.

Manual calls work fine (tested with BloomRPC), but calls made through ghz fail completely: there is not a single OK response in 30 s of calls.

Summary:
  Count:    210020
  Total:    30.00 s
  Slowest:  0 ns
  Fastest:  0 ns
  Average:  6.50 ms
  Requests/sec: 7000.24

Response time histogram:

Latency distribution:

Status code distribution:
  [Unavailable]   179204 responses   
  [Unknown]       30816 responses    

Error distribution:
  [179154]   rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:50051: connect: connection refused"   
  [30816]    rpc error: code = Unknown desc = OK: HTTP status code 200; transport: missing content-type field                                                                                                                      
  [50]       rpc error: code = Unavailable desc = transport is closing                                                                                                                                                             

The same test for Ruby seems rather fine:


Summary:
  Count:    74373
  Total:    30.01 s
  Slowest:  802.53 ms
  Fastest:  0.81 ms
  Average:  20.12 ms
  Requests/sec: 2478.62

Response time histogram:
  0.810 [1] |
  80.982 [73297]    |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  161.155 [32]  |
  241.327 [50]  |
  321.499 [2]   |
  401.671 [48]  |
  481.843 [0]   |
  562.015 [1]   |
  642.187 [1]   |
  722.360 [0]   |
  802.532 [97]  |

Latency distribution:
  10 % in 12.92 ms 
  25 % in 13.60 ms 
  50 % in 14.62 ms 
  75 % in 17.28 ms 
  90 % in 38.02 ms 
  95 % in 40.50 ms 
  99 % in 44.42 ms 

Status code distribution:
  [OK]                  73529 responses   
  [ResourceExhausted]   795 responses     
  [Unavailable]         49 responses      

Error distribution:
  [795]   rpc error: code = ResourceExhausted desc = No free threads in thread pool   
  [49]    rpc error: code = Unavailable desc = transport is closing                   

The same goes for the other languages and libraries I have tested. Could you please look into this? The code for the Crystal gRPC benchmark is on the branch for this PR: https://github.com/LesnyRumcajs/grpc_bench/pull/14

jgaskins commented 4 years ago

I hadn't heard of ghz before. Thanks for the tip on a benchmarking tool for this! 💯

When I ran ghz against your Crystal example, I didn't get any of the connection errors you did, but the errors due to the missing content-type header and the errors from ungraceful handling of the socket closing definitely did happen on my machine. I've released v0.1.3 that fixes the content-type error, but I'll hold off looking into the error for the dirty exit because it looks like the Ruby gem (maintained by Google) has the same issue.
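
For context on the content-type error: gRPC over HTTP/2 expects response headers to carry a `content-type` of `application/grpc`, optionally with a suffix such as `+proto`, so a client may reject an HTTP 200 response that lacks it. The following is only a rough sketch of that kind of check, not this library's code and not the check ghz actually performs; the function name `is_grpc_content_type` is hypothetical.

```python
def is_grpc_content_type(headers):
    """Return True if the response headers look like a gRPC response.

    `headers` is a plain dict of header name -> value. This is an
    illustrative check, not the logic of any real gRPC client.
    """
    # HTTP/2 header names are lowercase on the wire; normalise defensively.
    normalised = {name.lower(): value for name, value in headers.items()}
    content_type = normalised.get("content-type")
    if content_type is None:
        # This is the "missing content-type field" situation.
        return False
    # Drop any parameters after ';' and compare the media type itself.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "application/grpc" or media_type.startswith("application/grpc+")


print(is_grpc_content_type({":status": "200"}))
print(is_grpc_content_type({":status": "200", "content-type": "application/grpc+proto"}))
```

A response with only `:status: 200` fails this check, which matches the shape of the `missing content-type field` error in the first run.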

Here's the output on my machine with this update:

➜  crystal_grpc git:(crystal_grpc) ✗ ghz --proto=../proto/helloworld/helloworld.proto --call=helloworld.Greeter.SayHello --insecure --duration 30s -d "{\"name\":\"it's not as performant as we expected\"}" 127.0.0.1:50051

Summary:
  Count:    616479
  Total:    30.00 s
  Slowest:  16.58 ms
  Fastest:  0.20 ms
  Average:  2.41 ms
  Requests/sec: 20548.94

Response time histogram:
  0.199 [1] |
  1.837 [55166] |∎∎∎∎∎
  3.474 [473195]    |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  5.112 [86916] |∎∎∎∎∎∎∎
  6.750 [407]   |
  8.387 [471]   |
  10.025 [222]  |
  11.663 [38]   |
  13.300 [4]    |
  14.938 [2]    |
  16.576 [7]    |

Latency distribution:
  10 % in 1.86 ms
  25 % in 1.98 ms
  50 % in 2.12 ms
  75 % in 2.51 ms
  90 % in 3.74 ms
  95 % in 3.92 ms
  99 % in 4.30 ms

Status code distribution:
  [OK]            616429 responses
  [Canceled]      1 responses
  [Unavailable]   49 responses

Error distribution:
  [1]    rpc error: code = Canceled desc = grpc: the client connection is closing
  [49]   rpc error: code = Unavailable desc = transport is closing
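
To put the three ghz summaries in this thread side by side, the share of OK responses can be computed from the status code distributions. This is just arithmetic on the numbers quoted above; the runs were made on different machines, so the rates are only roughly comparable, and the labels and the helper name `ok_rate` are made up for illustration.

```python
# OK and total response counts copied from the ghz summaries in this thread.
runs = {
    "Crystal (first run)": {"ok": 0, "total": 210020},
    "Ruby": {"ok": 73529, "total": 74373},
    "Crystal (v0.1.3)": {"ok": 616429, "total": 616479},
}


def ok_rate(ok, total):
    """Fraction of responses that came back with status OK."""
    return ok / total


for name, run in runs.items():
    print(f"{name}: {ok_rate(run['ok'], run['total']):.2%} OK")
```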

I have a feeling your connection errors happen because the container is up while the program is still compiling, so I'll also put a couple of suggestions on your benchmark PR. 🙂
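
One possible way to avoid benchmarking a server that is not listening yet is to poll the port before starting ghz. The sketch below is an assumption about how that could be done, not what the benchmark repository actually does; the helper name `wait_for_port` and its default timeout and interval are hypothetical.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=0.5):
    """Poll until a TCP connection to host:port succeeds or `timeout` seconds pass.

    Returns True once the port accepts connections, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: the server is not ready yet.
            time.sleep(interval)
    return False


if __name__ == "__main__":
    # 127.0.0.1:50051 is the address used in the ghz runs above.
    if wait_for_port("127.0.0.1", 50051, timeout=5.0):
        print("server is accepting connections; safe to start the benchmark")
    else:
        print("server did not come up in time")
```

An open TCP port only shows that the process is listening, not that every service is registered, so this is a coarse readiness check.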

LesnyRumcajs commented 4 years ago

As stated in the PR, your suggestions helped greatly, and I no longer get the content-type errors. Thank you! Don't bother with the dirty exit; it happens in all of the benchmarks I have, so it's something I'll probably need to address. :)