Thanks for running these benches! I've been meaning to write some, but it doesn't make too much sense right now because we are waiting on a new release of prost that should allow us to avoid an extra copy, which I believe is what the put_slice is showing. https://github.com/hyperium/tonic/blob/master/tonic/src/codec/prost.rs#L54
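For illustration only (this is not tonic's actual codec code, and the function names are made up): encoding into a temporary buffer and then copying it with put_slice costs one extra memcpy compared with encoding straight into the destination BufMut, which is roughly the kind of copy being discussed here.

use bytes::{BufMut, BytesMut};
use prost::Message;

// Sketch of the extra copy: two writes instead of one.
fn encode_with_extra_copy<M: Message>(msg: &M, dst: &mut BytesMut) -> Result<(), prost::EncodeError> {
    let mut tmp = Vec::with_capacity(msg.encoded_len());
    msg.encode(&mut tmp)?; // first write: serialize into the temporary Vec
    dst.put_slice(&tmp);   // second write: copy the Vec into the destination buffer
    Ok(())
}

// Sketch of the copy-free path: serialize directly into the destination buffer.
fn encode_directly<M: Message>(msg: &M, dst: &mut BytesMut) -> Result<(), prost::EncodeError> {
    msg.encode(dst)
}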
Pretty much, tonic-wise there is a decent amount of work left to optimize things, and the same goes for h2. Mostly, this is just the first attempt at a pure Rust HTTP/2 gRPC implementation, so I hope that once the stack starts to stabilize a bit we can put some good work into optimizing things.
Thanks for the info. I'll be sure to give the test another run once prost is updated.
Great! I'd also be very happy to work on this with you; expanding benchmarks and improving things is very much welcome here, and I do plan on doing a bunch, just kinda waiting for the last few things to flush out.
Hi, I also did some benchmark tests to compare the performance of tonic and grpc-go. The service code is very simple; it just returns an empty struct:
async fn read_file(&self, req: Request<PathOpts>) -> Result<Response<ReadFileResult>, Status> {
    Ok(Response::new(ReadFileResult {
        content: vec![],
        error_message: "".to_string(),
    }))
}
The Go code:
func (rs *RemoteServer) ReadFile(_ context.Context, opts *api.PathOpts) (*api.ReadFileResult, error) {
    return &api.ReadFileResult{
        Content:      []byte(""),
        ErrorMessage: "",
    }, nil
}
The client sends requests from 500 goroutines:
var wg sync.WaitGroup
// shared counters, updated atomically by the goroutines
var succeedCount, failedCount int32
concurrentCount := 500
for i := 0; i < concurrentCount; i++ {
    wg.Add(1)
    go func() {
        for i := 0; i < 500; i++ {
            if _, err := exec.Command("ls", "-al").CombinedOutput(); err != nil {
                fmt.Printf("failed to start remote command %v\n", err)
                atomic.AddInt32(&failedCount, 1)
            } else {
                atomic.AddInt32(&succeedCount, 1)
            }
        }
        wg.Done()
    }()
}
wg.Wait()
When sending requests to the tonic server, it takes about 8-9 seconds to finish, while sending to the Go server takes only 4-5 seconds. I thought it was because of the tokio runtime settings, so I adjusted the parameters in the attribute #[tokio::main(core_threads = 16, max_threads = 32)], but changing core_threads from 8 to 16 did not change the result.
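For reference, a minimal sketch of building the runtime explicitly instead of through the attribute, assuming tokio 0.2 (where core_threads and max_threads are valid builder methods; tokio 1.x renamed core_threads to worker_threads):

use tokio::runtime::Builder;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Rough equivalent of #[tokio::main(core_threads = 16, max_threads = 32)] on tokio 0.2.
    let mut rt = Builder::new()
        .threaded_scheduler()
        .core_threads(16)  // worker threads driving async tasks
        .max_threads(32)   // overall cap, including blocking threads
        .enable_all()
        .build()?;

    rt.block_on(async {
        // start the tonic server / client benchmark here
    });
    Ok(())
}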
By the way, I also compared performance in the single-connection scenario, and there tonic performs better: the C++ gRPC implementation that grpc-rs wraps has a bug that makes all requests on one connection be handled sequentially on a single thread. But I cannot explain the performance degradation of tonic compared with grpc-go; maybe some code in tonic has to be modified, but I have to do more investigation to find out where the problem is.
@abel-von can you please elaborate on the grpc-rs bug that makes all requests be handled in a single thread?
Is there a new benchmark comparison with grpc-rs? @LucioFranco
https://github.com/LesnyRumcajs/grpc_bench/wiki/2022-01-11-bench-results @LucioFranco @sticnarf
It would be better to add continuous performance benchmarking like gRPC does: https://www.grpc.io/docs/guides/benchmarking/ https://performance-dot-grpc-testing.appspot.com/explore?dashboard=5180705743044608
I looked at the test results. Is there something unfair in the test environment? It is hard for me to believe that the latency of grpc C++ is worse than that of grpc Java.
I believe the benchmark had some unexpected changes, because the chart looks odd.
Background: I've been working on an implementation of the remote execution API using tonic for both clients and the server. I've had some trouble getting good performance out of it, so I started looking into grpc-rs to compare speed.
Unfortunately converting my whole project to grpc-rs for a full comparison would be quite a bit of work, and anyway the tonic API+docs seem much friendlier (not to mention the convenience of using modern futures with async/await). So instead I put together a small benchmark of a very simple grpc service:
https://github.com/nicholasbishop/grpc-bench-rs
The benchmark does show grpc-rs performing quite a bit better, as well as supporting a larger number of client connections. (Following the discussion in https://github.com/hyperium/tonic/issues/209 I added a simple retry loop so that failed requests from the tonic client would be retried, but the loop doesn't seem to terminate in a reasonable time once the number of attempted connections gets too high.)
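(For reference, a minimal sketch of the kind of retry loop meant above; this is not the benchmark's actual code and assumes a recent tokio/tonic where tokio::time::sleep is available:

use std::future::Future;
use std::time::Duration;
use tonic::Status;

// Retry an async gRPC call up to max_attempts times, sleeping briefly between failures.
async fn with_retry<T, F, Fut>(mut call: F, max_attempts: usize) -> Result<T, Status>
where
    F: FnMut() -> Fut,
    Fut: Future<Output = Result<T, Status>>,
{
    let mut last_err = Status::unknown("no attempts made");
    for _ in 0..max_attempts {
        match call().await {
            Ok(resp) => return Ok(resp),
            Err(status) => {
                last_err = status;
                tokio::time::sleep(Duration::from_millis(10)).await;
            }
        }
    }
    Err(last_err)
}

A caller would wrap each attempt in a closure that clones the tonic client, which is cheap since it shares the underlying channel, and returns an owned async move block so every retry builds a fresh request.)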
I'd be interested in any feedback on the benchmark itself -- have I made a mistake somewhere that could be hurting tonic's performance numbers?
I used the Linux perf tool to do some basic profiling. The items above 5% are: