llucax / python-grpc-benchmark

A simple benchmark comparing Google's protobuf/grpc.aio with betterproto/grpclib
MIT License

Python gRPC implementations performance comparison

This repository hosts a simple benchmark comparing Google's protobuf/grpc.aio with betterproto/grpclib.

While evaluating a switch from Google's implementation to betterproto (because of all the annoyances listed in betterproto's documentation), I was a bit concerned about performance after bumping into an old issue claiming that grpclib is 2 times slower than grpcio. Since the issue was quite old, I decided to run a simple benchmark to see how the two libraries compare today, and also to get a feel for how fast (or slow) betterproto is.

Also, the performance comparison in that issue used the sync version of grpcio, as it predates the availability of grpc.aio.
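To make the "N loops, best of 5" figures below concrete, here is a minimal sketch of that style of timing harness using plain asyncio (no gRPC involved; `fake_request_reply` is a hypothetical stand-in for a real unary call):

```python
import asyncio
import time

async def fake_request_reply() -> str:
    # Stand-in for a real gRPC unary call; it just yields to the event loop.
    await asyncio.sleep(0)
    return "pong"

def best_of(repeat: int, loops: int) -> float:
    """Return the best per-loop time (in seconds) over `repeat` runs of `loops` calls."""
    async def run_loops() -> None:
        for _ in range(loops):
            await fake_request_reply()

    times = []
    for _ in range(repeat):
        start = time.perf_counter()
        asyncio.run(run_loops())
        times.append((time.perf_counter() - start) / loops)
    return min(times)

best = best_of(repeat=5, loops=100)
print(f"100 loops, best of 5: {best * 1000:.2f} msec per loop")
```

The real benchmark measures actual client/server roundtrips, so this only illustrates the measurement methodology, not the workload.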

Results

My results show that, at the time of writing, betterproto/grpclib is actually about 2 times faster than grpcio for a single request-reply roundtrip, and about 1.5 times faster for streaming 10 numbers. Some preliminary tests show that for streaming numbers more continuously (100,000 numbers), the throughput of grpcio is about 30% higher than betterproto/grpclib's.

Also, some basic memory benchmarks show that betterproto/grpclib uses less memory than grpcio (about 40% less for a single request-reply and about 70% less for streaming 10 numbers, with a corresponding reduction in minor page faults).

For both libraries, memory consumption seems to stay stable as more messages are processed (request-reply or streaming).

$ ./benchmark 
grpcio
        1 request-reply:       100 loops, best of 5: 2.98 msec per loop
                                40352 KB max | 109;53936 in;voluntary context switches | 0;27767 major;minor page faults
        streaming 10 numbers:  100 loops, best of 5: 2.86 msec per loop
                                45548 KB max | 201;77380 in;voluntary context switches | 0;25620 major;minor page faults
grpclib
        1 request-reply:       200 loops, best of 5: 1.39 msec per loop
                                28560 KB max | 50;16684 in;voluntary context switches | 0;13323 major;minor page faults
        streaming 10 numbers:  100 loops, best of 5: 2.25 msec per loop
                                26840 KB max | 54;8232 in;voluntary context switches | 0;10396 major;minor page faults

Doing some tests with streaming more numbers (100,000) shows that grpcio has a better throughput than betterproto/grpclib (about 12,500 numbers/second vs 9,500 numbers/second, so around 30% more throughput). The memory consumption remains lower in betterproto/grpclib than in grpcio though (about 60% less).
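The "around 30%" figure is just the ratio of the two quoted rates:

```python
# Throughput figures (numbers/second) from the streaming test above.
slower, faster = 9_500, 12_500
extra = (faster / slower - 1) * 100
print(f"{extra:.0f}% more throughput")  # → 32% more throughput
```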

It might be worth crafting a benchmark more oriented to test throughput to get more conclusive results.

Test conditions

Running the benchmark

Requirements

Building

Run ./build to build the Docker image for the server and generate the Python bindings for the clients.

Running

Run ./benchmark to run the benchmark (to get slightly more stable results you might want to use sudo nice -n-10 ./benchmark).

Cleaning up

Run ./clean to remove generated files and the generated Docker image.