yoshidan / google-cloud-rust

Google Cloud Client Libraries for Rust.
MIT License

pubsub: improve throughput by using multiple subscriber clients #153

Closed (dezyh closed this 11 months ago)

dezyh commented 1 year ago

Internally, PubSub has a 10 MB/s limit per subscriber client. They advise creating multiple clients to achieve higher throughput when required. The Java and Go client libraries appear to do this automatically.

Currently, google_cloud_pubsub::client::ClientConfig exposes a pool_size field, which is used to create the desired number of gRPC connections. The issue is that google_cloud_pubsub::client::Client does not use these connections in parallel. Instead, pull requests appear to be issued sequentially, rotating through the connections in order: 0, 1, 2, 3, 0, 1, 2, 3, ....
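To illustrate the rotation described above, here is a minimal sketch of a round-robin connection picker. RoundRobinPool, next_index and sequential_pulls are hypothetical names made up for this example; they are not the crate's actual types, and the real pool may be implemented differently.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical stand-in for a gRPC connection pool (not the crate's real type).
pub struct RoundRobinPool {
    size: usize,
    next: AtomicUsize,
}

impl RoundRobinPool {
    pub fn new(size: usize) -> Self {
        assert!(size > 0, "pool must hold at least one connection");
        Self { size, next: AtomicUsize::new(0) }
    }

    /// Returns the index of the connection the next request would use.
    pub fn next_index(&self) -> usize {
        self.next.fetch_add(1, Ordering::Relaxed) % self.size
    }
}

/// Simulates a single caller issuing `requests` pulls one after another.
/// Only one connection is in use at any moment, so the extra connections
/// add no parallelism.
pub fn sequential_pulls(pool: &RoundRobinPool, requests: usize) -> Vec<usize> {
    (0..requests).map(|_| pool.next_index()).collect()
}
```

With a pool of 4, eight sequential pulls visit connections 0, 1, 2, 3, 0, 1, 2, 3, which is the pattern observed above.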

I have a rather simple improvement that spawns a separate google_cloud_pubsub::subscriber::Subscriber::start() for each gRPC connection in the connection pool. This might not be the best approach, so I'm looking for any feedback you have on it.
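The idea can be sketched roughly as follows. This is not the code in this PR: it uses plain std threads and a channel as stand-ins for the async subscriber tasks, and run_subscribers and Message are hypothetical names. It only shows the shape of "one worker per pooled connection, all feeding one consumer".

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical message type: (connection index, sequence number).
pub type Message = (usize, u32);

/// Spawns one worker per pooled connection and collects everything they receive.
pub fn run_subscribers(pool_size: usize, messages_per_worker: u32) -> Vec<Message> {
    let (tx, rx) = mpsc::channel::<Message>();
    let mut handles = Vec::with_capacity(pool_size);

    for conn in 0..pool_size {
        let tx = tx.clone();
        handles.push(thread::spawn(move || {
            for seq in 0..messages_per_worker {
                // A real worker would run a streaming pull on connection `conn` here.
                tx.send((conn, seq)).expect("receiver should still be alive");
            }
        }));
    }

    // Drop the original sender so the channel closes once every worker is done.
    drop(tx);
    for handle in handles {
        handle.join().expect("worker thread panicked");
    }

    rx.into_iter().collect()
}
```

Because every worker owns its own connection index, the pulls run concurrently instead of taking turns, which is where the throughput gain would come from.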

Before this change, using the default pool size of 4 on a GCP instance, I could only get 80 Mbps. After the change, I can get up to 300 Mbps (as fast as the source was producing). I might provide more benchmarks later.
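As a quick sanity check on these numbers, the sketch below converts the 10 MB/s per-client limit into Mbps. It assumes decimal units (1 MB = 8 Mb) and that the limit applies independently to each subscriber client; the function names are made up for this example.

```rust
/// Per-client limit quoted in this thread, in megabytes per second.
pub const PER_CLIENT_MB_PER_S: u64 = 10;

/// Converts megabytes per second to megabits per second (1 byte = 8 bits).
pub fn mb_per_s_to_mbps(mb_per_s: u64) -> u64 {
    mb_per_s * 8
}

/// Upper bound on aggregate throughput if every client reaches its own limit.
pub fn ceiling_mbps(clients: u64) -> u64 {
    clients * mb_per_s_to_mbps(PER_CLIENT_MB_PER_S)
}
```

Under those assumptions a single client tops out at 80 Mbps, which matches the measurement before the change, and four clients could reach at most 320 Mbps, so 300 Mbps is within that bound.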

I noticed that a similar pattern is already used in receive(), but that function is only used in tests.

dezyh commented 1 year ago

To add some context, I'm wondering if it would be better to give each Subscriber::start() a dedicated connection, rather than having all of them share the available connections in the connection pool.

For example, subscriber A currently uses connections 0, 1, 2, 3, 0, ... and subscriber B uses connections 1, 2, 3, 0, 1, ... for its requests. I wonder if it might be better for subscriber A to use only connection 0, subscriber B to use only connection 1, and so on (assuming there is the same number of connections and subscriber clients).
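The two assignment strategies can be compared with a small sketch. Both functions are hypothetical and simplified: shared_rotation assumes each subscriber starts at the connection matching its own index and then rotates, which mirrors the example above but may not match how the pool actually hands out connections.

```rust
/// Shared pool: subscriber `subscriber` starts at its own index and rotates
/// through all connections (simplifying assumption for illustration).
pub fn shared_rotation(subscriber: usize, pool_size: usize, requests: usize) -> Vec<usize> {
    (0..requests).map(|i| (subscriber + i) % pool_size).collect()
}

/// Dedicated connection: every request from a subscriber uses the same
/// connection, chosen by the subscriber's index.
pub fn dedicated(subscriber: usize, pool_size: usize, requests: usize) -> Vec<usize> {
    vec![subscriber % pool_size; requests]
}
```

For a pool of 4, shared_rotation gives 0, 1, 2, 3, 0 for subscriber A (index 0) and 1, 2, 3, 0, 1 for subscriber B (index 1), while dedicated pins A to connection 0 and B to connection 1.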

However, I notice that the Go implementation separates gRPC connections from subscriber goroutines, stating that "Each connection supports up to 100 streams", so maybe it is best to decouple gRPC connections from subscriber clients.

dezyh commented 11 months ago

Sorry for the delay.

yoshidan commented 11 months ago

Thanks!