liftbridge-io / go-liftbridge

Go client for Liftbridge. https://github.com/liftbridge-io/liftbridge
Apache License 2.0

Load-balance publications on multiple brokers #99

Closed: Jmgr closed this 3 years ago

Jmgr commented 3 years ago

We performed some load tests in which we published 3,000 messages per second across 30 streams on a 3-instance Liftbridge cluster using a single connection. We were wondering why the CPU load and memory usage were not distributed across the Liftbridge instances but were instead concentrated on one of them.

[Screenshots: per-instance CPU load (lb-cpu) and memory usage (lb-mem)]

After profiling, we found that it must be the gRPC processing that triggers this: [profile screenshot: lb-grpc-load]

Comparing this with a profile from one of the other instances: [profile screenshot: lb-regular-load]

Would it be possible to add an option to the Liftbridge client to dispatch publish requests across all brokers, using the stream name and partition ID as a key (i.e. consistent hashing) so that per-partition ordering is preserved? The Liftbridge client would then maintain a connection to every broker instead of choosing only one of them at random.
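
As a rough illustration of the idea (the function and broker names below are made up, and plain hash-mod stands in for a full consistent-hash ring, which would remap fewer partitions when the broker set changes):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// pickBroker deterministically maps a (stream, partition) pair to one
// broker, so every publish for the same partition goes through the same
// connection and per-partition ordering is preserved.
func pickBroker(stream string, partition int32, brokers []string) string {
	h := fnv.New32a()
	fmt.Fprintf(h, "%s:%d", stream, partition)
	return brokers[h.Sum32()%uint32(len(brokers))]
}

func main() {
	brokers := []string{"broker-0:9292", "broker-1:9292", "broker-2:9292"}
	// The same stream/partition always hashes to the same broker, while
	// different partitions spread across the cluster.
	fmt.Println(pickBroker("orders", 0, brokers))
	fmt.Println(pickBroker("orders", 1, brokers))
	fmt.Println(pickBroker("payments", 0, brokers))
}
```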

tylertreat commented 3 years ago

Yes, this makes sense to me. Maintaining a single connection to each broker would also greatly simplify the current connection pool logic, though I'm not sure what the performance impact of reusing the same connection across multiple publishes, subscriptions, etc. would be.

Somewhat related, I'd like to eventually get go-liftbridge to use gRPC's Resolver and Picker interfaces for establishing connections rather than custom logic (see https://github.com/liftbridge-io/go-liftbridge/issues/89).
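
As a rough sketch of that direction (package and type names here are placeholders, not actual client code), a custom resolver would hand gRPC the full broker list and let a picker choose the subchannel per RPC:

```go
package lbresolver

import (
	"google.golang.org/grpc/resolver"
)

// brokerResolverBuilder feeds gRPC the full set of broker addresses; a
// balancer/picker then spreads RPCs across them instead of the client
// managing its own connection pool.
type brokerResolverBuilder struct {
	brokers []string // would come from cluster metadata in the real client
}

func (b *brokerResolverBuilder) Scheme() string { return "liftbridge" }

func (b *brokerResolverBuilder) Build(target resolver.Target, cc resolver.ClientConn, opts resolver.BuildOptions) (resolver.Resolver, error) {
	addrs := make([]resolver.Address, len(b.brokers))
	for i, a := range b.brokers {
		addrs[i] = resolver.Address{Addr: a}
	}
	// Report every broker to gRPC; the configured balancer picks a
	// subchannel for each RPC.
	cc.UpdateState(resolver.State{Addresses: addrs})
	return nopResolver{}, nil
}

// nopResolver satisfies resolver.Resolver; a real implementation would
// re-fetch cluster metadata on ResolveNow.
type nopResolver struct{}

func (nopResolver) ResolveNow(resolver.ResolveNowOptions) {}
func (nopResolver) Close()                                {}

// Register makes "liftbridge:///<target>" dial targets use this resolver.
func Register(brokers []string) {
	resolver.Register(&brokerResolverBuilder{brokers: brokers})
}
```

A client would then dial a target like `liftbridge:///cluster`, and a custom picker could implement the stream/partition-keyed selection described above.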

Jmgr commented 3 years ago

> Somewhat related, I'd like to eventually get go-liftbridge to use gRPC's Resolver and Picker interfaces for establishing connections rather than custom logic (see #89).

That makes sense. Both are flagged as being experimental, though; wouldn't that be an issue?

tylertreat commented 3 years ago

> Both are flagged as being experimental, though; wouldn't that be an issue?

I think it's only an issue if we expose those details to the user through the API, but I'm imagining this being an implementation detail of the library, just like the current connection logic.