kafka4beam / brod

Apache Kafka client library for Erlang/Elixir
Apache License 2.0

Low-latency consumer polling #577

Open zidik opened 1 month ago

zidik commented 1 month ago

Scenario: I have a low-volume topic (0-0.5 msg/sec) for which I'd like to have a low propagation delay (up to 30ms) from producer to consumer.

Issue: With brod's default settings, the propagation latency hovers closer to 1000ms because of sleep_timeout: 1000ms. Every time brod receives an empty response from the broker, it sleeps for 1000ms. On our low-volume topic, this happens almost every time.

Attempted solution: I turned off sleep_timeout (sleep_timeout: 0) and, to prevent excessive polls, configured the consumer to work in a "long-poll-like" manner:

This works perfectly with 1 partition. The latency is super low, and the request rate is also low: a new request is sent only when a message is received or 10 seconds pass.

Problem: As soon as I add another partition to the topic, it fails: the latency skyrockets to 10 000 - 20 000ms, as brod polls the partitions one by one, 10 seconds each. 😞 This is because brod makes a separate request for each partition, and it sends all of these requests over a single connection. The Kafka broker handles only a single in-flight request per connection, so one running "long-poll" prevents the others from starting.
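To make the blocking effect concrete, here is a toy model in Python. It is a sketch under stated assumptions and not brod's actual scheduler: polls are assumed to run strictly round-robin over the partitions on one connection, each poll blocks until its partition has data or the wait time elapses, and only one message is in flight. The function name and parameters are made up for illustration, and the model is not meant to reproduce the exact numbers I measured.

```python
def delivery_latency_ms(num_partitions, max_wait_ms, arrival_ms, partition):
    """Latency of one message when long-polls are serialized on one connection.

    Toy model (an assumption, not brod's real behavior): polls run
    round-robin over partitions, starting with partition 0 at time 0.
    """
    now = 0
    current = 0  # partition whose long-poll currently occupies the connection
    while True:
        if current == partition:
            if arrival_ms <= now:
                # Message was already waiting: this poll returns it at once.
                return now - arrival_ms
            if arrival_ms <= now + max_wait_ms:
                # Message arrives while its own long-poll is open.
                return 0
        # Empty poll: the connection stays blocked for the full wait time.
        now += max_wait_ms
        current = (current + 1) % num_partitions


# One partition: the message always lands inside an open long-poll.
print(delivery_latency_ms(1, 10_000, 3_000, 0))   # 0
# Two partitions: the message waits behind the other partition's poll.
print(delivery_latency_ms(2, 10_000, 1, 1))       # 9999
# Three partitions: it waits behind two other polls.
print(delivery_latency_ms(3, 10_000, 10_001, 0))  # 19999
```

In this model the worst-case delay grows roughly as (number of partitions - 1) x wait time, which is why a wait time that is harmless with one partition becomes a problem once polls share a connection.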

I would have expected brod to make a single poll, but for all partitions.

Question: Is there any way to achieve the low-latency "long-poll" I described?

Considered alternative solutions:

zmstone commented 1 month ago

Thank you for the report. Batching fetch requests across partitions makes error handling more complicated. I plan to implement per-partition connections instead, WDYT?

zidik commented 1 month ago

Thanks for the quick reply! Yes, a separate connection per partition would help here, as each partition could be long-polled independently.