Open zidik opened 1 month ago
Thank you for the report. Batching fetch requests across partitions makes error handling more complicated. I plan to implement a per-partition connection instead. WDYT?
Thanks for the quick reply! Yes, a separate connection per partition would help here, as each partition could be long-polled independently.
Scenario: I have a low-volume topic (0-0.5 msg/sec) for which I'd like to have a low propagation delay (up to 30ms) from producer to consumer.
Issue: With `brod` default settings, the propagation latency hovers closer to 1000ms because of `sleep_timeout: 1000ms` - every time brod receives an empty response from the broker, it sleeps for 1000ms. In our low-volume topic, this happens almost every time.

Attempted solution: I turned off the sleep (`sleep_timeout: 0`) and, to prevent excessive polls, configured the consumer to work in a "long-poll-like" manner:
- `min_bytes=1` - ensures that the broker waits and returns only once there is at least something in the topic
- `max_wait_time=10 000ms` - if no messages arrive within 10 seconds, just return and start a new request

This works perfectly with 1 partition. The latency is super low, and the request rate is also low - a new request is sent only when a message is received or 10 seconds pass.
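
For reference, the three consumer settings described above could be written as a keyword list along these lines. This is only a sketch: the variable name `consumer_config` is illustrative, and how the list reaches brod (directly when starting a consumer, or through a group subscriber's configuration) depends on the setup and is not shown here.

```elixir
# Consumer options as described in this issue (times in milliseconds).
consumer_config = [
  # Do not sleep after an empty fetch response (the default discussed above is 1000 ms).
  sleep_timeout: 0,
  # Ask the broker to hold the fetch until at least 1 byte is available...
  min_bytes: 1,
  # ...or until 10 seconds have passed, whichever comes first.
  max_wait_time: 10_000
]
```

With these values, each fetch behaves like a long poll: it returns as soon as data exists, and otherwise after 10 seconds.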
Problem: But as soon as I add another partition to the topic, it fails - the latency skyrockets to 10 000 - 20 000ms, as `brod` polls each partition one by one, 10 seconds each. 😞 This is because `brod` makes a separate request for each partition, and it makes these requests within a single connection. The Kafka broker handles only a single in-flight request per connection, so one running "long-poll" prevents the others from starting.

I would have expected `brod` to make a single poll, but for all partitions.

Question: Is there any way to achieve the low-latency "long-poll" I described?
Considered alternative solutions: