zendesk / racecar

Racecar: a simple framework for Kafka consumers in Ruby
Apache License 2.0

Maximum application poll interval (max.poll.interval.ms) exceeded (max_poll_exceeded) #288

Open apeacock1991 opened 2 years ago

apeacock1991 commented 2 years ago

We're seeing the following error raised by rdkafka:

(try 1/10): Error for topic subscription #<struct Racecar::Consumer::Subscription topic="...", start_from_beginning=false, max_bytes_per_partition=1048576, additional_config={}>: Local: Maximum application poll interval (max.poll.interval.ms) exceeded (max_poll_exceeded)

We see this for 2 of this consumer's 3 topics. One topic is very busy, so its poll interval would never be exceeded; the other 2 are much quieter, and there can be long gaps between messages.

My guess is that no messages are being received on those topics, and that this triggers the error. The consumer reconnects fine, but is this a problem in Racecar? That is, because there are no messages to pull, does rdkafka think there is a problem when there isn't one?

I saw a similar issue reported for the Confluent Go client, so I wondered whether this is a Racecar issue and whether there is a fix.

apeacock1991 commented 2 years ago

I dug into this some more. The maybe_select_next_consumer method never gets past its guard clause if the topic receives 1 message per second, because the maximum wait time is 1 second.

The poll sits in the while loop for up to a second. Even after it has reached the end of the topic, if 1 message arrives within that window, it essentially never calls select_next_consumer.
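To illustrate the starvation described above, here is a minimal toy model. It is not Racecar's actual code: the ToyConsumerSet class, its poll method, and the lambda "topics" are all hypothetical, and the only assumption borrowed from the thread is that the switch to the next consumer happens behind a guard that passes only after a poll came back empty.

```ruby
# Toy model only: NOT Racecar's real implementation.
# Each "source" stands in for one topic's consumer and returns a message or nil.
class ToyConsumerSet
  attr_reader :poll_counts

  def initialize(sources)
    @sources = sources
    @current = 0
    @last_poll_read_nil_message = false
    @poll_counts = Array.new(sources.size, 0)
  end

  def poll
    maybe_select_next_consumer
    @poll_counts[@current] += 1
    message = @sources[@current].call
    @last_poll_read_nil_message = message.nil?
    message
  end

  private

  # Assumed guard: only rotate after the previous poll returned nothing.
  def maybe_select_next_consumer
    return unless @last_poll_read_nil_message

    @last_poll_read_nil_message = false
    @current = (@current + 1) % @sources.size
  end
end

busy  = -> { :message } # always has a message within the wait window
quiet = -> { nil }      # rarely has anything to deliver

set = ToyConsumerSet.new([busy, quiet, quiet])
100.times { set.poll }
p set.poll_counts # => [100, 0, 0]
```

In this model the two quiet sources are never polled at all, which would match the symptom: a consumer that is not polled for longer than max.poll.interval.ms is reported as having exceeded it, even though nothing is wrong with the topic itself.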

I don't know the exact fix. In our case we can probably lower max_wait_time, but it would be nice if the "scheduler" were better at switching between the topics.

We can also split the consumers up into two, one with each topic, but that does have a cost implication.

mensfeld commented 2 years ago

@apeacock1991 you can decrease the number of messages you fetch so that you have smaller batches to process; otherwise you will exceed the poll interval once in a while.
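As a rough sketch of the two workarounds mentioned in this thread, a Racecar configuration along these lines could be tried. This assumes the installed Racecar version exposes max_wait_time and fetch_messages as configuration options, so check the README for your version. The values are illustrative guesses, not recommendations.

```ruby
# config/racecar.rb (sketch; setting names assumed from the Racecar README)
require "racecar"

Racecar.configure do |config|
  # Wait less than the quiet topics' typical message gap, so that a poll
  # can come back empty and let the consumer set move on to the next topic.
  config.max_wait_time = 0.25 # seconds; illustrative value

  # Fetch fewer messages per batch so each batch is processed well within
  # max.poll.interval.ms.
  config.fetch_messages = 100 # illustrative value
end
```

Neither setting changes how the consumer set chooses which topic to poll next. They only make it more likely that a poll returns empty, or returns sooner.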