Open chengB12 opened 1 year ago
I probably have the same problem: within a consumer group with 3 consumers, when I produce messages to different partitions, only partition 0 gets a response.
I also hit this issue quite often, but randomly; I cannot reproduce it reliably. My topic has 32 partitions, and at random one of the partitions gets left behind while all the others finish and keep up with newly produced messages.
There are no errors or exceptions; the eachBatch callback keeps firing for the other partitions, and even when those partitions have only one new message, their offsetLag is always 0.
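For context, kafkajs reports that lag per batch via `batch.offsetLag()`. As I understand it (an assumption, not verified against the kafkajs source here), it is roughly the distance between the last offset seen and the partition's high watermark, so a caught-up partition reads 0. A minimal sketch of that calculation:

```javascript
// Sketch (assumption): how a per-partition lag figure like kafkajs's
// batch.offsetLag() can be derived. Offsets are strings, mirroring how
// kafkajs exposes them; BigInt avoids precision loss on large offsets.
function offsetLag(highWatermark, lastConsumedOffset) {
  // The high watermark is the offset of the *next* message to be written,
  // so a consumer that has read everything sees a lag of 0.
  const lag = BigInt(highWatermark) - 1n - BigInt(lastConsumedOffset);
  return (lag < 0n ? 0n : lag).toString();
}

// Caught-up partition: last message (offset 41) consumed, high watermark 42.
console.log(offsetLag('42', '41')); // '0'
// Stalled partition: stuck at offset 41 while the log grew to 100.
console.log(offsetLag('100', '41')); // '58'
```

So a stalled partition would show a growing lag if it were ever handed to eachBatch; the problem here is that it never is.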
My deployment is 16 pods (on k8s) consuming a 32-partition topic. The number of pods varies because we use auto-scaling. Killing the pod that holds the stuck partition sometimes does not help.
Describe the bug I have 2 load-balanced instances consuming a Kafka topic with 2 partitions.
When both started at about the same time, one pod reported getting partition 1 and then ran without any issue. The other pod, however, never generated any Kafka logs, had no connection-error logs, and fetched nothing. Apparently it was still holding partition 0, since the other instance never went through a group rebalance to pick up both partitions.
Other, non-Kafka operations on the affected instance looked fine.
This situation lasted for hours until I became aware of it and killed and restarted the affected instance. The new instance got partition 0 on startup and works fine.
My guess is that some communication with the brokers failed at some point, but the heartbeat kept running, which makes the brokers treat this consumer as still alive.
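As a workaround while the root cause is unknown, one could surface this silently-stuck state with a watchdog: record a timestamp per partition every time eachBatch fires, and periodically flag partitions that have gone quiet, then crash or restart the consumer so the group rebalances. A sketch under those assumptions (the names `markSeen`/`findStalled` are illustrative, not part of kafkajs):

```javascript
// Hypothetical watchdog: track when each assigned partition last delivered a
// batch, and report partitions that have been silent longer than a threshold.
function createPartitionWatchdog(thresholdMs) {
  const lastSeen = new Map(); // partition number -> last batch timestamp (ms)
  return {
    markSeen(partition, now = Date.now()) {
      lastSeen.set(partition, now);
    },
    findStalled(now = Date.now()) {
      const stalled = [];
      for (const [partition, ts] of lastSeen) {
        if (now - ts > thresholdMs) stalled.push(partition);
      }
      return stalled;
    },
  };
}

// Usage sketch: call watchdog.markSeen(batch.partition) inside eachBatch, and
// on an interval, exit the process if findStalled() is non-empty so the
// broker reassigns the stuck partition to another instance.
const watchdog = createPartitionWatchdog(60_000);
watchdog.markSeen(0, 0);        // partition 0 last seen at t=0
watchdog.markSeen(1, 50_000);   // partition 1 last seen at t=50s
console.log(watchdog.findStalled(100_000)); // [ 0 ]
```

This only detects the symptom; it does not explain why the fetch loop stops while the heartbeat continues.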
To Reproduce Seems to be a one-off; I can't reproduce it.
Expected behavior If there is any trouble with the connection to Kafka, it should at least throw an error.
Observed behavior The consumer connected and held the partition but did nothing, without generating any success or failure logs.
Environment: