Open mainTAP opened 4 years ago
This seems to happen because the "-assignor-leader" topic is created with only one partition. How can one set the number of partitions for the internal "-assignor-leader" topic?
The leader topic must have a single partition: whoever ends up with the partition is the leader. If it had two partitions, there would be two leaders.
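To make the point concrete, here is a minimal sketch in plain Python. It assumes a simple round-robin assignment, which is not necessarily what Kafka or Faust actually uses; the worker names and the `assign_round_robin` and `leaders` helpers are hypothetical and exist only for illustration.

```python
from collections import defaultdict


def assign_round_robin(partitions, members):
    """Spread partitions over members in simple round-robin order (assumed strategy)."""
    assignment = defaultdict(list)
    for i, partition in enumerate(partitions):
        assignment[members[i % len(members)]].append(partition)
    return dict(assignment)


def leaders(num_partitions, members):
    """Return the members that own at least one partition of the leader topic."""
    assignment = assign_round_robin(list(range(num_partitions)), members)
    return sorted(assignment)


workers = ["worker-a", "worker-b", "worker-c"]  # hypothetical worker names
print(leaders(1, workers))  # one partition: exactly one owner, so one leader
print(leaders(2, workers))  # two partitions: two owners, so two "leaders"
```

With one partition only a single worker can ever own it, which is what makes the topic usable for leader election.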
Not sure why that code is there in aiokafka. The fetcher idle time is not going to be updated unless the fetcher is running, so this predicate will just hang forever.
Removing it and continuing the rebalance works just fine, so going with that.
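For anyone wondering why the wait never returns, here is a rough asyncio sketch of the pattern. This is not aiokafka's code: `FakeFetcher`, `wait_until_idle`, and the threshold are hypothetical, and the only assumption carried over is that the awaited value is updated solely while the fetcher is running.

```python
import asyncio


class FakeFetcher:
    """Hypothetical stand-in for a fetcher; not aiokafka's Fetcher class."""

    def __init__(self):
        self.running = False
        self.idle_time = 0.0

    async def run(self):
        # idle_time only advances inside this loop, i.e. while the fetcher runs.
        self.running = True
        while self.running:
            await asyncio.sleep(0.01)
            self.idle_time += 0.01


async def wait_until_idle(fetcher, threshold):
    # Polls a value that nothing updates if the fetcher was never started.
    while fetcher.idle_time < threshold:
        await asyncio.sleep(0.01)


async def main():
    fetcher = FakeFetcher()  # run() is never scheduled, as during the stuck rebalance
    try:
        # The timeout exists only so this demo terminates.
        await asyncio.wait_for(wait_until_idle(fetcher, 0.05), timeout=0.2)
        return "completed"
    except asyncio.TimeoutError:
        return "hung"


result = asyncio.run(main())
print(result)
```

Without the demo timeout the wait would block indefinitely, which matches the stuck rebalance described in this issue.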
Checklist
master branch of Faust.
Steps to reproduce
- Run multiple workers printing from a topic with 1 partition.
- Trigger a rebalance by shutting down one of the workers (the standby worker takes over).
- After broker_max_poll_interval expires, the standby workers won't be able to take over if the active worker fails.
- Trigger a rebalance again: the standby workers that have been running longer than broker_max_poll_interval get stuck and never join the group.
Expected behavior
- One of the workers keeps printing the messages from the Kafka topic.
- The other two workers stay on standby and take over if the active worker fails.
Actual behavior
- It works as expected until broker_max_poll_interval expires; after that, the standby workers get stuck once a rebalance is initiated.
- The worker needs to be restarted before it works correctly again.
Full traceback
After around 1000 seconds, it hangs
Versions