Tasyp opened 1 week ago
Are you saying that you set `partition_assignment_strategy` to `callback_implemented` when you start `brod_group_subscriber_v2`?
Correct, yes. I've implemented the callback as well.
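For reference, this is roughly the kind of setup being described (a minimal sketch; the client, group, topic, and callback-module names are illustrative, not taken from the actual application):

```erlang
%% Minimal sketch of a brod_group_subscriber_v2 setup using the
%% callback-implemented partition assignment strategy.
%% All names here are illustrative.
GroupConfig = [{partition_assignment_strategy, callback_implemented}],
Config = #{client          => my_client,
           group_id        => <<"my-group">>,
           topics          => [<<"my-topic">>],
           cb_module       => my_subscriber, %% also implements the
                                             %% partition-assignment callback
           group_config    => GroupConfig,
           consumer_config => [{begin_offset, earliest}]},
{ok, _Pid} = brod_group_subscriber_v2:start_link(Config).
```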
Hi @Tasyp
If `brod_group_subscriber_v2` shuts down, `brod_group_coordinator` should receive an `EXIT` message and terminate itself.
https://github.com/kafka4beam/brod/blob/5172dbe5565bf1f234b8e5eaa7dc8924c1d3c05a/src/brod_group_coordinator.erl#L378-L384
The `noproc` exception when making a call to the `MemberPid` looks like a race condition. Maybe you can check whether there is an `{'EXIT', Pid, shutdown}` message in the coordinator process's mailbox, or in the log (which seems to be truncated in this issue report)?
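For example, from a remote shell you can peek at a live process's mailbox (assuming `CoordinatorPid` is the coordinator's pid, obtained however is convenient):

```erlang
%% Fetch the coordinator's current mailbox contents and filter for
%% the shutdown EXIT message in question.
{messages, Msgs} = erlang:process_info(CoordinatorPid, messages),
[M || {'EXIT', _Pid, shutdown} = M <- Msgs].
```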
Anyway, the fix is for `brod_group_coordinator` to check whether `MemberPid` is alive, or to catch the `noproc` exception, when evaluating the callbacks below:

- `MemberModule:assignments_revoked`
- `MemberModule:assign_partitions`
- `MemberModule:assignments_received`

If `MemberPid` is not alive, the coordinator should terminate itself (which will trigger a leave-group request in the `gen_server` `terminate` callback).
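One possible shape for such a guard (a sketch only, not actual brod code; `safe_member_call/4` is a hypothetical helper, and it assumes the member callbacks take `MemberPid` as their first argument, as in the `brod_group_member` behaviour):

```erlang
%% Hypothetical helper for brod_group_coordinator: guard a
%% brod_group_member callback against the member being dead.
safe_member_call(MemberModule, Fun, [MemberPid | _] = Args, State) ->
  case is_process_alive(MemberPid) of
    false ->
      %% Member already exited: stop the coordinator; gen_server then
      %% runs terminate/2, which sends the leave-group request.
      {stop, shutdown, State};
    true ->
      try
        apply(MemberModule, Fun, Args),
        {noreply, State}
      catch
        exit:{noproc, _} ->
          %% Race: the member died between the aliveness check and the
          %% gen_server call made underneath the callback.
          {stop, shutdown, State}
      end
  end.
```

Returning `{stop, shutdown, State}` from a `handle_*` callback lets `gen_server` run `terminate/2` normally instead of crashing with `noproc`.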
I have implemented a consumer using `brod_group_subscriber_v2` with a custom partition assignment strategy. The setup includes multiple consumers on different nodes. It works well until you try to shut down the application node by node. The process crashes with the following error:
I wanted to ask whether this is expected behavior. It seems as if the coordinator is still up, but the `brod_group_subscriber_v2` process has already exited, so it cannot respond. I am not quite sure how to fix this, because the coordinator seems to be linked to the group subscriber, so I would assume this shouldn't happen at all?
If you have any suggestions on how to avoid this crash, I could help with implementing a fix and opening a PR.