rabbitmq / ra

A Raft implementation for Erlang and Elixir that strives to be efficient and make it easier to use multiple Raft clusters in a single system.

RabbitMQ quorum queues report that a snapshot sender process exits with a "noproc" #471

Closed rfalias closed 2 weeks ago

rfalias commented 2 weeks ago

Describe the bug

The debug log line below is generated for a select set of queues. The noproc case is handled as 'ok', but it seems like it should log a warning or error, because this situation left consumers unable to consume or ack messages while the queue was in this state. After deleting the queue and having the app re-create it, the error went away and consumers were able to process messages again. There were no network or server issues: hundreds of other queues sent snapshots just fine.

2024-09-17 20:28:06.231778+00:00 [dbug] <0.4337.0> queue 'my.queue.name' in vhost 'vh-myvhost': Snapshot sender process <0.16458.84> exited with {noproc,{gen_statem,call,[{'vh-myvhost_event.my.queue.name','rabbit@myserver2'},{install_snapshot_rpc,115,...},{dirty_timeout,...}]}}

Is there another way to diagnose this issue, or to find the cause of the noproc, for quorum queues that are corrupt or unable to send snapshots?

https://github.com/rabbitmq/ra/blob/f6ab5b9c7055bc8b612a13eab564c68e41f1b2a7/src/ra_server.erl#L2127

Reproduction steps

Unsure; I have not been able to make queues generate this message on demand.

Expected behavior

noproc errors should, at the very least, be logged at a level higher than debug.

Additional context

No response

michaelklishin commented 2 weeks ago

@rfalias this is not a RabbitMQ support forum. This belongs in the Questions category in RabbitMQ Discussions, which asks for the details our team needs (requires) in order to provide an informed answer.

You haven't even specified the versions used or shared any logs. We do not guess in this community.