rabbitmq / rabbitmq-server

Open source RabbitMQ: core server and tier 1 (built-in) plugins
https://www.rabbitmq.com/

Quorum queue becomes unavailable despite all nodes being up #11561

Closed cvuillemez closed 1 week ago

cvuillemez commented 1 week ago

Describe the bug

A quorum queue became unavailable after it was purged manually. I cannot say whether the purge is the root cause, but it did precede the failure. The only remediation that worked was to delete and re-create the queue.

RabbitMQ version is 3.12.13-1; Erlang is 25.3.2.11.

State of the queue during the issue:

sudo rabbitmq-queues --vhost vhostXX quorum_status queueXX
Status of quorum queue queueXX on node rabbit@mq02 ...
┌─────────────────────────────────────────┬────────────┬───────────┬──────────────┬────────────────┬──────┬─────────────────┐
│ Node Name                               │ Raft State │ Log Index │ Commit Index │ Snapshot Index │ Term │ Machine Version │
├─────────────────────────────────────────┼────────────┼───────────┼──────────────┼────────────────┼──────┼─────────────────┤
│ rabbit@mq01                             │ noproc     │           │              │                │      │                 │
├─────────────────────────────────────────┼────────────┼───────────┼──────────────┼────────────────┼──────┼─────────────────┤
│ rabbit@mq02                             │ noproc     │           │              │                │      │                 │
├─────────────────────────────────────────┼────────────┼───────────┼──────────────┼────────────────┼──────┼─────────────────┤
│ rabbit@mq03                             │ noproc     │           │              │                │      │                 │
└─────────────────────────────────────────┴────────────┴───────────┴──────────────┴────────────────┴──────┴─────────────────┘
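The table above is easy to scan by eye, but the same check can be scripted. The following is a minimal sketch, assuming the box-drawn layout shown above; `parse_quorum_status` and `all_members_down` are hypothetical helper names, not part of any RabbitMQ tooling, and the sample is a shortened copy of the output in this report.

```python
# Sketch only: parses the human-readable table printed by
# `rabbitmq-queues quorum_status`, assuming the layout shown in this report.

def parse_quorum_status(output: str) -> dict[str, str]:
    """Map node name -> Raft state from a box-drawn status table."""
    states = {}
    for line in output.splitlines():
        line = line.strip()
        # Data and header rows start with a vertical bar; border rows do not.
        if not line.startswith("│"):
            continue
        cells = [cell.strip() for cell in line.strip("│").split("│")]
        if len(cells) < 2 or cells[0] == "Node Name":
            continue  # skip the header row
        states[cells[0]] = cells[1]
    return states


def all_members_down(states: dict[str, str]) -> bool:
    """True when every member reports `noproc` (no Raft process running)."""
    return bool(states) and all(state == "noproc" for state in states.values())


# Shortened copy of the table from this report (hypothetical variable name).
SAMPLE = """\
┌─────────────┬────────────┬───────────┐
│ Node Name   │ Raft State │ Log Index │
├─────────────┼────────────┼───────────┤
│ rabbit@mq01 │ noproc     │           │
├─────────────┼────────────┼───────────┤
│ rabbit@mq02 │ noproc     │           │
├─────────────┼────────────┼───────────┤
│ rabbit@mq03 │ noproc     │           │
└─────────────┴────────────┴───────────┘
"""

states = parse_quorum_status(SAMPLE)
print(all_members_down(states))  # prints True
```

Here every member reports `noproc`, which matches the symptom described: the nodes are up, but no member process of this queue is running on any of them.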

The related logs are below (they are similar on 2 nodes):

Warning:

 Error purging queue: {'EXIT',{{case_clause,{error,noproc}},
       [{rabbit_channel,'-handle_method/6-fun-2-',1,
                         [{file,"rabbit_channel.erl"},{line,2624}]},
         {rabbit_misc,with_exit_handler,2,
                     [{file,"rabbit_misc.erl"},{line,511}]},
          {rabbit_channel,handle_method,6,[]}]}}
 rabbit_sysmon_handler busy_dist_port <0.22973.0> [{name,delegate_management_3},{initial_call,{delegate,init,1}},{gen_server2,process_next_msg,1},{message_queue_len,0}] {#Port<0.21>,unknown}

Error:

 ** Generic server <0.26851.3021> terminating
 ** Last message in was {'$gen_cast',
                            {method,
                                {'basic.consume',0,
                                    <<"queueXX">>,
                                    <<>>,false,false,false,false,[]},
                                none,noflow}}
 ** When Server state == {ch,
                          {conf,running,rabbit_framing_amqp_0_9_1,219,
                           <0.6888.5960>,<0.17018.3016>,<0.6888.5960>,
                           <<"127.0.0.2:56276 -> 127.0.0.1:5671">>,
                           undefined,
                           {user,<<"admin">>,
                            [administrator],
                            [{rabbit_auth_backend_internal,
                              #Fun<rabbit_auth_backend_internal.3.20918515>}]},
                           <<"vhostXX">>,<<>>,<0.11978.5963>,
                           [{<<"exchange_exchange_bindings">>,bool,true},
                            {<<"connection.blocked">>,bool,true},
                            {<<"authentication_failure_close">>,bool,true},
                            {<<"basic.nack">>,bool,true},
                            {<<"publisher_confirms">>,bool,true},
                            {<<"consumer_cancel_notify">>,bool,true}],
                           none,50,134217728,1800000,#{},1000000000,false},
                          {lstate,<0.31172.3009>,false},
                          none,1,
                          {0,[],[]},
                          {state,#{},erlang},
                          #{},#{},
                          {state,fine,5000,
                           #Ref<0.1413075262.3908042754.125576>},
                          false,1,
                          {rabbit_confirms,undefined,#{}},
                          [],[],none,flow,[],
                          {rabbit_queue_type,#{}},
                          #Ref<0.1413075262.3908042754.125570>,false}
 ** Reason for termination ==
 ** {{badmatch,{error,noproc}},
     [{rabbit_quorum_queue,consume,3,
                           [{file,"rabbit_quorum_queue.erl"},{line,792}]},
      {rabbit_queue_type,consume,3,[{file,"rabbit_queue_type.erl"},{line,392}]},
      {rabbit_channel,'-basic_consume/8-fun-0-',10,
                      [{file,"rabbit_channel.erl"},{line,1747}]},
      {rabbit_misc,with_exit_handler,2,[{file,"rabbit_misc.erl"},{line,511}]},
      {rabbit_channel,basic_consume,8,
                      [{file,"rabbit_channel.erl"},{line,1744}]},
      {rabbit_channel,handle_method,3,
                      [{file,"rabbit_channel.erl"},{line,1423}]},
      {rabbit_channel,handle_cast,2,[{file,"rabbit_channel.erl"},{line,633}]},
      {gen_server2,handle_msg,2,[{file,"gen_server2.erl"},{line,1056}]}]}

   crasher:
     initial call: rabbit_channel:init/1
     pid: <0.26851.3021>
     registered_name: []
     exception exit: {{badmatch,{error,noproc}},
                      [{rabbit_quorum_queue,consume,3,
                           [{file,"rabbit_quorum_queue.erl"},{line,792}]},
                       {rabbit_queue_type,consume,3,
                           [{file,"rabbit_queue_type.erl"},{line,392}]},
                       {rabbit_channel,'-basic_consume/8-fun-0-',10,
                           [{file,"rabbit_channel.erl"},{line,1747}]},
                       {rabbit_misc,with_exit_handler,2,
                           [{file,"rabbit_misc.erl"},{line,511}]},
                       {rabbit_channel,basic_consume,8,
                           [{file,"rabbit_channel.erl"},{line,1744}]},
                       {rabbit_channel,handle_method,3,
                           [{file,"rabbit_channel.erl"},{line,1423}]},
                       {rabbit_channel,handle_cast,2,
                           [{file,"rabbit_channel.erl"},{line,633}]},
                       {gen_server2,handle_msg,2,
                           [{file,"gen_server2.erl"},{line,1056}]}]}
       in function  gen_server2:terminate/3 (gen_server2.erl, line 1172)
     ancestors: [<0.6138.3007>,<0.30273.5962>,<0.4133.5959>,<0.18031.5957>,
                   <0.16014.0>,<0.16013.0>,<0.16012.0>,<0.16010.0>,<0.16009.0>,
                   rabbit_sup,<0.831.0>]
     message_queue_len: 1
     messages: [{'$gen_cast',
                       {command,
                           {'basic.consume_ok',
                               <<"amq.ctag-oVuWAOxAiJMzPnaojKS07w">>}}}]
     links: [<0.6138.3007>]
     dictionary: [{process_name,
                       {rabbit_channel,
                           {<<"127.0.0.2:56276 -> 127.0.0.1:5671">>,
                            219}}},
                   {permission_cache,
                       [{{resource,<<"vhostXX">>,queue,
                             <<"queueXX">>},
                         #{},read}]},
                   {permission_cache_can_expire,false},
                   {rand_seed,
                       {#{jump => #Fun<rand.3.34006561>,
                          max => 288230376151711743,
                          next => #Fun<rand.5.34006561>,type => exsplus},
                        [87273550521800029|10282284222925812]}},
                   {guid_secure,
                       {{24,'rabbit@mq01',
                         #Ref<0.1413075262.3908042754.125589>},
                        0}},
                   {channel_operation_timeout,15000}]
     trap_exit: true
     status: running
     heap_size: 17731
     stack_size: 28
     reductions: 41672
   neighbours:
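
To make the crash report above easier to read, the stack frames can be pulled out of the text. This is a rough sketch, assuming frames are printed in the `{Module,Function,Arity,[{file,...},{line,...}]}` shape seen in this log; `FRAME_RE` and `extract_frames` are hypothetical names, and frames without file information (such as `{rabbit_channel,handle_method,6,[]}`) are deliberately skipped.

```python
import re

# Matches one Erlang stack frame that carries file and line information.
# The function name may be a quoted atom such as '-basic_consume/8-fun-0-'.
FRAME_RE = re.compile(
    r"\{(\w+),\s*('[^']+'|\w+),\s*(\d+),\s*"
    r'\[\{file,"([^"]+)"\},\{line,(\d+)\}\]\}'
)


def extract_frames(log_text: str) -> list[tuple[str, str, int, str, int]]:
    """Return (module, function, arity, file, line) for each frame found."""
    frames = []
    for mod, fun, arity, file, line in FRAME_RE.findall(log_text):
        frames.append((mod, fun.strip("'"), int(arity), file, int(line)))
    return frames


# First three frames of the termination reason, copied from this report.
TRACE = """
 ** {{badmatch,{error,noproc}},
     [{rabbit_quorum_queue,consume,3,
                           [{file,"rabbit_quorum_queue.erl"},{line,792}]},
      {rabbit_queue_type,consume,3,[{file,"rabbit_queue_type.erl"},{line,392}]},
      {rabbit_channel,'-basic_consume/8-fun-0-',10,
                      [{file,"rabbit_channel.erl"},{line,1747}]}]}
"""

for mod, fun, arity, file, line in extract_frames(TRACE):
    print(f"{mod}:{fun}/{arity} ({file}:{line})")
```

The first frame printed is the innermost one, so the `badmatch` on `{error,noproc}` is raised in `rabbit_quorum_queue:consume/3` at line 792 of `rabbit_quorum_queue.erl`, reached from a `basic.consume` handled by the channel process.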

Reproduction steps

No known reproduction steps.

Expected behavior

no crash :)

Additional context

No response