rabbitmq / rabbitmq-server

Open source RabbitMQ: core server and tier 1 (built-in) plugins
https://www.rabbitmq.com/
Other
12.35k stars 3.92k forks source link

Dead lettering more than two times results in a crashed queue #216

Closed riyad closed 9 years ago

riyad commented 9 years ago

We have a similar setup to #161 where we publish messages into (unique) timeout queues that drop the messages back into their original queues for retrying.

Our setup started to loose messages after we updated to RabbitMQ 3.5.3. Looking at the logs we found this:

=ERROR REPORT==== 23-Jun-2015::11:47:19 ===
** Generic server <0.678.0> terminating
** Last message in was {drop_expired,1}
** When Server state == 
[...]
** Reason for termination ==
** {{case_clause,{value,{<<"count">>,signedint,1}}},
    [{rabbit_dead_letter,x_death_event_key,3,[]},
     {rabbit_dead_letter,ensure_xdeath_event_count,2,[]},
     {rabbit_dead_letter,ensure_xdeath_event_count,2,[]},
     {rabbit_dead_letter,'-group_by_queue_and_reason/1-fun-0-',3,[]},
     {lists,foldl,3,[{file,"lists.erl"},{line,1197}]},
     {rabbit_dead_letter,group_by_queue_and_reason,1,[]},
     {rabbit_dead_letter,update_x_death_header,2,[]},
     {rabbit_dead_letter,'-make_msg/5-fun-2-',8,[]}]}

=ERROR REPORT==== 23-Jun-2015::11:47:19 ===
Restarting crashed queue 'toq-gen2-REDACTED_QUEUE_NAME-96aba311-624f-4dc0-802b-70138917816c' in vhost '/'.

We were able to reproduce this everytime a message was to drop out of the timeout queue (toq) for the third time (hence the "gen2" part of the queue name in the logs).

Full log (names of queues and exchanges redacted):

=ERROR REPORT==== 23-Jun-2015::11:47:19 ===
** Generic server <0.678.0> terminating
** Last message in was {drop_expired,1}
** When Server state == {q,
                         {amqqueue,
                          {resource,<<"/">>,queue,
                           <<"toq-gen2-REDACTED_QUEUE_NAME-96aba311-624f-4dc0-802b-70138917816c">>},
                          false,false,none,
                          [{<<"x-message-ttl">>,signedint,9000},
                           {<<"x-dead-letter-exchange">>,longstr,
                            <<"REDACTED_EXCHANGE.dlx">>},
                           {<<"x-expires">>,signedint,10000},
                           {<<"x-dead-letter-routing-key">>,longstr,
                            <<"REDACTED_QUEUE_NAME">>}],
                          <0.678.0>,[],[],[],undefined,[],[],live},
                         none,false,rabbit_priority_queue,
                         {passthrough,rabbit_variable_queue,
                          {vqstate,
                           {0,{[],[]}},
                           {0,{[],[]}},
                           {delta,undefined,0,undefined},
                           {0,{[],[]}},
                           {1,
                            {[],
                             [{msg_status,0,
                               <<156,89,98,190,170,144,189,166,217,107,135,227,
                                 40,191,240,59>>,
                               {basic_message,
                                {resource,<<"/">>,exchange,<<>>},
                                [<<"toq-gen2-REDACTED_QUEUE_NAME-96aba311-624f-4dc0-802b-70138917816c">>],
                                {content,60,
                                 {'P_basic',undefined,undefined,
                                  [{<<"x-message-id">>,longstr,
                                    <<"96aba311-624f-4dc0-802b-70138917816c">>},
                                   {<<"x-puka-delivery-tag">>,signedint,4},
                                   {<<"x-death">>,array,
                                    [{table,
                                      [{<<"count">>,signedint,1},
                                       {<<"exchange">>,longstr,<<>>},
                                       {<<"time">>,timestamp,1435052830},
                                       {<<"queue">>,longstr,
                                        <<"toq-gen1-REDACTED_QUEUE_NAME-96aba311-624f-4dc0-802b-70138917816c">>},
                                       {<<"reason">>,longstr,<<"expired">>},
                                       {<<"routing-keys">>,array,
                                        [{longstr,
                                          <<"toq-gen1-REDACTED_QUEUE_NAME-96aba311-624f-4dc0-802b-70138917816c">>}]}]},
                                     {table,
                                      [{<<"count">>,signedint,1},
                                       {<<"exchange">>,longstr,<<>>},
                                       {<<"routing-keys">>,array,
                                        [{longstr,
                                          <<"toq-gen0-REDACTED_QUEUE_NAME-96aba311-624f-4dc0-802b-70138917816c">>}]},
                                       {<<"queue">>,longstr,
                                        <<"toq-gen0-REDACTED_QUEUE_NAME-96aba311-624f-4dc0-802b-70138917816c">>},
                                       {<<"reason">>,longstr,<<"expired">>},
                                       {<<"time">>,timestamp,1435052826}]}]},
                                   {<<"x-origin-queue">>,longstr,
                                    <<"REDACTED_QUEUE_NAME">>}],
                                  undefined,undefined,undefined,undefined,
                                  undefined,undefined,undefined,undefined,
                                  undefined,undefined,undefined},
                                 <<32,0,0,0,3,242,12,120,45,109,101,115,115,97,
                                   103,101,45,105,100,83,0,0,0,36,57,54,97,98,
                                   97,51,49,49,45,54,50,52,102,45,52,100,99,48,
                                   45,56,48,50,98,45,55,48,49,51,56,57,49,55,
                                   56,49,54,99,19,120,45,112,117,107,97,45,100,
                                   101,108,105,118,101,114,121,45,116,97,103,
                                   73,0,0,0,4,7,120,45,100,101,97,116,104,65,0,
                                   0,3,26,70,0,0,1,136,5,99,111,117,110,116,73,
                                   0,0,0,1,8,101,120,99,104,97,110,103,101,83,
                                   0,0,0,0,4,116,105,109,101,84,0,0,0,0,85,137,
                                   43,30,5,113,117,101,117,101,83,0,0,0,150,
                                   116,111,113,45,103,101,110,49,45,95,116,101,
                                   115,116,95,112,97,121,109,101,110,116,95,
                                   114,101,102,117,110,100,95,111,110,95,102,
                                   105,110,97,108,95,100,101,97,116,104,46,112,
                                   121,58,58,116,101,115,116,95,112,117,98,108,
                                   105,115,104,95,111,110,95,102,105,110,97,
                                   108,95,100,101,97,116,104,95,48,48,52,57,48,
                                   101,54,53,45,49,57,98,49,45,52,55,52,53,45,
                                   98,56,49,100,45,99,53,57,100,49,52,52,56,53,
                                   57,54,101,45,57,54,97,98,97,51,49,49,45,54,
                                   50,52,102,45,52,100,99,48,45,56,48,50,98,45,
                                   55,48,49,51,56,57,49,55,56,49,54,99,6,114,
                                   101,97,115,111,110,83,0,0,0,7,101,120,112,
                                   105,114,101,100,12,114,111,117,116,105,110,
                                   103,45,107,101,121,115,65,0,0,0,155,83,0,0,
                                   0,150,116,111,113,45,103,101,110,49,45,95,
                                   116,101,115,116,95,112,97,121,109,101,110,
                                   116,95,114,101,102,117,110,100,95,111,110,
                                   95,102,105,110,97,108,95,100,101,97,116,104,
                                   46,112,121,58,58,116,101,115,116,95,112,117,
                                   98,108,105,115,104,95,111,110,95,102,105,
                                   110,97,108,95,100,101,97,116,104,95,48,48,
                                   52,57,48,101,54,53,45,49,57,98,49,45,52,55,
                                   52,53,45,98,56,49,100,45,99,53,57,100,49,52,
                                   52,56,53,57,54,101,45,57,54,97,98,97,51,49,
                                   49,45,54,50,52,102,45,52,100,99,48,45,56,48,
                                   50,98,45,55,48,49,51,56,57,49,55,56,49,54,
                                   99,70,0,0,1,136,5,99,111,117,110,116,73,0,0,
                                   0,1,8,101,120,99,104,97,110,103,101,83,0,0,
                                   0,0,12,114,111,117,116,105,110,103,45,107,
                                   101,121,115,65,0,0,0,155,83,0,0,0,150,116,
                                   111,113,45,103,101,110,48,45,95,116,101,115,
                                   116,95,112,97,121,109,101,110,116,95,114,
                                   101,102,117,110,100,95,111,110,95,102,105,
                                   110,97,108,95,100,101,97,116,104,46,112,121,
                                   58,58,116,101,115,116,95,112,117,98,108,105,
                                   115,104,95,111,110,95,102,105,110,97,108,95,
                                   100,101,97,116,104,95,48,48,52,57,48,101,54,
                                   53,45,49,57,98,49,45,52,55,52,53,45,98,56,
                                   49,100,45,99,53,57,100,49,52,52,56,53,57,54,
                                   101,45,57,54,97,98,97,51,49,49,45,54,50,52,
                                   102,45,52,100,99,48,45,56,48,50,98,45,55,48,
                                   49,51,56,57,49,55,56,49,54,99,5,113,117,101,
                                   117,101,83,0,0,0,150,116,111,113,45,103,101,
                                   110,48,45,95,116,101,115,116,95,112,97,121,
                                   109,101,110,116,95,114,101,102,117,110,100,
                                   95,111,110,95,102,105,110,97,108,95,100,101,
                                   97,116,104,46,112,121,58,58,116,101,115,116,
                                   95,112,117,98,108,105,115,104,95,111,110,95,
                                   102,105,110,97,108,95,100,101,97,116,104,95,
                                   48,48,52,57,48,101,54,53,45,49,57,98,49,45,
                                   52,55,52,53,45,98,56,49,100,45,99,53,57,100,
                                   49,52,52,56,53,57,54,101,45,57,54,97,98,97,
                                   51,49,49,45,54,50,52,102,45,52,100,99,48,45,
                                   56,48,50,98,45,55,48,49,51,56,57,49,55,56,
                                   49,54,99,6,114,101,97,115,111,110,83,0,0,0,
                                   7,101,120,112,105,114,101,100,4,116,105,109,
                                   101,84,0,0,0,0,85,137,43,26,14,120,45,111,
                                   114,105,103,105,110,45,113,117,101,117,101,
                                   83,0,0,0,104,95,116,101,115,116,95,112,97,
                                   121,109,101,110,116,95,114,101,102,117,110,
                                   100,95,111,110,95,102,105,110,97,108,95,100,
                                   101,97,116,104,46,112,121,58,58,116,101,115,
                                   116,95,112,117,98,108,105,115,104,95,111,
                                   110,95,102,105,110,97,108,95,100,101,97,116,
                                   104,95,48,48,52,57,48,101,54,53,45,49,57,98,
                                   49,45,52,55,52,53,45,98,56,49,100,45,99,53,
                                   57,100,49,52,52,56,53,57,54,101>>,
                                 rabbit_framing_amqp_0_9_1,
                                 [<<"REDACTED_MESSAGE_BODY">>]},
                                <<156,89,98,190,170,144,189,166,217,107,135,
                                  227,40,191,240,59>>,
                                false},
                               false,false,false,false,queue_index,
                               {message_properties,1435052839418192,false,
                                55}}]}},
                           1,
                           {0,nil},
                           {0,nil},
                           {0,nil},
                           {qistate,
                            "/var/lib/rabbitmq/mnesia/REDACTED_NODE_NAME/queues/BB5UQYHDZNVEPMDAZY48D2OBL",
                            {{dict,0,16,16,8,80,48,
                              {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                               []},
                              {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                []}}},
                             []},
                            undefined,0,65536,
                            #Fun<rabbit_variable_queue.2.117761292>,
                            #Fun<rabbit_variable_queue.3.48316793>,
                            {0,nil},
                            {0,nil}},
                           {undefined,
                            {client_msstate,msg_store_transient,
                             <<185,208,160,140,133,214,105,177,135,249,236,77,
                               148,196,41,229>>,
                             {dict,0,16,16,8,80,48,
                              {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                               []},
                              {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                []}}},
                             {state,286792,
                              "/var/lib/rabbitmq/mnesia/REDACTED_NODE_NAME/msg_store_transient"},
                             rabbit_msg_store_ets_index,
                             "/var/lib/rabbitmq/mnesia/REDACTED_NODE_NAME/msg_store_transient",
                             <0.393.0>,290889,282695,294986,299083}},
                           false,0,1,55,0,0,0,infinity,1,0,0,55,0,0,
                           {rates,0.1294337302079736,0.0,0.0,0.0,
                            {1435,52831,419020}},
                           {0,nil},
                           {0,nil},
                           {0,nil},
                           {0,nil},
                           0,0,0,0}},
                         {state,
                          {queue,[],[],0},
                          {inactive,1435052830418220,9010,1.0}},
                         10000,undefined,undefined,
                         {erlang,#Ref<0.0.0.3748>},
                         {state,fine,5000,undefined},
                         {0,nil},
                         9000,
                         {erlang,#Ref<0.0.0.3756>},
                         1435052839418192,
                         {state,
                          {dict,1,16,16,8,80,48,
                           {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                           {{[],
                             [[<0.639.0>|#Ref<0.0.0.3755>]],
                             [],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},
                          delegate},
                         {resource,<<"/">>,exchange,<<"REDACTED_EXCHANGE.dlx">>},
                         <<"REDACTED_QUEUE_NAME">>,
                         undefined,undefined,1,running}
** Reason for termination ==
** {{case_clause,{value,{<<"count">>,signedint,1}}},
    [{rabbit_dead_letter,x_death_event_key,3,[]},
     {rabbit_dead_letter,ensure_xdeath_event_count,2,[]},
     {rabbit_dead_letter,ensure_xdeath_event_count,2,[]},
     {rabbit_dead_letter,'-group_by_queue_and_reason/1-fun-0-',3,[]},
     {lists,foldl,3,[{file,"lists.erl"},{line,1197}]},
     {rabbit_dead_letter,group_by_queue_and_reason,1,[]},
     {rabbit_dead_letter,update_x_death_header,2,[]},
     {rabbit_dead_letter,'-make_msg/5-fun-2-',8,[]}]}

=ERROR REPORT==== 23-Jun-2015::11:47:19 ===
Restarting crashed queue 'toq-gen2-REDACTED_QUEUE_NAME-96aba311-624f-4dc0-802b-70138917816c' in vhost '/'.
videlalvaro commented 9 years ago

Thanks, I think I've found the reason, here:

https://github.com/rabbitmq/rabbitmq-server/blob/master/src/rabbit_dead_letter.erl#L148 https://github.com/rabbitmq/rabbitmq-server/blob/master/src/rabbit_dead_letter.erl#L156

We assume that the <<"count">> field is long therefore the case clause failure.

michaelklishin commented 9 years ago

Can you please provide a script that reproduces the issue?

On 7/7/2015, at 15:24, Alvaro Videla notifications@github.com wrote:

Thanks, I think I've found the reason, here:

https://github.com/rabbitmq/rabbitmq-server/blob/master/src/rabbit_dead_letter.erl#L148 https://github.com/rabbitmq/rabbitmq-server/blob/master/src/rabbit_dead_letter.erl#L156

We assume that the <<"count">> field is long therefore the case clause failure.

— Reply to this email directly or view it on GitHub.

tiredpixel commented 9 years ago

I can confirm this issue, which is also happening for us after upgrading from RabbitMQ 3.5.0 to 3.5.3 on Ubuntu Server 14.04 LTS, running in AWS EC2. Initially, I was running dist Erlang R16B03, but now I'm running Erlang R18. Interaction with RabbitMQ is using Ruby, through library Bunny 1.7.0, (with library amq-protocol 1.9.2).

Our logic is similar to that described by @riyad, where messages are published through a dead-letter exchange potentially multiple times. I reinstalled our broker yesterday, and all was running fine until this afternoon, when errors were reported in RabbitMQ logs, after which CPU immediately started increasing fairly steadily, until the Erlang beam.smp process reported almost maxing out the CPUs. Messages being published or consumed are currently very low, around 1 message per 5 minutes. Number of messages in all queues is low, under 20 total. We're using durable queues and persistent messages. Memory is low, far below the watermark.

Peace, tiredpixel


Status of node 'rabbit@i-x.example.com' ...
[{pid,596},
 {running_applications,
     [{rabbitmq_management,"RabbitMQ Management Console","3.5.3"},
      {rabbitmq_management_agent,"RabbitMQ Management Agent","3.5.3"},
      {rabbitmq_web_dispatch,"RabbitMQ Web Dispatcher","3.5.3"},
      {rabbit,"RabbitMQ","3.5.3"},
      {webmachine,"webmachine","1.10.3-rmq3.5.3-gite9359c7"},
      {mochiweb,"MochiMedia Web Server","2.7.0-rmq3.5.3-git680dba8"},
      {amqp_client,"RabbitMQ AMQP Client","3.5.3"},
      {xmerl,"XML parser","1.3.8"},
      {mnesia,"MNESIA  CXC 138 12","4.13"},
      {inets,"INETS  CXC 138 49","6.0"},
      {os_mon,"CPO  CXC 138 46","2.4"},
      {sasl,"SASL  CXC 138 11","2.5"},
      {stdlib,"ERTS  CXC 138 10","2.5"},
      {kernel,"ERTS  CXC 138 10","4.0"}]},
 {os,{unix,linux}},
 {erlang_version,
     "Erlang/OTP 18 [erts-7.0] [source] [64-bit] [smp:2:2] [async-threads:30] [kernel-poll:true]\n"},
 {memory,
     [{total,383717416},
      {connection_readers,1325456},
      {connection_writers,7694952},
      {connection_channels,125132600},
      {connection_other,85150160},
      {queue_procs,19523888},
      {queue_slave_procs,0},
      {plugins,20670368},
      {other_proc,7088664},
      {mnesia,1589408},
      {mgmt_db,34545264},
      {msg_index,450864},
      {other_ets,3566880},
      {binary,44084160},
      {code,20075250},
      {atom,785313},
      {other_system,12034189}]},
 {alarms,[]},
 {listeners,[{clustering,25672,"::"},{amqp,5672,"::"}]},
 {vm_memory_high_watermark,0.4},
 {vm_memory_limit,6416846028},
 {disk_free_limit,50000000},
 {disk_free,1036529143808},
 {file_descriptors,
     [{total_limit,16284},
      {total_used,133},
      {sockets_limit,14653},
      {sockets_used,99}]},
 {processes,[{limit,1048576},{used,26879}]},
 {run_queue,1193},
 {uptime,87068}]

Cluster status of node 'rabbit@i-x.example.com' ...
[{nodes,[{disc,['rabbit@i-x.example.com']}]},
 {running_nodes,['rabbit@i-x.example.com']},
 {cluster_name,<<"rabbit@i-x.example.com">>},
 {partitions,[]}]

=ERROR REPORT==== 7-Jul-2015::14:30:49 ===
** Generic server <0.755.0> terminating
** Last message in was {drop_expired,1}
** When Server state == {q,
                         {amqqueue,
                          {resource,<<"x_messenger">>,queue,
                           <<"ha.x_messenger.app_x_queue_error">>},
                          true,false,none,
                          [{<<"x-dead-letter-exchange">>,longstr,
                            <<"ha.x_messenger.app_x_queue">>},
                           {<<"x-message-ttl">>,signedint,300000}],
                          <0.755.0>,[],[],[],undefined,[],[],live},
                         none,false,rabbit_priority_queue,
                         {passthrough,rabbit_variable_queue,
                          {vqstate,
                           {0,{[],[]}},
                           {0,{[],[]}},
                           {delta,undefined,0,undefined},
                           {0,{[],[]}},
                           {2,
                            {[{msg_status,16,
                               <<145,236,191,185,197,127,102,104,95,63,130,119,
                                 35,220,135,98>>,
                               {basic_message,
                                {resource,<<"x_messenger">>,exchange,
                                 <<"ha.x_messenger.app_x_queue_error">>},
                                [<<"ha.x_messenger.app_x_queue">>],
                                {content,60,
                                 {'P_basic',<<"application/octet-stream">>,
                                  undefined,
                                  [{<<"x-death">>,array,
                                    [{table,
                                      [{<<"count">>,signedint,1},
                                       {<<"reason">>,longstr,<<"expired">>},
                                       {<<"queue">>,longstr,
                                        <<"ha.x_messenger.app_x_queue_error">>},
                                       {<<"time">>,timestamp,1436273833},
                                       {<<"exchange">>,longstr,
                                        <<"ha.x_messenger.app_x_queue_error">>},
                                       {<<"routing-keys">>,array,
                                        [{longstr,
                                          <<"ha.x_messenger.app_x_queue">>}]}]}]}],
                                  2,0,undefined,undefined,undefined,
                                  <<"79d42774-1346-4bc9-a08d-3ba45c61a79a">>,
                                  undefined,undefined,undefined,undefined,
                                  undefined},
                                 <<184,128,24,97,112,112,108,105,99,97,116,105,
                                   111,110,47,111,99,116,101,116,45,115,116,
                                   114,101,97,109,0,0,1,25,7,120,45,100,101,97,
                                   116,104,65,0,0,1,12,70,0,0,1,7,5,99,111,117,
                                   110,116,73,0,0,0,1,6,114,101,97,115,111,110,
                                   83,0,0,0,7,101,120,112,105,114,101,100,5,
                                   113,117,101,117,101,83,0,0,0,59,104,97,46,
                                   100,99,95,109,101,115,115,101,110,103,101,
                                   114,95,112,114,111,100,46,118,49,46,116,114,
                                   97,110,115,102,111,114,109,101,114,95,100,
                                                                      99,95,97,114,99,104,105,118,101,115,95,112,
                                   97,114,116,115,95,101,114,114,111,114,4,116,
                                   105,109,101,84,0,0,0,0,85,155,204,169,8,101,
                                   120,99,104,97,110,103,101,83,0,0,0,59,104,
                                   97,46,100,99,95,109,101,115,115,101,110,103,
                                   101,114,95,112,114,111,100,46,118,49,46,116,
                                   114,97,110,115,102,111,114,109,101,114,95,
                                   100,99,95,97,114,99,104,105,118,101,115,95,
                                   112,97,114,116,115,95,101,114,114,111,114,
                                   12,114,111,117,116,105,110,103,45,107,101,
                                   121,115,65,0,0,0,58,83,0,0,0,53,104,97,46,
                                   100,99,95,109,101,115,115,101,110,103,101,
                                   114,95,112,114,111,100,46,118,49,46,116,114,
                                   97,110,115,102,111,114,109,101,114,95,100,
                                   99,95,97,114,99,104,105,118,101,115,95,112,
                                   97,114,116,115,2,0,36,55,57,100,52,50,55,55,
                                   52,45,49,51,52,54,45,52,98,99,57,45,97,48,
                                   56,100,45,51,98,97,52,53,99,54,49,97,55,57,
                                   97>>,
                                 rabbit_framing_amqp_0_9_1,
                                 [<<133,162,116,114,171,99,114,97,119,108,95,
                                    115,116,101,112,115,166,116,114,95,115,
                                    101,116,131,166,112,114,101,102,105,120,
                                    169,99,114,97,119,108,46,51,54,54,174,
                                    116,114,97,110,115,102,111,114,109,101,
                                    100,95,97,116,183,50,48,49,53,45,48,55,
                                    45,48,55,32,48,50,58,53,56,58,51,48,32,
                                    85,84,67,172,116,114,97,110,115,102,111,
                                    114,109,95,105,100,210,138,95,31,211,166,
                                    105,110,112,117,116,115,129,168,99,114,
                                    97,119,108,95,105,100,205,1,110,164,112,
                                    97,114,116,132,164,105,100,95,102,184,53,
                                    53,57,97,98,48,56,57,54,57,50,100,51,53,
                                    51,51,97,49,101,56,48,52,48,48,164,105,
                                    100,95,116,184,53,53,57,97,98,49,101,51,
                                    54,57,50,100,51,53,51,51,99,100,56,52,48,
                                    54,48,48,164,105,100,95,111,205,35,40,
                                    164,105,100,95,105,10,172,99,97,108,108,
                                    98,97,99,107,95,107,101,121,131,168,112,
                                    105,101,99,101,95,105,100,205,15,183,177,
                                    116,114,97,110,115,102,111,114,109,97,
                                    116,105,111,110,95,105,100,205,2,12,181,
                                    116,114,97,110,115,102,111,114,109,97,
                                    116,105,111,110,95,115,101,116,95,105,
                                    100,205,1,6>>]},
                                <<145,236,191,185,197,127,102,104,95,63,130,
                                  119,35,220,135,98>>,
                                true},
                               true,false,false,true,queue_index,
                               {message_properties,1436279640057485,true,
                                270}}],
                             [{msg_status,15,
                               <<168,84,203,115,201,232,154,241,51,33,92,99,
                                 248,170,24,198>>,
                               {basic_message,
                                {resource,<<"x_messenger">>,exchange,
                                                                 <<"ha.x_messenger.app_x_queue_error">>},
                                [<<"ha.x_messenger.app_x_queue">>],
                                {content,60,
                                 {'P_basic',<<"application/octet-stream">>,
                                  undefined,
                                  [{<<"x-death">>,array,
                                    [{table,
                                      [{<<"count">>,signedint,1},
                                       {<<"reason">>,longstr,<<"expired">>},
                                       {<<"queue">>,longstr,
                                        <<"ha.x_messenger.app_x_queue_error">>},
                                       {<<"time">>,timestamp,1436238931},
                                       {<<"exchange">>,longstr,
                                        <<"ha.x_messenger.app_x_queue_error">>},
                                       {<<"routing-keys">>,array,
                                        [{longstr,
                                          <<"ha.x_messenger.app_x_queue">>}]}]}]}],
                                  2,0,undefined,undefined,undefined,
                                  <<"788e03ca-672a-4c46-8133-6a2fc2c4f4f4">>,
                                  undefined,undefined,undefined,undefined,
                                  undefined},
                                 <<184,128,24,97,112,112,108,105,99,97,116,105,
                                   111,110,47,111,99,116,101,116,45,115,116,
                                   114,101,97,109,0,0,1,25,7,120,45,100,101,97,
                                   116,104,65,0,0,1,12,70,0,0,1,7,5,99,111,117,
                                   110,116,73,0,0,0,1,6,114,101,97,115,111,110,
                                   83,0,0,0,7,101,120,112,105,114,101,100,5,
                                   113,117,101,117,101,83,0,0,0,59,104,97,46,
                                   100,99,95,109,101,115,115,101,110,103,101,
                                   114,95,112,114,111,100,46,118,49,46,116,114,
                                   97,110,115,102,111,114,109,101,114,95,100,
                                   99,95,97,114,99,104,105,118,101,115,95,112,
                                   97,114,116,115,95,101,114,114,111,114,4,116,
                                   105,109,101,84,0,0,0,0,85,155,68,83,8,101,
                                   120,99,104,97,110,103,101,83,0,0,0,59,104,
                                   97,46,100,99,95,109,101,115,115,101,110,103,
                                   101,114,95,112,114,111,100,46,118,49,46,116,
                                   114,97,110,115,102,111,114,109,101,114,95,
                                   100,99,95,97,114,99,104,105,118,101,115,95,
                                   112,97,114,116,115,95,101,114,114,111,114,
                                   12,114,111,117,116,105,110,103,45,107,101,
                                   121,115,65,0,0,0,58,83,0,0,0,53,104,97,46,
                                   100,99,95,109,101,115,115,101,110,103,101,
                                   114,95,112,114,111,100,46,118,49,46,116,114,
                                   97,110,115,102,111,114,109,101,114,95,100,
                                   99,95,97,114,99,104,105,118,101,115,95,112,
                                   97,114,116,115,2,0,36,55,56,56,101,48,51,99,
                                   97,45,54,55,50,97,45,52,99,52,54,45,56,49,
                                   51,51,45,54,97,50,102,99,50,99,52,102,52,
                                   102,52>>,
                                 rabbit_framing_amqp_0_9_1,
                                 [<<133,162,116,114,171,99,114,97,119,108,95,
                                    115,116,101,112,115,166,116,114,95,115,
                                    101,116,131,166,112,114,101,102,105,120,
                                    169,99,114,97,119,108,46,51,54,54,174,
                                    116,114,97,110,115,102,111,114,109,101,
                                                                        100,95,97,116,183,50,48,49,53,45,48,55,
                                    45,48,55,32,48,50,58,53,56,58,51,48,32,
                                    85,84,67,172,116,114,97,110,115,102,111,
                                    114,109,95,105,100,210,138,95,31,211,166,
                                    105,110,112,117,116,115,129,168,99,114,
                                    97,119,108,95,105,100,205,1,110,164,112,
                                    97,114,116,132,164,105,100,95,102,184,53,
                                    53,57,97,98,53,102,53,54,57,50,100,51,53,
                                    51,53,48,49,54,53,48,55,48,48,164,105,
                                    100,95,116,184,53,53,57,97,98,55,51,57,
                                    54,57,50,100,51,57,54,50,49,98,55,48,48,
                                    54,48,48,164,105,100,95,111,205,50,200,
                                    164,105,100,95,105,14,172,99,97,108,108,
                                    98,97,99,107,95,107,101,121,131,168,112,
                                    105,101,99,101,95,105,100,205,15,187,177,
                                    116,114,97,110,115,102,111,114,109,97,
                                    116,105,111,110,95,105,100,205,2,12,181,
                                    116,114,97,110,115,102,111,114,109,97,
                                    116,105,111,110,95,115,101,116,95,105,
                                    100,205,1,6>>]},
                                <<168,84,203,115,201,232,154,241,51,33,92,99,
                                  248,170,24,198>>,
                                true},
                               true,false,false,true,queue_index,
                               {message_properties,1436279449181213,true,
                                270}}]}},
                           17,
                           {0,nil},
                           {0,nil},
                           {0,nil},
                           {qistate,
                            "/var/lib/rabbitmq/mnesia/rabbit@i-x.example.com/queues/69RXCRA0YQFLNESM4YE1XHF0G",
                            {{dict,0,16,16,8,80,48,
                              {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                               []},
                              {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                []}}},
                             [{segment,0,
                               "/var/lib/rabbitmq/mnesia/rabbit@i-x.example.com/queues/69RXCRA0YQFLNESM4YE1XHF0G/0.idx",
                               {array,16384,0,undefined,100000},
                               2}]},
                            #Ref<0.0.1572866.70861>,0,65536,
                            #Fun<rabbit_variable_queue.2.117761292>,
                            #Fun<rabbit_variable_queue.3.48316793>,
                            {0,nil},
                            {0,nil}},
                           {{client_msstate,msg_store_persistent,
                             <<13,24,107,90,238,140,22,42,193,193,247,117,63,
                               25,47,49>>,
                             {dict,0,16,16,8,80,48,
                              {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                               []},
                              {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                []}}},
                             {state,225357,
                              "/var/lib/rabbitmq/mnesia/rabbit@i-x.example.com/msg_store_persistent"},
                                                           rabbit_msg_store_ets_index,
                             "/var/lib/rabbitmq/mnesia/rabbit@i-x.example.com/msg_store_persistent",
                             <0.240.0>,229454,221260,233551,237648},
                            {client_msstate,msg_store_transient,
                             <<101,74,79,173,74,7,174,242,180,223,118,182,139,
                               198,89,135>>,
                             {dict,0,16,16,8,80,48,
                              {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                               []},
                              {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                []}}},
                             {state,204872,
                              "/var/lib/rabbitmq/mnesia/rabbit@i-x.example.com/msg_store_transient"},
                             rabbit_msg_store_ets_index,
                             "/var/lib/rabbitmq/mnesia/rabbit@i-x.example.com/msg_store_transient",
                             <0.233.0>,208969,200775,213066,217163}},
                           true,0,2,540,0,2,540,infinity,2,0,0,540,0,0,
                           {rates,0.005228732197132713,
                            1.7370371540135685e-303,1.7370371540135685e-303,
                            1.7370371540135685e-303,
                            {1436,279341,778502}},
                           {0,nil},
                           {0,nil},
                           {0,nil},
                           {0,nil},
                           0,0,0,17}},
                         {state,
                          {queue,[],[],0},
                          {inactive,1436238631386029,41942114405,1.0}},
                         undefined,undefined,undefined,undefined,
                         {state,fine,5000,undefined},
                         {0,nil},
                         300000,
                         {erlang,#Ref<0.0.3932161.80107>},
                         1436279449181213,
                         {state,
                          {dict,2,16,16,8,80,48,
                           {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                           {{[],[],[],[],
                             [[<0.3455.6>|#Ref<0.0.3932161.80106>]],
                             [],[],[],[],[],[],[],
                             [[<0.3479.6>|#Ref<0.0.3932161.86474>]],
                             [],[],[]}}},
                          delegate},
                         {resource,<<"x_messenger">>,exchange,
                          <<"ha.x_messenger.app_x_queue">>},
                         undefined,undefined,undefined,1,running}
** Reason for termination ==
** {{case_clause,{value,{<<"count">>,signedint,1}}},
    [{rabbit_dead_letter,x_death_event_key,3,[]},
     {rabbit_dead_letter,increment_xdeath_event_count,1,[]},
     {rabbit_dead_letter,update_x_death_header,2,[]},
     {rabbit_dead_letter,'-make_msg/5-fun-2-',8,[]},
     {rabbit_basic,map_headers,2,[]},
     {rabbit_dead_letter,make_msg,5,[]},
     {rabbit_dead_letter,publish,5,[]},
          {rabbit_amqqueue_process,'-dead_letter_msgs/4-fun-0-',7,[]}]}

=ERROR REPORT==== 7-Jul-2015::14:30:49 ===
Restarting crashed queue 'ha.x_messenger.app_x_queue_error' in vhost 'x_messenger'.

=ERROR REPORT==== 7-Jul-2015::14:30:49 ===
** Generic server <0.7229.6> terminating
** Last message in was {'$gen_cast',init}
** When Server state == {q,{amqqueue,
                               {resource,<<"x_messenger">>,queue,
                                   <<"ha.x_messenger.app_x_queue_error">>},
                               true,false,none,
                               [{<<"x-dead-letter-exchange">>,longstr,
                                 <<"ha.x_messenger.app_x_queue">>},
                                {<<"x-message-ttl">>,signedint,300000}],
                               <0.7229.6>,[],[],[],undefined,[],[],crashed},
                           none,false,undefined,undefined,
                           {state,
                               {queue,[],[],0},
                               {active,1436279449185300,1.0}},
                           undefined,undefined,undefined,undefined,
                           {state,fine,5000,undefined},
                           {0,nil},
                           undefined,undefined,undefined,
                           {state,
                               {dict,0,16,16,8,80,48,
                                   {[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                    [],[]},
                                   {{[],[],[],[],[],[],[],[],[],[],[],[],[],
                                     [],[],[]}}},
                               delegate},
                           undefined,undefined,undefined,undefined,0,running}
** Reason for termination ==
** {{case_clause,{value,{<<"count">>,signedint,1}}},
    [{rabbit_dead_letter,x_death_event_key,3,[]},
     {rabbit_dead_letter,increment_xdeath_event_count,1,[]},
     {rabbit_dead_letter,update_x_death_header,2,[]},
     {rabbit_dead_letter,'-make_msg/5-fun-2-',8,[]},
     {rabbit_basic,map_headers,2,[]},
     {rabbit_dead_letter,make_msg,5,[]},
     {rabbit_dead_letter,publish,5,[]},
     {rabbit_amqqueue_process,'-dead_letter_msgs/4-fun-0-',7,[]}]}

=ERROR REPORT==== 7-Jul-2015::14:30:49 ===
Restarting crashed queue 'ha.x_messenger.app_x_queue_error' in vhost 'x_messenger'.

=ERROR REPORT==== 7-Jul-2015::14:30:49 ===
** Generic server <0.7231.6> terminating
** Last message in was {'$gen_cast',init}
** When Server state == {q,{amqqueue,
                               {resource,<<"x_messenger">>,queue,
                                   <<"ha.x_messenger.app_x_queue_error">>},
                               true,false,none,
                               [{<<"x-dead-letter-exchange">>,longstr,
                                 <<"ha.x_messenger.app_x_queue">>},
                                {<<"x-message-ttl">>,signedint,300000}],
                                                               <0.7231.6>,[],[],[],undefined,[],[],live},
                           none,false,undefined,undefined,
                           {state,
                               {queue,[],[],0},
                               {active,1436279449191151,1.0}},
                           undefined,undefined,undefined,undefined,
                           {state,fine,5000,undefined},
                           {0,nil},
                           undefined,undefined,undefined,
                           {state,
                               {dict,0,16,16,8,80,48,
                                   {[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                    [],[]},
                                   {{[],[],[],[],[],[],[],[],[],[],[],[],[],
                                     [],[],[]}}},
                               delegate},
                           undefined,undefined,undefined,undefined,0,running}
** Reason for termination ==
** {{case_clause,{value,{<<"count">>,signedint,1}}},
    [{rabbit_dead_letter,x_death_event_key,3,[]},
     {rabbit_dead_letter,increment_xdeath_event_count,1,[]},
     {rabbit_dead_letter,update_x_death_header,2,[]},
     {rabbit_dead_letter,'-make_msg/5-fun-2-',8,[]},
     {rabbit_basic,map_headers,2,[]},
     {rabbit_dead_letter,make_msg,5,[]},
     {rabbit_dead_letter,publish,5,[]},
     {rabbit_amqqueue_process,'-dead_letter_msgs/4-fun-0-',7,[]}]}

=ERROR REPORT==== 7-Jul-2015::14:30:49 ===
Restarting crashed queue 'ha.x_messenger.app_x_queue_error' in vhost 'x_messenger'.

=ERROR REPORT==== 7-Jul-2015::14:30:49 ===
** Generic server <0.7233.6> terminating
** Last message in was {'$gen_cast',init}
** When Server state == {q,{amqqueue,
                               {resource,<<"x_messenger">>,queue,
                                   <<"ha.x_messenger.app_x_queue_error">>},
                               true,false,none,
                               [{<<"x-dead-letter-exchange">>,longstr,
                                 <<"ha.x_messenger.app_x_queue">>},
                                {<<"x-message-ttl">>,signedint,300000}],
                               <0.7233.6>,[],[],[],undefined,[],[],live},
                           none,false,undefined,undefined,
                           {state,
                               {queue,[],[],0},
                               {active,1436279449196370,1.0}},
                           undefined,undefined,undefined,undefined,
                           {state,fine,5000,undefined},
                           {0,nil},
                           undefined,undefined,undefined,
                           {state,
                               {dict,0,16,16,8,80,48,
                                   {[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                    [],[]},
                                   {{[],[],[],[],[],[],[],[],[],[],[],[],[],
                                     [],[],[]}}},
                                                                    delegate},
                           undefined,undefined,undefined,undefined,0,running}
** Reason for termination ==
** {{case_clause,{value,{<<"count">>,signedint,1}}},
    [{rabbit_dead_letter,x_death_event_key,3,[]},
     {rabbit_dead_letter,increment_xdeath_event_count,1,[]},
     {rabbit_dead_letter,update_x_death_header,2,[]},
     {rabbit_dead_letter,'-make_msg/5-fun-2-',8,[]},
     {rabbit_basic,map_headers,2,[]},
     {rabbit_dead_letter,make_msg,5,[]},
     {rabbit_dead_letter,publish,5,[]},
     {rabbit_amqqueue_process,'-dead_letter_msgs/4-fun-0-',7,[]}]}

=ERROR REPORT==== 7-Jul-2015::14:30:49 ===
Restarting crashed queue 'ha.x_messenger.app_x_queue_error' in vhost 'x_messenger'.

=ERROR REPORT==== 7-Jul-2015::14:30:49 ===
** Generic server <0.7235.6> terminating
** Last message in was {'$gen_cast',init}
** When Server state == {q,{amqqueue,
                               {resource,<<"x_messenger">>,queue,
                                   <<"ha.x_messenger.app_x_queue_error">>},
                               true,false,none,
                               [{<<"x-dead-letter-exchange">>,longstr,
                                 <<"ha.x_messenger.app_x_queue">>},
                                {<<"x-message-ttl">>,signedint,300000}],
                               <0.7235.6>,[],[],[],undefined,[],[],live},
                           none,false,undefined,undefined,
                           {state,
                               {queue,[],[],0},
                               {active,1436279449201433,1.0}},
                           undefined,undefined,undefined,undefined,
                           {state,fine,5000,undefined},
                           {0,nil},
                           undefined,undefined,undefined,
                           {state,
                               {dict,0,16,16,8,80,48,
                                   {[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                    [],[]},
                                   {{[],[],[],[],[],[],[],[],[],[],[],[],[],
                                     [],[],[]}}},
                               delegate},
                           undefined,undefined,undefined,undefined,0,running}
** Reason for termination ==
** {{case_clause,{value,{<<"count">>,signedint,1}}},
    [{rabbit_dead_letter,x_death_event_key,3,[]},
     {rabbit_dead_letter,increment_xdeath_event_count,1,[]},
     {rabbit_dead_letter,update_x_death_header,2,[]},
     {rabbit_dead_letter,'-make_msg/5-fun-2-',8,[]},
     {rabbit_basic,map_headers,2,[]},
     {rabbit_dead_letter,make_msg,5,[]},
     {rabbit_dead_letter,publish,5,[]},
     {rabbit_amqqueue_process,'-dead_letter_msgs/4-fun-0-',7,[]}]}

=ERROR REPORT==== 7-Jul-2015::14:30:49 ===
Restarting crashed queue 'ha.x_messenger.app_x_queue_error' in vhost 'x_messenger'.

=ERROR REPORT==== 7-Jul-2015::14:30:49 ===
** Generic server <0.7237.6> terminating
** Last message in was {'$gen_cast',init}
** When Server state == {q,{amqqueue,
                               {resource,<<"x_messenger">>,queue,
                                   <<"ha.x_messenger.app_x_queue_error">>},
                               true,false,none,
                               [{<<"x-dead-letter-exchange">>,longstr,
                                 <<"ha.x_messenger.app_x_queue">>},
                                {<<"x-message-ttl">>,signedint,300000}],
                               <0.7237.6>,[],[],[],undefined,[],[],live},
                           none,false,undefined,undefined,
                           {state,
                               {queue,[],[],0},
                               {active,1436279449206805,1.0}},
                           undefined,undefined,undefined,undefined,
                           {state,fine,5000,undefined},
                           {0,nil},
                           undefined,undefined,undefined,
                           {state,
                               {dict,0,16,16,8,80,48,
                                   {[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                    [],[]},
                                   {{[],[],[],[],[],[],[],[],[],[],[],[],[],
                                     [],[],[]}}},
                               delegate},
                           undefined,undefined,undefined,undefined,0,running}
** Reason for termination ==
** {{case_clause,{value,{<<"count">>,signedint,1}}},
    [{rabbit_dead_letter,x_death_event_key,3,[]},
     {rabbit_dead_letter,increment_xdeath_event_count,1,[]},
     {rabbit_dead_letter,update_x_death_header,2,[]},
     {rabbit_dead_letter,'-make_msg/5-fun-2-',8,[]},
     {rabbit_basic,map_headers,2,[]},
     {rabbit_dead_letter,make_msg,5,[]},
     {rabbit_dead_letter,publish,5,[]},
     {rabbit_amqqueue_process,'-dead_letter_msgs/4-fun-0-',7,[]}]}
michaelklishin commented 9 years ago

We understand what the issue is but not what causes it. Can someone post a small sample that reproduces the issue? We will turn it into a test case.

tiredpixel commented 9 years ago

Sorry, I don't have a sample I'm able to post easily at present. But I can add to the above that it does indeed seem to be rabbit_dead_letter.erl increment_xdeath_event_count case x_death_event_key(Info, <<"count">>, long) of line; when I patched that to disable the attempt to read and increment <<"count">> (which we're not currently using), instead following the undefined branch of the case, the broker freed up and CPU usage went right back down with existing messages being processed. My apologies for not being able to post a test case code sample.

Peace, tiredpixel

michaelklishin commented 9 years ago

@tiredpixel thanks. A more specific question: do your apps set the count header? Or is it only ever touched by RabbitMQ on dead-lettering?

tiredpixel commented 9 years ago

@michaelklishin, our apps don't set or use the count header at all. It's only ever touched by RabbitMQ on dead-lettering. I noted that <<"count">> seems to be signedint at that point, but I wasn't able to spot why it's such rather than a long. As far as I got (before patching RabbitMQ as above) was that this seems to be for messages which have gone through the dead-letter exchange a few times, by which time <<"count">> appears to be already set—I suspect following update_x_death_header Info1 [{table, M}] matches branch being followed, if that helps.

michaelklishin commented 9 years ago

@tiredpixel thanks. We do have tests that result in a message being dead-lettered multiple times. I have a couple of ideas. 3.5.4 won't be released without a fix for this.

riyad commented 9 years ago

You can provoke the crash of the queue with the Python script found in https://gist.github.com/riyad/7439eb545dabf287bcc9. It requires you to install the Puka library and update the AMQP_URL in the script to point to your RabbitMQ.

michaelklishin commented 9 years ago

@riyad perfect, thank you!

tiredpixel commented 9 years ago

@michaelklishin, understood; thank you for looking into it. :) @riyad, thank you for contributing a test case to this investigation. :)

Peace, tiredpixel

michaelklishin commented 9 years ago

I can reproduce the issue. Looking into it.

riyad commented 9 years ago

FYI: I fixed a small issue with the demo script. It was missing a wait() when publishing the first message. :sweat_smile:

see: https://gist.github.com/riyad/7439eb545dabf287bcc9/revisions

michaelklishin commented 9 years ago

Cheers.

michaelklishin commented 9 years ago

@riyad I have a fix, does this output confirm correct execution? https://gist.github.com/michaelklishin/97355747f6b85dfe3aae — the exception is gone.

riyad commented 9 years ago

Basically there should be an additional x-death entry for every time the message is deadlettered. If there is a message in the queue at the end of the script with all the x-deaths everything should be OK. The important part is, the message must not be lost.

michaelklishin commented 9 years ago

I see

rabbitmqctl list_queues
Listing queues ...
test.issue_216_crashing_queue   3

after 3 runs, so it must be working as expected.

riyad commented 9 years ago

@michaelklishin To be sure it would be nice if you could provide the output of the script with an empty queue.

michaelklishin commented 9 years ago
λ ~/Development/RabbitMQ/debug/ rabbitmqctl list_queues
Listing queues ...
test.issue_216_crashing_queue   0
λ ~/Development/RabbitMQ/debug/ python server-216.py
connecting ...
setting up ...
publishing ...
fetched message: {'body': 'foo', 'exchange': 'amq.topic', 'promise_number': 7, 'routing_key': 'test.issue_216', 'message_count': 0, 'headers': {'x-puka-delivery-tag': 1}, 'redelivered': False, 'delivery_tag': 1}
declaring timeout queue toq-gen0-test.issue_216_crashing_queue
publishing to toq-gen0-test.issue_216_crashing_queue: {'body': 'foo', 'exchange': 'amq.topic', 'promise_number': 7, 'routing_key': 'test.issue_216', 'message_count': 0, 'headers': {'x-puka-delivery-tag': 1}, 'redelivered': False, 'delivery_tag': 1}
fetched message: {'cluster_id': '', 'empty': True}
fetched message: {'body': 'foo', 'exchange': 'amq.topic', 'promise_number': 11, 'routing_key': 'test.issue_216_crashing_queue', 'message_count': 0, 'headers': {'x-puka-delivery-tag': 2, 'x-death': [{'count': 1L, 'exchange': '', 'time': datetime.datetime(2015, 7, 8, 10, 53, 56), 'queue': 'toq-gen0-test.issue_216_crashing_queue', 'reason': 'expired', 'routing-keys': ['toq-gen0-test.issue_216_crashing_queue']}]}, 'redelivered': False, 'delivery_tag': 2}
declaring timeout queue toq-gen1-test.issue_216_crashing_queue
publishing to toq-gen1-test.issue_216_crashing_queue: {'body': 'foo', 'exchange': 'amq.topic', 'promise_number': 11, 'routing_key': 'test.issue_216_crashing_queue', 'message_count': 0, 'headers': {'x-puka-delivery-tag': 2, 'x-death': [{'count': 1L, 'exchange': '', 'time': datetime.datetime(2015, 7, 8, 10, 53, 56), 'queue': 'toq-gen0-test.issue_216_crashing_queue', 'reason': 'expired', 'routing-keys': ['toq-gen0-test.issue_216_crashing_queue']}]}, 'redelivered': False, 'delivery_tag': 2}
fetched message: {'cluster_id': '', 'empty': True}
fetched message: {'body': 'foo', 'exchange': 'amq.topic', 'promise_number': 15, 'routing_key': 'test.issue_216_crashing_queue', 'message_count': 0, 'headers': {'x-puka-delivery-tag': 3, 'x-death': [{'count': 1L, 'exchange': '', 'time': datetime.datetime(2015, 7, 8, 10, 53, 56), 'queue': 'toq-gen1-test.issue_216_crashing_queue', 'reason': 'expired', 'routing-keys': ['toq-gen1-test.issue_216_crashing_queue']}, {'count': 1, 'exchange': '', 'routing-keys': ['toq-gen0-test.issue_216_crashing_queue'], 'queue': 'toq-gen0-test.issue_216_crashing_queue', 'reason': 'expired', 'time': datetime.datetime(2015, 7, 8, 10, 53, 56)}]}, 'redelivered': False, 'delivery_tag': 3}
declaring timeout queue toq-gen2-test.issue_216_crashing_queue
publishing to toq-gen2-test.issue_216_crashing_queue: {'body': 'foo', 'exchange': 'amq.topic', 'promise_number': 15, 'routing_key': 'test.issue_216_crashing_queue', 'message_count': 0, 'headers': {'x-puka-delivery-tag': 3, 'x-death': [{'count': 1L, 'exchange': '', 'time': datetime.datetime(2015, 7, 8, 10, 53, 56), 'queue': 'toq-gen1-test.issue_216_crashing_queue', 'reason': 'expired', 'routing-keys': ['toq-gen1-test.issue_216_crashing_queue']}, {'count': 1, 'exchange': '', 'routing-keys': ['toq-gen0-test.issue_216_crashing_queue'], 'queue': 'toq-gen0-test.issue_216_crashing_queue', 'reason': 'expired', 'time': datetime.datetime(2015, 7, 8, 10, 53, 56)}]}, 'redelivered': False, 'delivery_tag': 3}
λ ~/Development/RabbitMQ/debug/ rabbitmqctl list_queues
Listing queues ...
test.issue_216_crashing_queue   1
riyad commented 9 years ago

Looks good :)

michaelklishin commented 9 years ago

So the culprit seems to be the fact that the script modifies headers (see the filter_headers function). That's why our existing test suite didn't catch this and this is client-specific.

riyad commented 9 years ago

Hmm ... so you're saying Puka is to blame? Am I correct that the issue is that header fields were encoded with the "wrong" type when republishing (even when they're superficially the same)?

michaelklishin commented 9 years ago

Not Puka but I could only trigger this with Puka, it serializes integers differently from the Java client (e.g. JVM distinguishes between integers and longs, and many dynamically-typed languages use automatic promotion on overflow).

michaelklishin commented 9 years ago

OK, a decent fix is in #221. We'll release 3.5.4 RC very soon and if new edge cases are found, will add them to Bunny's test suite (one of the client test suites we use).

riyad commented 9 years ago

Thank a lot for fixing this so quickly. :smile:

videlalvaro commented 9 years ago

Saw the fix. I think you can still check if the client sent one of the AMQP's "integer types". See here for an example https://github.com/rabbitmq/rabbitmq-delayed-message-exchange/blob/master/src/rabbit_delayed_message_utils.erl#L39 we do the same when we handle queue arguments inside rabbit_amqqueue I think. On Wed, Jul 8, 2015 at 1:53 PM Michael Klishin notifications@github.com wrote:

Closed #216 https://github.com/rabbitmq/rabbitmq-server/issues/216 via 508d76d https://github.com/rabbitmq/rabbitmq-server/commit/508d76d31984caf91977d2b0fdb095ed829981b3 .

— Reply to this email directly or view it on GitHub https://github.com/rabbitmq/rabbitmq-server/issues/216#event-350694132.

tiredpixel commented 9 years ago

@michaelklishin, oh, such a speedy patch—very much obliged to you! :) I'll look forward to 3.5.4 RC. :)

michaelklishin commented 9 years ago

@videlalvaro that would require extracting validators into rabbit_misc or similar. We are under pressure to release 3.5.4 and the clusterer, so we'll do what you suggest for 3.5.5.

videlalvaro commented 9 years ago

I'll just want to prevent data corruption/future crashes if the user sends some stuff that's not even an int anymore. In that case if we do some math with the count header, then we will have another crash. On Wed, Jul 8, 2015 at 10:21 PM Michael Klishin notifications@github.com wrote:

@videlalvaro https://github.com/videlalvaro that would require extracting validators into rabbit_misc or similar. We are under pressure to release 3.5.4 and the clusterer, so we'll do what you suggest for 3.5.5 .

— Reply to this email directly or view it on GitHub https://github.com/rabbitmq/rabbitmq-server/issues/216#issuecomment-119719379 .

michaelklishin commented 9 years ago

If the user modifies x-death events then it counts as a user error. I'm not sure what we can do if "count" is something other than a numerical value. Ignore it entirely?

videlalvaro commented 9 years ago

Reset to default? On Wed, Jul 8, 2015 at 10:48 PM Michael Klishin notifications@github.com wrote:

If the user modifies x-death events then it counts as a user error. I'm not sure what we can do if "count" is something other than a numerical value. Ignore it entirely?

— Reply to this email directly or view it on GitHub https://github.com/rabbitmq/rabbitmq-server/issues/216#issuecomment-119726906 .

seetharamireddy540 commented 8 years ago

hi, I am using Rabbit MQ 3.5.4 and experiencing the same issues ?

=ERROR REPORT==== 14-Mar-2016::20:04:20 === * Generic server <0.1073.0> terminating * Last message in was {'$gen_cast',init}


                       undefined,undefined,undefined,undefined,0,running}

* Reason for termination == * {{case_clause,{<<"x-death">>,longstr, <<"[{reason=expired,

Can you pls suggest me how i can delete the queue ? I am not able to delete this queue from Admin Console.

michaelklishin commented 8 years ago

restarting the node should make it possible to delete the queue before it hits the same exception again. The right thing to do is to upgrade.

On 15 mar 2016, at 3:41, Ram notifications@github.com wrote:

hi, I am using Rabbit MQ 3.5.4 and experiencing the same issues ?

=ERROR REPORT==== 14-Mar-2016::20:04:20 === * Generic server <0.1073.0> terminating * Last message in was {'$gen_cast',init}

                   undefined,undefined,undefined,undefined,0,running}

* Reason for termination == * {{case_clause,{<<"x-death">>,longstr, <<"[{reason=expired,

Can you pls suggest me how i can delete the queue ? I am not able to delete this queue from Admin Console.

— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub: https://github.com/rabbitmq/rabbitmq-server/issues/216#issuecomment-196640135

pdoreau commented 8 years ago

Hello. I'm using v3.6.5 (also tested from 3.5.6) and retry queues (task => retry1 (dl/ttl) => task => retry2 (dl/ttl) => task => retry3 (dl/ttl) => dl). Dead lettering the 2nd time causes this error :

 =ERROR REPORT==== 14-Nov-2016::11:09:53 ===
 ** Generic server <0.932.0> terminating
 ** Last message in was {drop_expired,1}
...
=CRASH REPORT==== 14-Nov-2016::11:09:53 ===
   crasher:
     initial call: rabbit_prequeue:init/1
     pid: <0.932.0>
     registered_name: []
     exception exit: {function_clause,
                      [{rabbit_dead_letter,
                        '-queue_and_reason_matcher/2-fun-1-',
                        [{array,
                          [{array,[{longstr,<<"l">>},{longstr,<<"1">>}]},
                           {array,[{longstr,<<"S">>},{longstr,<<"expired">>}]},
                           {array,
                            [{longstr,<<"S">>},
                             {longstr,
                              <<"mytask_retry_1">>}]},
                           {array,
                            [{longstr,<<"T">>},{longstr,<<"1479121758">>}]},
                           {array,[{longstr,<<"S">>},{longstr,<<"retry">>}]},
                           {array,
                            [{longstr,<<"A">>},
                             {array,
                              [{longstr,
                                <<"mytask_retry_1">>}]}]}]}],
                        [{file,"src/rabbit_dead_letter.erl"},{line,178}]},
                       {lists,partition,4,[{file,"lists.erl"},{line,1302}]},
                       {rabbit_dead_letter,update_x_death_header,2,
                        [{file,"src/rabbit_dead_letter.erl"},{line,127}]},
                       {rabbit_dead_letter,'-make_msg/5-fun-2-',8,
                        [{file,"src/rabbit_dead_letter.erl"},{line,65}]},
                       {rabbit_basic,map_headers,2,
                        [{file,"src/rabbit_basic.erl"},{line,258}]},
                       {rabbit_dead_letter,make_msg,5,
                        [{file,"src/rabbit_dead_letter.erl"},{line,68}]},
                       {rabbit_dead_letter,publish,5,
                        [{file,"src/rabbit_dead_letter.erl"},{line,34}]},
                       {rabbit_amqqueue_process,'-dead_letter_msgs/4-fun-0-',
                        7,
                        [{file,"src/rabbit_amqqueue_process.erl"},
                         {line,848}]}]}
       in function  gen_server2:terminate/3 (src/gen_server2.erl, line 1143)
     ancestors: [<0.931.0>,rabbit_amqqueue_sup_sup,rabbit_sup,<0.140.0>]
     messages: []
     links: [<0.931.0>]
     dictionary: [{{credit_to,<0.1089.0>},49},
                   {{xtype_to_module,direct},rabbit_exchange_type_direct},
                   {process_name,
                       {rabbit_amqqueue_process,
                           {resource,<<"myvhost">>,queue,
                               <<"mytask_retry_2">>}}},
                   {rand_seed,
                       {#{max => 288230376151711743,
                          next => #Fun<rand.8.41921595>,
                          type => exsplus,
                          uniform => #Fun<rand.9.41921595>,
                          uniform_n => #Fun<rand.10.41921595>},
                        [239153173037508270|193202472509615284]}},
                   {guid,{{2457094499,2660964459,2278303215,3311440446},1}},
                   {credit_flow_default_credit,{200,50}}]
     trap_exit: true
     status: running
     heap_size: 2586
     stack_size: 27
     reductions: 4190
   neighbours:
...

Any way to make it work ?

michaelklishin commented 8 years ago

@pdoreau please post an code snippet that reproduces it to rabbitmq-users, our public mailing list

pdoreau commented 8 years ago

@michaelklishin I posted a message on the user list but it doestn't appear. May be not so usefull to post a snippet as I rely on several layers (swarrot, amqplib). However, I noticed nested arrays are given to Amqplib to publish the retry2 message:

...
["x-death"]=>
      array(2) {
        [0]=>
        string(1) "A"
        [1]=>
        array(1) {
          [0]=>
          array(6) {
            ["count"]=>
            array(2) {
              [0]=>
              string(1) "l"
              [1]=>
              string(1) "1"
            }
            ["reason"]=>
            array(2) {
              [0]=>
              string(1) "S"
              [1]=>
              string(7) "expired"
            }
            ["queue"]=>
            array(2) {
              [0]=>
              string(1) "S"
              [1]=>
              string(36) "mytask_retry_1"
            }
            ["time"]=>
            array(2) {
              [0]=>
              string(1) "T"
              [1]=>
              string(10) "1479142401"
            }
            ["exchange"]=>
            array(2) {
              [0]=>
              string(1) "S"
              [1]=>
              string(5) "retry"
            }
            ["routing-keys"]=>
            array(2) {
              [0]=>
              string(1) "A"
              [1]=>
              array(1) {
                [0]=>
                string(36) "mytask_retry_1"
              }
            }
          }
        }
      }
...

Can that be the cause ?

michaelklishin commented 8 years ago

They were in the moderation queue, I approved them.

michaelklishin commented 8 years ago

The x-death value is an array of objects (in JSON terms), so we need to compare what your app is sending. It is certainly possible for apps to confuse RabbitMQ by sending invalid modified headers, we have seen this several times before.

michaelklishin commented 8 years ago

Here's the rabbitmq-users thread and so far it looks like the client or one of the libraries on top unintentionally modifies the x-death header which is meant to be modified only by RabbitMQ itself. In any case, this is not the same issue.