noxdafox / rabbitmq-message-deduplication

RabbitMQ Plugin for filtering message duplicates
Mozilla Public License 2.0
271 stars 33 forks source link

Crash on startup rabbitmq >=3.8.10 Boot failed #76

Closed Patrick-Remy closed 2 years ago

Patrick-Remy commented 2 years ago

After upgrading my rabbitmq instance from 3.8.9 to 3.8.10 (or later), it crashes on startup with following log messages: I am using the latest v0.5.0 of this plugin.

rabbitmq_1                           | BOOT FAILED
rabbitmq_1                           | ===========
rabbitmq_1                           | Error during startup: {error,
rabbitmq_1                           |                        {rabbitmq_message_deduplication,
rabbitmq_1                           |                         {bad_return,
rabbitmq_1                           |                          {{'Elixir.RabbitMQMessageDeduplication',start,
rabbitmq_1                           |                            [normal,[]]},
rabbitmq_1                           |                           {'EXIT',
rabbitmq_1                           |                            {{badmatch,
rabbitmq_1                           |                              {error,{vhost_supervisor_not_running,<<"/">>}}},
rabbitmq_1                           |                             [{rabbit_amqqueue_sup_sup,find_for_vhost,2,
rabbitmq_1                           |                               [{file,"src/rabbit_amqqueue_sup_sup.erl"},
rabbitmq_1                           |                                {line,50}]},
rabbitmq_1                           |                              {'Elixir.RabbitMQMessageDeduplication.Queue',
rabbitmq_1                           |                               find_queue_supervisor,2,
rabbitmq_1                           |                               [{file,
rabbitmq_1                           |                                 "lib/rabbitmq_message_deduplication/rabbit_message_deduplication_queue.ex"},
rabbitmq_1                           |                                {line,529}]},
rabbitmq_1                           |                              {'Elixir.RabbitMQMessageDeduplication.Queue',
rabbitmq_1                           |                               restart_queue_process,1,
rabbitmq_1                           |                               [{file,
rabbitmq_1                           |                                 "lib/rabbitmq_message_deduplication/rabbit_message_deduplication_queue.ex"},
rabbitmq_1                           |                                {line,518}]},
rabbitmq_1                           |                              {'Elixir.Enum','-map/2-lists^map/1-0-',2,
rabbitmq_1                           |                               [{file,"lib/enum.ex"},{line,1396}]},
rabbitmq_1                           |                              {'Elixir.RabbitMQMessageDeduplication',start,2,
rabbitmq_1                           |                               [{file,"lib/rabbitmq_message_deduplication.ex"},
rabbitmq_1                           |                                {line,24}]},
rabbitmq_1                           |                              {application_master,start_it_old,4,
rabbitmq_1                           |                               [{file,"application_master.erl"},
rabbitmq_1                           |                                {line,277}]}]}}}}}}
rabbitmq_1                           | 2021-08-13 15:32:53.115 [error] <0.889.0> 

Dockerfile

FROM rabbitmq:3.8.10-management-alpine

ARG PLUGIN_ELIXIR_VERSION=1.10.4
ARG PLUGIN_DEDUPLICATION_VERSION=0.5.0
ARG PLUGIN_DELAYED_MESSAGES_VERSION=3.8.17

# Install and enable plugins
RUN wget -P $RABBITMQ_HOME/plugins/ https://github.com/noxdafox/rabbitmq-message-deduplication/releases/download/${PLUGIN_DEDUPLICATION_VERSION}/elixir-${PLUGIN_ELIXIR_VERSION}.ez \
    && wget -P $RABBITMQ_HOME/plugins/ https://github.com/noxdafox/rabbitmq-message-deduplication/releases/download/${PLUGIN_DEDUPLICATION_VERSION}/rabbitmq_message_deduplication-${PLUGIN_DEDUPLICATION_VERSION}.ez \
    && wget -P $RABBITMQ_HOME/plugins/ https://github.com/rabbitmq/rabbitmq-delayed-message-exchange/releases/download/${PLUGIN_DELAYED_MESSAGES_VERSION}/rabbitmq_delayed_message_exchange-${PLUGIN_DELAYED_MESSAGES_VERSION}.8f537ac.ez \
    && chown -R rabbitmq:rabbitmq $RABBITMQ_HOME/plugins/ \
    && rabbitmq-plugins enable rabbitmq_management rabbitmq_message_deduplication rabbitmq_delayed_message_exchange
noxdafox commented 2 years ago

Hello,

I just tested the provided Dockerfile and everything seems to work fine.


  ##  ##      RabbitMQ 3.8.10
  ##  ##
  ##########  Copyright (c) 2007-2020 VMware, Inc. or its affiliates.
  ######  ##
  ##########  Licensed under the MPL 2.0. Website: https://rabbitmq.com

  Doc guides: https://rabbitmq.com/documentation.html
  Support:    https://rabbitmq.com/contact.html
  Tutorials:  https://rabbitmq.com/getstarted.html
  Monitoring: https://rabbitmq.com/monitoring.html

  ...

2021-08-15 20:45:52.673 [info] <0.745.0> Server startup complete; 6 plugins started.
 * rabbitmq_prometheus
 * rabbitmq_delayed_message_exchange
 * rabbitmq_message_deduplication
 * rabbitmq_management
 * rabbitmq_web_dispatch
 * rabbitmq_management_agent

Same with 3.8.9 and 3.8.12. Broker boots fine and I can create deduplication exchanges and queues.

Patrick-Remy commented 2 years ago

It seems that it only crashes, if I enable also the plugin delayed_message_exchange, with a configured exchange:

[
   'type' => 'x-delayed-message',
    'arguments' => [
        'x-delayed-type' => 'x-message-deduplication',
        'x-cache-size' => 10000
    ]
],
noxdafox commented 2 years ago

I am afraid the above is not well supported as it expects the two plugins to be started in the correct sequence. RabbitMQ has been recently changing the plugins start order and this might be the core problem leading to the issue.

I would recommend you to create a delayed exchange and a deduplication one separately and then bind them one to the other. In this way, you can achieve the same result without risking to suffer from plugins started in the wrong order.

You can see some discussions related to the plugins boot order messup in here: https://groups.google.com/g/rabbitmq-users/c/m7jgGmIk4oY/m/4tmLsVM0AwAJ

noxdafox commented 2 years ago

The commit you might be affected from is likely the following: https://github.com/rabbitmq/rabbitmq-server/commit/a0cd2e5fd033a7490e97ee46ea42b7b3581c3629

Patrick-Remy commented 2 years ago

Thanks for your fast investigation, and response. I'll have to try your suggestion!

noxdafox commented 2 years ago

I put together a build which should be solving your issue. Would you mind giving it a try? You will need Erlang 24.

plugins.zip

Patrick-Remy commented 2 years ago

Yes, it works perfectly (for RabbitMQ >=3.8.16)! Many thanks for that!

Patrick-Remy commented 2 years ago

Is there a roadmap for the next bugfix release, that includes this fix?

Patrick-Remy commented 2 years ago

Any updates for a next release build?

noxdafox commented 2 years ago

Hello,

I am very busy atm. I will have more time from the beginning of November. You can expect the next release around mid-November.

noxdafox commented 2 years ago

Release 0.5.1 fixes this issue. Please re-open this ticket otherwise.

Patrick-Remy commented 2 years ago

Again I am unfortunately not sure if it's related to this plugin, sorry :/
But with the new setup the „auto migration” by just recreating the container works well. But if I manually stop one node and try to start it again, rabbitmq now starts consuming multiple GB of RAM (like going in an endless loop). I posted the full debug log here:

https://pastebin.com/pfj3XfNB MPCVbDbfT6

Dockerfile

FROM rabbitmq:3.9.9-management-alpine AS production

RUN apk --no-cache add tzdata

# See https://github.com/noxdafox/rabbitmq-message-deduplication/releases
ARG PLUGIN_ELIXIR_VERSION=1.12.2
ARG PLUGIN_DEDUPLICATION_VERSION=0.5.1
# See https://github.com/rabbitmq/rabbitmq-delayed-message-exchange/releases
ARG PLUGIN_DELAYED_MESSAGES_VERSION=3.9.0

# Install and enable plugins
RUN wget -P $RABBITMQ_HOME/plugins/ https://github.com/noxdafox/rabbitmq-message-deduplication/releases/download/${PLUGIN_DEDUPLICATION_VERSION}/elixir-${PLUGIN_ELIXIR_VERSION}.ez \
    && wget -P $RABBITMQ_HOME/plugins/ https://github.com/noxdafox/rabbitmq-message-deduplication/releases/download/${PLUGIN_DEDUPLICATION_VERSION}/rabbitmq_message_deduplication-${PLUGIN_DEDUPLICATION_VERSION}.ez \
    && wget -P $RABBITMQ_HOME/plugins/ https://github.com/rabbitmq/rabbitmq-delayed-message-exchange/releases/download/${PLUGIN_DELAYED_MESSAGES_VERSION}/rabbitmq_delayed_message_exchange-${PLUGIN_DELAYED_MESSAGES_VERSION}.ez \
    && chown -R rabbitmq:rabbitmq $RABBITMQ_HOME/plugins/ \
    && rabbitmq-plugins enable rabbitmq_management rabbitmq_message_deduplication rabbitmq_delayed_message_exchange

I executed

/ # rabbitmqctl stop_app
Stopping rabbit application on node rabbit@worker1-rabbitmq ...

/ # rabbitmqctl join_cluster rabbit@core-rabbitmq
Clustering node rabbit@worker1-rabbitmq with rabbit@core-rabbitmq
The node is already a member of this cluster

/ # rabbitmqctl start_app
Starting node rabbit@worker1-rabbitmq ...

# After some seconds and consuming many GB of RAM..

Error: unable to perform an operation on node 'rabbit@worker1-rabbitmq'. Please see diagnostics information and suggestions below.

Most common reasons for this are:
 * Target node is unreachable (e.g. due to hostname resolution, TCP connection or firewall issues)
 * CLI tool fails to authenticate with the server (e.g. due to CLI tool's Erlang cookie not matching that of the server)
 * Target node is not running

In addition to the diagnostics info below:
 * See the CLI, clustering and networking guides on https://rabbitmq.com/documentation.html to learn more
 * Consult server logs on node rabbit@worker1-rabbitmq
 * If target node is configured to use long node names, don't forget to use --longnames with CLI tools

DIAGNOSTICS
===========
attempted to contact: ['rabbit@worker1-rabbitmq']
rabbit@worker1-rabbitmq:
  * connected to epmd (port 4369) on worker1-rabbitmq
  * epmd reports: node 'rabbit' not running at all
                  no other nodes on worker1-rabbitmq
  * suggestion: start the node

Current node details:
 * node name: 'rabbitmqcli-307-rabbit@worker1-rabbitmq'
 * effective user's home directory: /var/lib/rabbitmq
 * Erlang cookie hash: xxxxxxxx

/ # ERROR: 137

I would be really happy if you could check if this comes from the deduplication plugin, I see a bad_return_value in context of the deduplication boot step (full log at pastebin, link above)

rabbitmq_1      | 2021-11-12 12:34:28.517122+01:00 [info] <0.929.0> Deduplication queues enabled, real BQ is rabbit_priority_queue                                                                                                                                                                                                                               
rabbitmq_1      | 2021-11-12 12:34:28.517122+01:00 [info] <0.929.0>                                                                                                                                                                                                                                                                                              
rabbitmq_1      | 2021-11-12 12:34:28.517122+01:00 [dbug] <0.929.0>     logger: removed_failing_filter                                                                                                                                                                                                                                                           
rabbitmq_1      | 2021-11-12 12:34:28.517122+01:00 [dbug] <0.929.0>     filter: {rmqlog_filter,{#Fun<rabbit_prelaunch_logging.14.111361164>,                             
rabbitmq_1      | 2021-11-12 12:34:28.517122+01:00 [dbug] <0.929.0>                             #{global => debug,upgrade => none}}}                                                                                                                                                                                                                             
rabbitmq_1      | 2021-11-12 12:34:28.517122+01:00 [dbug] <0.929.0>     owner: rmq_2_stdout                                                                                                                                                                                                                                                                      
rabbitmq_1      | 2021-11-12 12:34:28.517122+01:00 [dbug] <0.929.0>     log_event: #{level => info,                                                                                                                                                                                                                  
rabbitmq_1      | 2021-11-12 12:34:28.517122+01:00 [dbug] <0.929.0>                  meta =>                                                                                                                                                                                                                                                                     
rabbitmq_1      | 2021-11-12 12:34:28.517122+01:00 [dbug] <0.929.0>                      #{domain => [rabbitmq],                                                                                                                                                                                                                                                 
rabbitmq_1      | 2021-11-12 12:34:28.517122+01:00 [dbug] <0.929.0>                        gl => <0.928.0>,pid => <0.929.0>,                                                                                                                                                                                                                                     
rabbitmq_1      | 2021-11-12 12:34:28.517122+01:00 [dbug] <0.929.0>                        time => 1636716868517122},                                                                                                                                                                                                                                            
rabbitmq_1      | 2021-11-12 12:34:28.517122+01:00 [dbug] <0.929.0>                  msg =>                                                                                                                                                                                                                                                                      
rabbitmq_1      | 2021-11-12 12:34:28.517122+01:00 [dbug] <0.929.0>                      {<<"Deduplication queues enabled, real BQ is ~s~n">>,                                                                                                                                                                                                                   
rabbitmq_1      | 2021-11-12 12:34:28.517122+01:00 [dbug] <0.929.0>                       [rabbit_priority_queue]}}                                                                                                                          
rabbitmq_1      | 2021-11-12 12:34:28.517122+01:00 [dbug] <0.929.0>     reason: {bad_return_value,#{level => info,                                                                                                                                                                                                                                               
rabbitmq_1      | 2021-11-12 12:34:28.517122+01:00 [dbug] <0.929.0>                                 meta =>                                                                                                                                                                                                                                                      
rabbitmq_1      | 2021-11-12 12:34:28.517122+01:00 [dbug] <0.929.0>                                     #{domain => [rabbitmq],                                                                                                                                                                                                                                 
rabbitmq_1      | 2021-11-12 12:34:28.517122+01:00 [dbug] <0.929.0>                                       gl => <0.928.0>,pid => <0.929.0>,                                                                                                                                                                             
rabbitmq_1      | 2021-11-12 12:34:28.517122+01:00 [dbug] <0.929.0>                                       time => 1636716868517122},                                                                                                                                                                                                                             
rabbitmq_1      | 2021-11-12 12:34:28.517122+01:00 [dbug] <0.929.0>                                 msg =>                                                                                                                                                                                                                                                       
rabbitmq_1      | 2021-11-12 12:34:28.517122+01:00 [dbug] <0.929.0>                                     {<<"Deduplication queues enabled, real BQ is ~s~n">>,                                                          

Thanks for your great support and this awesome plugin! :)