noxdafox / rabbitmq-message-deduplication

RabbitMQ Plugin for filtering message duplicates
Mozilla Public License 2.0
277 stars 34 forks source link

RabbitMQ w/ dedup plugin enabled appears to hang on start_app after 'reset' #62

Closed JohnPolansky closed 3 years ago

JohnPolansky commented 4 years ago

First off thanks for the great plugins we've started using it in our project!

Platform: OSX RabbitMQ: 3.8.6 Dedup/Elixr: 4.5

However I've noticed an issue that I thought I would report here to see if you can fix it as I'm pretty sure it's related to your plugin. Sometimes for troubleshooting purposes we like to issue a 'rabbitmqctl reset' and reimport the config so we can start with a clean Rabbit. This has been working well for awhile, but recently we started using your dedup plugin. Now it appears if we execute the following steps, RabbitMQ will get hung on the 'start_app' action, the CPU and memory start spiking and the rabbit server never completes startup I left it for 15 mins and according to Activity Monitor it was using 100% CPU and was up to 140 GB of memory: image

The rabbit server was basically idle, nothing was connected and no work being done.. I've reproduced it several times so hopefully you won't have an issue.

STR:

  1. Start RabbitMQ normally w/ management and dedup plugin active
  2. Execute the following:
    rabbitmqadmin export /tmp/rabbit_settings.json
    rabbitmqctl stop_app
    rabbitmqctl reset
    rabbitmqctl start_app
    rabbitmqadmin import /tmp/rabbit_settings.json
  3. the 'start_app' will hang in the logs you can see it get stuck at the message:

2020-08-11 22:20:19.362 [info] <0.2900.0> Making sure data directory '/usr/local/var/lib/rabbitmq/mnesia/rabbit@localhost/msg_stores/vhosts/628WB79CIFDYO9LJI6DKMI09L' for vhost '/' exists

  1. You can wait and watch the CPU and Memory spike, but eventually u'll need to kill the rabbit process. If you try and start it up again it will proceed to get stuck again.
  2. To resolve I've found that rabbitmq-plugins enable rabbitmq_message_deduplication (while it's hung) will stop the plugin from restarting, you'll still have to kill rabbit and restart for it to come up cleanly.
  3. I'm guessing the plugin is expecting some data to be there that the reset is removing?

I've attached a log to the ticket.

A workaround seems to be to "disable" the plugin before you do the reset and then enable it afterwards. rabbitmq-dedup-hang.log

etc_rabbitmq_config.zip

noxdafox commented 3 years ago

The issue can be reproduced by simply running:

rabbitmqctl stop_app
rabbitmqctl start_app

There is no need to reset the node.

What is puzzling me is that the broker bootstrap hangs way before reaching the plugin initialization routines. In other words, none of the plugin logic seems to have been executed yet at the time of the hanging.

noxdafox commented 3 years ago

Broker restart support added in v0.5.0.

Please keep in mind that for single node setups, restarting the broker will result in all deduplication caches being cleared. This means deduplication exchanges/queues won't be able to deduplicate messages seen prior to the broker restart.

This issue does not persist on multi node setups as long as all the nodes are not restarted at once.

Please re-open this ticket if the issue persists.