rabbitmq / rabbitmq-server

Open source RabbitMQ: core server and tier 1 (built-in) plugins
https://www.rabbitmq.com/

Erratic CPU utilization profile with a large number of mirrored queues #1713

Closed (gerhard closed this issue 3 years ago)

gerhard commented 5 years ago

Given the following RabbitMQ deployment:

CPU utilisation is unpredictable when queues are mirrored

Given 10k idle queues mirrored across all nodes in a 3-node GCP n1-highcpu-8 cluster, RabbitMQ nodes utilise CPU (user + system) in an unpredictable and arbitrary way:

dstat --cpu 60 1

CPU (60s avg) | rmq0 (initial Master) | rmq1 (Slave1) | rmq2 (Slave2)
--- | --- | --- | ---
user | 35% | 62% | 29%
system | 18% | 8% | 18%
idle | 48% | 30% | 53%

This is what happens to CPU utilisation (user + system) when rmq0 (initial Master) is stopped via rabbitmqctl shutdown:

CPU (60s avg) | rmq0 (stopped) | rmq1 (promoted Master) | rmq2 (Slave1)
--- | --- | --- | ---
user | 0% | 64% | 30%
system | 0% | 7% | 19%
idle | 100% | 29% | 50%

And this is what happens when rmq1 (promoted Master) is stopped via rabbitmqctl shutdown:

CPU (60s avg) | rmq0 (stopped) | rmq1 (stopped) | rmq2 (promoted Master)
--- | --- | --- | ---
user | 0% | 0% | 31%
system | 0% | 0% | 18%
idle | 100% | 100% | 51%

We then stop & restart the only node still running, rmq2 (restarted Master), and notice significantly lower CPU utilisation (user + system) on this host (6% vs 49%):

CPU (60s avg) | rmq0 (stopped) | rmq1 (stopped) | rmq2 (restarted Master)
--- | --- | --- | ---
user | 0% | 0% | 4%
system | 0% | 0% | 2%
idle | 100% | 100% | 94%

After rmq1 (new Slave1) is started, we observe the same CPU utilisation (user + system) on this host as before the stop (71% vs 70%):

CPU (60s avg) | rmq0 (stopped) | rmq1 (new Slave1) | rmq2 (Master)
--- | --- | --- | ---
user | 0% | 64% | 4%
system | 0% | 7% | 2%
idle | 100% | 29% | 94%

After rmq0 (new Slave2) is started, we observe lower CPU utilisation (user + system) on this host (5% vs 53%):

CPU (60s avg) | rmq0 (new Slave 2) | rmq1 (Slave1) | rmq2 (Master)
--- | --- | --- | ---
user | 3% | 64% | 5%
system | 2% | 7% | 2%
idle | 95% | 29% | 93%

After all nodes are restarted, we observe the same CPU utilisation on each host as before the restart:

CPU (60s avg) | rmq0 (restarted Slave 2) | rmq1 (restarted Slave1) | rmq2 (restarted Master)
--- | --- | --- | ---
user | 3% | 64% | 5%
system | 2% | 7% | 2%
idle | 95% | 29% | 93%

After all nodes are restarted for the second time, we observe high CPU utilisation on all hosts:

CPU (60s avg) | rmq0 (restarted Slave 2) | rmq1 (restarted Slave1) | rmq2 (restarted Master)
--- | --- | --- | ---
user | 32% | 64% | 33%
system | 20% | 8% | 18%
idle | 48% | 28% | 48%

After all nodes are restarted for the third time, we observe low CPU utilisation on 2 out of 3 hosts:

CPU (60s avg) | rmq0 (restarted Slave 2) | rmq1 (restarted Slave1) | rmq2 (restarted Master)
--- | --- | --- | ---
user | 4% | 64% | 10%
system | 2% | 7% | 5%
idle | 94% | 29% | 85%

If we keep restarting all nodes enough times, they all end up utilising 5-7% CPU (user + system), with no other changes.

In other words, stopping & starting all nodes enough times was observed to reduce CPU utilisation from 50-70% to ~5-7%.
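
For reference, a restart cycle like the one described above could be scripted roughly as follows. This is only a sketch: it assumes systemd units and SSH access to each host, which is not necessarily how the test cluster above was driven.

# hypothetical helper: cleanly cycle every node in the cluster once
for host in rmq0 rmq1 rmq2; do
  ssh "$host" "rabbitmqctl shutdown"                    # same command used in the steps above
  ssh "$host" "sudo systemctl start rabbitmq-server"    # bring the node back up
  ssh "$host" "rabbitmqctl await_startup"               # block until the node has fully booted
done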

Does +sbwt none make a difference?

Yes, it does (1-3% vs 5-7% CPU utilisation), but some nodes still exhibit high & unpredictable CPU utilisation when restarted.
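
(One common way to apply the flag, assuming the standard package layout, is via rabbitmq-env.conf; the exact mechanism used for these tests may differ.)

# /etc/rabbitmq/rabbitmq-env.conf (path assumed; adjust for your packaging)
RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS="+sbwt none"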

With +sbwt none set and all nodes restarted for the first time:

CPU (60s avg) | rmq0 (restarted Master) | rmq1 (restarted Slave2) | rmq2 (restarted Slave1)
--- | --- | --- | ---
user | 10% | 48% | 4%
system | 2% | 3% | 4%
idle | 88% | 50% | 92%

rmq1 (restarted Slave2) msacc output per thread type with system time:

rabbitmqctl eval 'msacc:start(60000), msacc:print(msacc:stats(),#{system => true}).'

        Thread    alloc           aux           bif     busy_wait      check_io      emulator           ets            gc       gc_full           nif         other          port          send         sleep        timers

Stats per type:
         async  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)100.00%(82.6%)  0.00%( 0.0%)
           aux  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  6.24%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%) 93.76%( 0.6%)  0.00%( 0.0%)
dirty_cpu_sche  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)100.00%( 5.2%)  0.00%( 0.0%)
dirty_io_sched  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)100.00%( 6.5%)  0.00%( 0.0%)
     scheduler  0.95%( 0.0%)  0.66%( 0.0%)  1.22%( 0.1%)  0.24%( 0.0%)  0.20%( 0.0%)  2.12%( 0.1%)  0.06%( 0.0%)  0.07%( 0.0%)  1.36%( 0.1%)  0.00%( 0.0%)  2.44%( 0.1%)  0.00%( 0.0%)  0.00%( 0.0%) 89.63%( 4.6%)  1.03%( 0.1%)

With +sbwt none set and all nodes restarted for the second time, CPU utilisation (user + system) drops to 1% on the slaves & stays at 27% on rmq0 (restarted Master):

CPU (60s avg) | rmq0 (restarted Master) | rmq1 (restarted Slave2) | rmq2 (restarted Slave1)
--- | --- | --- | ---
user | 18% | 1% | 1%
system | 9% | 0% | 0%
idle | 73% | 99% | 99%

rmq0 (restarted Master) msacc output per thread type with system time:

rabbitmqctl eval 'msacc:start(60000), msacc:print(msacc:stats(),#{system => true}).'

        Thread    alloc           aux           bif     busy_wait      check_io      emulator           ets            gc       gc_full           nif         other          port          send         sleep        timers

Stats per type:
         async  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)100.00%(82.6%)  0.00%( 0.0%)
           aux  0.00%( 0.0%)  0.04%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  4.66%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%) 95.30%( 0.6%)  0.00%( 0.0%)
dirty_cpu_sche  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%) 15.65%( 0.8%)  0.00%( 0.0%)  0.35%( 0.0%)  0.08%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%) 83.93%( 4.3%)  0.00%( 0.0%)
dirty_io_sched  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)100.00%( 6.5%)  0.00%( 0.0%)
     scheduler  0.89%( 0.0%)  0.57%( 0.0%)  1.51%( 0.1%)  0.18%( 0.0%)  0.17%( 0.0%)  2.69%( 0.1%)  0.65%( 0.0%)  0.07%( 0.0%)  1.22%( 0.1%)  0.00%( 0.0%)  2.04%( 0.1%)  0.00%( 0.0%)  0.00%( 0.0%) 88.99%( 4.6%)  1.01%( 0.1%)

With +sbwt none set and only rmq0 (restarted Slave2) restarted, CPU utilisation (user + system) drops to 1-3%:

CPU (60s avg) | rmq0 (restarted Slave2) | rmq1 (restarted Master) | rmq2 (restarted Slave1)
--- | --- | --- | ---
user | 1% | 2% | 1%
system | 1% | 1% | 0%
idle | 98% | 97% | 99%

rabbitmqctl -n rabbit@rmq0-gm-gc list_queues pid slave_pids
Timeout: 60.0 seconds ...
Listing queues for vhost / ...
Error: operation list_queues on node rabbit@rmq0-gm-gc timed out. Timeout value used: 60.0. Some queue(s) are unresponsive, use list_unresponsive_queues command.

rabbitmqctl -n rabbit@rmq1-gm-gc list_queues pid slave_pids
<rabbit@rmq1-gm-gc.1.21866.0>   [<rabbit@rmq2-gm-gc.1.32123.1>, <rabbit@rmq0-gm-gc.3.4944.1>]
<rabbit@rmq1-gm-gc.1.21911.0>   [<rabbit@rmq2-gm-gc.1.32133.1>, <rabbit@rmq0-gm-gc.3.23329.0>]
...

rabbitmqctl -n rabbit@rmq2-gm-gc list_queues pid slave_pids
Timeout: 60.0 seconds ...
Listing queues for vhost / ...
Error: operation list_queues on node rabbit@rmq2-gm-gc timed out. Timeout value used: 60.0. Some queue(s) are unresponsive, use list_unresponsive_queues command.
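
As the error message suggests, the unresponsive queues can be listed directly (same node names as above):

rabbitmqctl -n rabbit@rmq0-gm-gc list_unresponsive_queues
rabbitmqctl -n rabbit@rmq2-gm-gc list_unresponsive_queues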

Other observations

Open questions

twillouer commented 5 years ago

Hi, I ran some tests on my computer (Debian Stretch) with RabbitMQ 3.6.15 and RabbitMQ 3.7.7 (on Erlang 21.0.9), with no other nodes and only an HA policy applied, and the CPU goes through the roof with 5k queues (this relates to open question #5).

lukebakken commented 5 years ago

It would be interesting to see if the amount of inter-node traffic is directly related to this.
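
A quick way to eyeball that correlation on each node (same dstat tool used above, just adding the network counters; a rough check rather than a proper measurement) would be:

dstat --net --cpu 60 1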

kitsirota commented 5 years ago

Have you tried using something like https://github.com/Ayanda-D/rabbitmq-queue-master-balancer to confirm your queue masters aren't concentrated on any one node?

It would also be interesting to get the numbers for OTP 21.0 for comparison.
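
For anyone who wants to check master placement without the balancer plugin, something like the following gives a per-node master count. It assumes the management plugin is enabled and jq is installed; the credentials are placeholders, and for classic queues the node field is the node hosting the master.

curl -s -u guest:guest http://localhost:15672/api/queues | jq -r '.[].node' | sort | uniq -c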

gerhard commented 5 years ago

@twillouer thank you for confirming that the inconsistent high CPU utilisation is present even when there is a single node in the cluster.

@lukebakken inter-node traffic was always the same in my observations, regardless of how hard the CPUs were working.

@kitsirota all queue masters are always concentrated on a single node; high CPU utilisation is not linked to queue master location. The question is why nodes have unpredictable CPU utilisation regardless of whether they run slave or master queue processes.

lukebakken commented 5 years ago

Thanks @gerhard that's an interesting fact.

vikinghawk commented 5 years ago

Are there any updates on this investigation or is it on hold for other priorities?

michaelklishin commented 5 years ago

It’s on pause and likely will remain so until the end of the month.

twillouer commented 5 years ago

Hi,

any updates on this issue?

michaelklishin commented 5 years ago

The update is that our team is busy with other things.

Over 80% (in one case, 95%) of the time reported by microstate accounting is in the Sleep column. That's certainly something to investigate. I'd use rabbitmq-top to find the top processes by the number of reductions. If there are no obvious outliers, this is likely something non-deterministic in the runtime.

OTP 21.2 change log mentions an issue where "dirty schedulers could stay awake forever".

@essen have you seen something like this? Any advice as to how this can be narrowed down further, besides looking for reductions outliers?
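
If installing rabbitmq-top is inconvenient, a rough equivalent is to sample the top processes by reductions with recon, which ships with the server (a sketch; window size and count are arbitrary):

# top 10 processes by reductions accumulated over a 5 second window
rabbitmqctl eval 'recon:proc_window(reductions, 10, 5000).'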

JochenABC commented 5 years ago

We have reproduced this on a single-node RabbitMQ 3.7.11 running Erlang/OTP 21 [erts-10.2.3] on Ubuntu 18.04. It only takes a few seconds to do:

Step 1: Set up RabbitMQ

We used these instructions to install on a fresh vm: https://computingforgeeks.com/how-to-install-latest-rabbitmq-server-on-ubuntu-18-04-lts/

Step 2: Create 5,000 queues

Install Python with the pika library:

apt install -y python-pip
pip install pika

Run the following Python script with python scriptname:

#!/usr/bin/env python
import pika

rabbitmq_host = "127.0.0.1"
rabbitmq_port = 5672
rabbitmq_virtual_host = "/"

# Assumes a user "admin" with password "admin" exists on the broker
credentials = pika.PlainCredentials("admin", "admin")
connection = pika.BlockingConnection(
    pika.ConnectionParameters(rabbitmq_host, rabbitmq_port, rabbitmq_virtual_host, credentials)
)
channel = connection.channel()

# Declare 5,000 classic queues named test0 .. test4999
for i in range(0, 5000):
    channel.queue_declare(queue="test{}".format(i))

connection.close()

It will create 5,000 queues. It assumes that there is a user named "admin" with password "admin".

Creating the queues will take some CPU. However, CPU usage goes down again afterwards.

Step 3: Create an HA policy

Create the HA policy by running:

rabbitmqctl set_policy ha-all ".*" '{"ha-mode":"all","ha-sync-mode":"manual"}'

CPU usage will go up and stay high, although we would expect it to go down since there are no other cluster members and no messages in the queues.

You can now drop the policy again by running:

rabbitmqctl clear_policy ha-all

CPU usage will settle back to almost 0.
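
To double-check which state the broker is in at each step (a small sanity check; both are stock rabbitmqctl commands), list the policies and see whether the queues picked one up:

rabbitmqctl list_policies
rabbitmqctl list_queues name policy | head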

Observations

Here is the rabbitmq_top output in the high-CPU state after enabling the HA policy: [screenshot]

This is the rabbitmq_top output after creating the queues but NOT enabling the HA policy yet:

[screenshot]

VM graph when enabling the policy (red line):

[screenshot]

We are affected by a high-CPU issue with replication in a production cluster of about 2.5k queues, and we believe this issue could be related. Any help or advice on what to try or run would be highly appreciated.

(Also happy to provide an already set-up test VM if that makes reproduction easier.)

root@jo-rabbit:/etc# rabbitmqctl status
Status of node rabbit@jo-rabbit ...
[{pid,24412},
 {running_applications,
     [{rabbitmq_top,"RabbitMQ Top","3.7.11"},
      {rabbitmq_management,"RabbitMQ Management Console","3.7.11"},
      {amqp_client,"RabbitMQ AMQP Client","3.7.11"},
      {rabbitmq_management_agent,"RabbitMQ Management Agent","3.7.11"},
      {rabbitmq_web_dispatch,"RabbitMQ Web Dispatcher","3.7.11"},
      {rabbit,"RabbitMQ","3.7.11"},
      {rabbit_common,
          "Modules shared by rabbitmq-server and rabbitmq-erlang-client",
          "3.7.11"},
      {cowboy,"Small, fast, modern HTTP server.","2.6.1"},
      {ranch,"Socket acceptor pool for TCP protocols.","1.7.1"},
      {ssl,"Erlang/OTP SSL application","9.1.2"},
      {public_key,"Public key infrastructure","1.6.4"},
      {asn1,"The Erlang ASN1 compiler version 5.0.8","5.0.8"},
      {mnesia,"MNESIA  CXC 138 12","4.15.5"},
      {os_mon,"CPO  CXC 138 46","2.4.7"},
      {sysmon_handler,"Rate-limiting system_monitor event handler","1.1.0"},
      {cowlib,"Support library for manipulating Web protocols.","2.7.0"},
      {crypto,"CRYPTO","4.4"},
      {xmerl,"XML parser","1.3.19"},
      {inets,"INETS  CXC 138 49","7.0.5"},
      {recon,"Diagnostic tools for production use","2.3.6"},
      {jsx,"a streaming, evented json parsing toolkit","2.9.0"},
      {lager,"Erlang logging framework","3.6.5"},
      {goldrush,"Erlang event stream processor","0.1.9"},
      {compiler,"ERTS  CXC 138 10","7.3.1"},
      {syntax_tools,"Syntax tools","2.1.6"},
      {sasl,"SASL  CXC 138 11","3.3"},
      {stdlib,"ERTS  CXC 138 10","3.7"},
      {kernel,"ERTS  CXC 138 10","6.2"}]},
 {os,{unix,linux}},
 {erlang_version,
     "Erlang/OTP 21 [erts-10.2.3] [source] [64-bit] [smp:2:2] [ds:2:2:10] [async-threads:64]\n"},
 {memory,
     [{connection_readers,0},
      {connection_writers,0},
      {connection_channels,0},
      {connection_other,24404},
      {queue_procs,118434472},
      {queue_slave_procs,0},
      {plugins,110977900},
      {other_proc,35293028},
      {metrics,14281436},
      {mgmt_db,31176656},
      {mnesia,8983920},
      {other_ets,8499904},
      {binary,3249896},
      {msg_index,28672},
      {code,27492004},
      {atom,1172689},
      {other_system,14421723},
      {allocated_unused,55224096},
      {reserved_unallocated,18190336},
      {strategy,rss},
      {total,[{erlang,374036704},{rss,447451136},{allocated,429260800}]}]},
 {alarms,[]},
 {listeners,[{clustering,25672,"::"},{amqp,5672,"::"},{http,15672,"::"}]},
 {vm_memory_calculation_strategy,rss},
 {vm_memory_high_watermark,0.4},
 {vm_memory_limit,3266020966},
 {disk_free_limit,50000000},
 {disk_free,75307692032},
 {file_descriptors,
     [{total_limit,32668},
      {total_used,2},
      {sockets_limit,29399},
      {sockets_used,0}]},
 {processes,[{limit,1048576},{used,20386}]},
 {run_queue,1},
 {uptime,434},
 {kernel,{net_ticktime,60}}]

michaelklishin commented 5 years ago

Thanks for the metrics. We already have a way to reproduce. We will not be providing any advice here; that is what the mailing list is for.

michaelklishin commented 5 years ago

Actually, maybe we will 😆. The biggest contributor in both screenshots above is the queue metrics collector, and the node has only 2 CPU cores available to it. With 5K queue replicas on every node, this is not particularly surprising to see.
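
Not a fix, but for completeness: if the metrics collector dominates on a small box, one knob that can reduce its load is the statistics collection interval. This is only a sketch assuming the new-style rabbitmq.conf format; whether it is appropriate depends on how much you rely on the management UI.

# rabbitmq.conf: emit queue/channel statistics every 30s instead of the 5s default
collect_statistics_interval = 30000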

The original environment used by @gerhard has several times more cores and fewer queues, and much less obvious metrics from msacc and other tools.

michaelklishin commented 5 years ago

After discussing this with someone, there are two major differences between the original report by @gerhard and the one by @JochenABC, which I will cover below. To avoid this issue becoming a honeypot for all kinds of "my node uses more CPU resources than I expected" questions, I've locked it.

In the original report

In the 2nd report

While there can be some overlap (e.g. mirrors using more resources than they should), there is nothing particularly surprising in the 2nd report, but there are a whole lot of things that we don't fully understand (having tried a few known runtime scheduler settings).

Any further feedback is welcome on the mailing list.

michaelklishin commented 5 years ago

Here's a rabbitmq-users thread for the 2nd question. TL;DR: switching to +sbwt none (that's an Erlang VM flag) made a major difference.

michaelklishin commented 3 years ago

Closing because this hasn't come up with 3.8 (with quorum queues or classic mirrored ones) and 3.7 is out of support.