rabbitmq / cluster-operator

RabbitMQ Cluster Kubernetes Operator
https://www.rabbitmq.com/kubernetes/operator/operator-overview.html
Mozilla Public License 2.0

reset erlang scheduler cpu bind type to "unbound" #1517

Closed chideat closed 10 months ago

chideat commented 10 months ago

Is your feature request related to a problem? Please describe.

We used 7 machines, each with 128 cores and 320GB of memory, to set up a new production environment. To validate RabbitMQ performance, we deployed a single-replica rabbitmqcluster with 8 cores and 8GB of memory on one of the nodes. With 10 quorum queues and 30 producers, the incoming message rate reached 60-70K/s.

We then deployed another single-replica rabbitmqcluster with the same configuration on the same node and stress-tested both instances together, but the incoming rate for the quorum queues remained at 60-70K/s.

After deploying 5 more single-replica rabbitmqcluster instances and stress-testing all of them together, the total production/consumption rate was still only around 60-70K/s, showing no improvement.

Through multiple rounds of testing, we discovered that the CPU utilization of cores 0-7 was consistently >90%, while all other CPU cores remained <10%.

Describe the solution you'd like

We set the Erlang +stbt option to unbound and reran the stress test. The load spread across all the cores and total production rose to 270K/s, though the throughput was not very stable.

The config is as follows:

spec:
  rabbitmq:
    envConfig: |
      RABBITMQ_SCHEDULER_BIND_TYPE="u"
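
For context, the fragment above might sit in a complete manifest roughly like the sketch below. This is an assumption-laden illustration, not the exact manifest used in the tests: the name `perf-test-1` is hypothetical, and setting requests equal to limits is a guess, since the issue only states 8 cores and 8GB per instance.

```yaml
apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
  name: perf-test-1          # hypothetical name, not from the issue
spec:
  replicas: 1                # single replica, as in the tests described above
  resources:                 # 8 cores / 8GB per the issue; requests == limits is an assumption
    requests:
      cpu: "8"
      memory: 8Gi
    limits:
      cpu: "8"
      memory: 8Gi
  rabbitmq:
    envConfig: |
      # "u" maps to the Erlang +stbt value "unbound": schedulers are not pinned to cores
      RABBITMQ_SCHEDULER_BIND_TYPE="u"
```

Deploying several such instances on one node (each with a different name) would reproduce the multi-instance scenario, assuming the scheduler places them on the same machine.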

Additional context

In Kubernetes, it is quite common to deploy multiple RabbitMQ pods on the same node. If +stbt is left at its default value of "db" (which actually resolves to 'tnnps'), it can significantly limit the overall performance of RabbitMQ.

Besides, the unbound setting is not listed in the RabbitMQ docs, which I think should be added too.

Doc refer: