Closed: Adiqq closed this issue 6 years ago.
Is there any profiling or CPU context switching data that suggests that rabbitmqctl status
is the main contributor as opposed to, say, a non-optimal runtime scheduler-to-core binding strategy? We would really prefer to not guess.
What would a "sane" value look like? Also, where should such notes go?
I would be interested to know how long the "enormous CPU usage (70%)" lasted - I assume it was brief. Also, it would be good to know your cluster size, RabbitMQ version, and Erlang version.
Running the liveness probe every 60 seconds sounds more reasonable than every 10. I refrained from using rabbitmqctl node_health_check
since a failing probe will lead to a node restart, which is too much e.g. for a node in a resource alarm state (and unlikely to help in the medium term).
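To make the suggestion concrete, here is a rough sketch of what such a probe could look like; the use of an exec probe with `rabbitmqctl status` and the 60-second period come from this thread, while the other fields are assumptions on my part rather than the exact contents of the example:

```yaml
livenessProbe:
  exec:
    # rabbitmqctl status rather than node_health_check: node_health_check
    # would fail (and so trigger a node restart) on conditions such as a
    # resource alarm, where a restart is unlikely to help
    command: ["rabbitmqctl", "status"]
  initialDelaySeconds: 30
  # run the probe every 60 seconds instead of every 10
  periodSeconds: 60
  timeoutSeconds: 15
```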
Will add some remarks that this is just an example and should be treated as such.
@lukebakken I left the RabbitMQ cluster running for a few hours and CPU usage stayed constant at ~70%. At first I thought it was a bug or an issue with the Alpine RabbitMQ image, but disabling the probes fixed the problem and now it uses ~1% CPU. I used https://hub.docker.com/_/rabbitmq/ , rabbitmq:3.7.5-management-alpine, with 3 RabbitMQ pods.
The high CPU usage is due to this issue: https://github.com/bitnami/bitnami-docker-rabbitmq/pull/63. However, that PR does not fully resolve it, since the liveness probes do not inherit the ulimit.
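As an illustration only (this is a guess at a workaround, assuming the cost comes from a very high open-files limit visible to the probe process, which this thread does not confirm), the limit could be capped inside the probe command itself, since an exec probe does not pick up the ulimit set by the container entrypoint:

```yaml
livenessProbe:
  exec:
    # the probe runs as a fresh process, so a ulimit configured by the
    # image entrypoint is not inherited here; cap it explicitly
    # (4096 is an arbitrary example value)
    command: ["sh", "-c", "ulimit -n 4096 && rabbitmqctl status"]
  periodSeconds: 60
  timeoutSeconds: 15
```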
This plugin has nothing to do with OS limits. It just performs peer discovery.
Hi,
There are probes in the example:
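(Roughly, the probes in question look like the sketch below; the command and the 10-second interval come from this report, while the remaining fields are filled in as plausible assumptions rather than copied verbatim from the example.)

```yaml
livenessProbe:
  exec:
    command: ["rabbitmqctl", "status"]
  initialDelaySeconds: 30
  # executed every 10 seconds
  periodSeconds: 10
  timeoutSeconds: 15
readinessProbe:
  exec:
    command: ["rabbitmqctl", "status"]
  initialDelaySeconds: 10
  periodSeconds: 10
  timeoutSeconds: 15
```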
In our cluster, these commands were executed every 10 seconds, which led to enormous CPU usage (70%) for the RabbitMQ cluster without any workload. It would be nice to mention that this command is so heavyweight, and to suggest a sane periodSeconds value, for example.