dcorbacho closed this 4 years ago
Picking this one up now.
Given 1k queues on rmq2, scrape duration is 2-3s. Running rabbitmqctl eval 'application:set_env(rabbitmq_prometheus, enable_metric_aggregation, true).' on rmq2 brings it down to 250ms, similar to the other nodes.
RabbitMQ-Overview dashboard is not affected by these changes.
I will test RabbitMQ-Quorum-Queues-Raft tomorrow - I expect a few changes will be needed there.
I will increase the number of queues all the way to 80k and see if this still holds. The last phase is to increase the number of connections & channels to 80k each and see if this optimisation still holds.
I am picking this one up again, deploying 50k queues, 50k connections & 50k channels.
Tested on:
gcloud compute instances create-with-container tgir-s01e01-gerhard-rmq1-server \
--public-dns --boot-disk-type=pd-ssd --labels=namespace=rabbitmq-prometheus-28-gerhard --container-stdin --container-tty \
--machine-type=n1-standard-32 \
--create-disk=name=rabbitmq-prometheus-28-gerhard-rmq1-server-persistent,size=200GB,type=pd-ssd,auto-delete=yes \
--container-mount-disk=name=rabbitmq-prometheus-28-gerhard-rmq1-server-persistent,mount-path=/var/lib/rabbitmq \
--container-env RABBITMQ_ERLANG_COOKIE=rabbitmq-prometheus-28-gerhard \
--container-image=pivotalrabbitmq/rabbitmq-prometheus:3.9.0-alpha.203-2020.02.03
with curl -s -o /dev/null -w '%{http_code} time_total:%{time_total} size_bytes:%{size_download}\n' http://34.89.10.130:15692/metrics
50k queues, first without and then with metric aggregation enabled:
200 time_total:60.395880 size_bytes:62322787
200 time_total:1.023527 size_bytes:347759
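To put the two samples above side by side, here is a minimal Python sketch that parses lines in the format produced by the curl -w template used in this thread and computes the improvement factors. The helper names parse_sample and improvement are made up for illustration and are not part of any RabbitMQ tooling.

```python
import re

# Matches the format produced by the curl -w template used in this thread:
#   "<http_code> time_total:<seconds> size_bytes:<bytes>"
LINE_RE = re.compile(r"^(\d{3}) time_total:([\d.]+) size_bytes:(\d+)$")


def parse_sample(line):
    """Parse one curl -w output line into (http_code, seconds, size_bytes)."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised sample line: {line!r}")
    code, seconds, size = match.groups()
    return int(code), float(seconds), int(size)


def improvement(before, after):
    """Return (speedup, size_reduction) factors between two samples.

    Hypothetical helper: assumes the 'after' sample has a non-zero
    duration and a non-zero body size.
    """
    _, t_before, s_before = parse_sample(before)
    _, t_after, s_after = parse_sample(after)
    return t_before / t_after, s_before / s_after


if __name__ == "__main__":
    per_object = "200 time_total:60.395880 size_bytes:62322787"
    aggregated = "200 time_total:1.023527 size_bytes:347759"
    speedup, shrink = improvement(per_object, aggregated)
    # prints: scrape is 59x faster, response is 179x smaller
    print(f"scrape is {speedup:.0f}x faster, response is {shrink:.0f}x smaller")
```

These are single samples from one run, so the factors are indicative rather than a benchmark.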
When I had 50k connections on top of the 50k queues, the metrics endpoint would time out after 60s:
000 time_total:60.105977 size_bytes:0
With metric aggregation enabled, and then additionally with -H "Accept-Encoding: gzip":
200 time_total:1.705673 size_bytes:348528
200 time_total:1.607329 size_bytes:21323
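In the two samples above, the gzip response is roughly 16x smaller on the wire (348528 vs 21323 bytes). A rough Python equivalent of the curl measurement, with and without the Accept-Encoding: gzip header, might look like the sketch below. The function name time_scrape and the URL in the usage comment are assumptions for illustration. Unlike curl, which reports a timed-out request as 000, urllib raises an exception, and its timeout applies to each blocking socket operation rather than to the whole transfer.

```python
import gzip
import time
import urllib.request


def time_scrape(url, use_gzip=False, timeout=60.0):
    """Fetch url once; return (status, seconds, bytes_on_wire, bytes_decoded).

    Hypothetical helper, not part of any RabbitMQ tooling.
    """
    # urllib does not decompress responses itself, so the body read
    # below is exactly what was sent over the wire.
    headers = {"Accept-Encoding": "gzip"} if use_gzip else {}
    request = urllib.request.Request(url, headers=headers)
    start = time.monotonic()
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read()
        status = response.status
        encoding = response.headers.get("Content-Encoding", "")
    elapsed = time.monotonic() - start
    # Decompress only to report the decoded size next to the wire size.
    decoded = gzip.decompress(body) if encoding.lower() == "gzip" else body
    return status, elapsed, len(body), len(decoded)


# Example usage (the URL is an assumption; point it at your own node):
#   status, seconds, wire, decoded = time_scrape(
#       "http://localhost:15692/metrics", use_gzip=True)
#   print(f"{status} time_total:{seconds:.6f} size_bytes:{wire}")
```

Comparing bytes_on_wire with bytes_decoded shows how much of the saving comes from compression as opposed to aggregation.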
:shipit:
@dcorbacho can we pair up on this tomorrow? https://github.com/rabbitmq/rabbitmq-prometheus/commit/5caa4198b17099d87df5e7ce5faa0b8ae6edd42d
FWIW, https://github.com/rabbitmq/rabbitmq-prometheus/commit/378da2f7c32a03712e5f6b2181e102bce3c402a3 enables metrics aggregation by default. The reasoning is captured in the README. This is the follow-up commit https://github.com/rabbitmq/rabbitmq-prometheus/commit/8b0c7c4f4e01ad0d7d2f39ec478add059da5c112.
prometheus.return_per_object_metrics = false
Closes #26, see #24 and #25 for the background.