grafana / mimir

Grafana Mimir provides horizontally scalable, highly available, multi-tenant, long-term storage for Prometheus.
https://grafana.com/oss/mimir/
GNU Affero General Public License v3.0

Store-gateway: high memory allocations caused by per-tenant Prometheus registry #102

Open grafanabot opened 3 years ago

grafanabot commented 3 years ago

Describe the bug

To use the Thanos BucketStore while supporting Cortex multi-tenancy, we need to create one BucketStore per tenant, pass a dedicated Prometheus registry to each one, and then aggregate the metrics from all registries.

As a result, Prometheus metrics collection causes a high memory allocation rate (on the order of 50MB/s in a store-gateway with 7.5K tenants). The allocated memory is not retained, but it still puts pressure on the GC.
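
To illustrate why the cost scales with the number of tenants, here is a minimal sketch of the "one registry per tenant, then aggregate" pattern. It is a simplified model using only the Go standard library, not the Prometheus client API or the actual store-gateway code: `tenantRegistry`, `sample`, `gather` and `aggregate` are hypothetical names. The point it shows is that every collection produces one freshly allocated snapshot per tenant, which is discarded right after being summed. If the 50MB/s were spread evenly over 7.5K tenants, that would be roughly 6.7KB/s per tenant.

```go
package main

import (
	"fmt"
	"sort"
)

// sample is one metric value returned by a registry on each collection.
type sample struct {
	name  string
	value float64
}

// tenantRegistry is a hypothetical stand-in for a per-tenant Prometheus registry.
type tenantRegistry struct {
	metrics map[string]float64
}

// gather returns a freshly allocated snapshot of the tenant's metrics.
// The slice is garbage as soon as the caller has finished aggregating it.
func (r *tenantRegistry) gather() []sample {
	out := make([]sample, 0, len(r.metrics))
	for name, v := range r.metrics {
		out = append(out, sample{name: name, value: v})
	}
	return out
}

// aggregate sums each metric across all tenant registries.
// It calls gather once per tenant, so the number of short-lived
// allocations per collection grows linearly with the number of tenants.
func aggregate(regs map[string]*tenantRegistry) map[string]float64 {
	total := make(map[string]float64)
	for _, reg := range regs {
		for _, s := range reg.gather() {
			total[s.name] += s.value
		}
	}
	return total
}

func main() {
	// Two illustrative tenants with made-up metric names and values.
	regs := map[string]*tenantRegistry{
		"tenant-a": {metrics: map[string]float64{"blocks_loaded": 3, "series_fetched": 10}},
		"tenant-b": {metrics: map[string]float64{"blocks_loaded": 5, "series_fetched": 7}},
	}

	total := aggregate(regs)

	// Print in a stable order, since map iteration order is random.
	names := make([]string, 0, len(total))
	for n := range total {
		names = append(names, n)
	}
	sort.Strings(names)
	for _, n := range names {
		fmt.Printf("%s %v\n", n, total[n])
	}
}
```

In this model the retained state (the `metrics` maps) stays constant, while the per-collection garbage is proportional to tenants times metrics per tenant, which matches the "not retained, but pressures the GC" observation above.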

Screenshot 2021-01-07 at 11 35 27

In a cluster with low QPS, 95% of store-gateway memory allocations are caused by metrics collection.

Submitted by: pracucci
Cortex Issue Number: 3697

grafanabot commented 3 years ago

Enabling shuffle-sharding on the store-gateway significantly improves this.
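
A rough sketch of why shuffle-sharding helps, under the assumption that each tenant's blocks are then loaded only by a fixed-size subset of store-gateways, so each instance holds fewer per-tenant registries. This is a simplified model, not the hash-ring based implementation: `shardFor`, `tenantsPerInstance`, the instance names and the shard size are all hypothetical and chosen for illustration.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/rand"
	"sort"
)

// shardFor deterministically picks shardSize instances for a tenant.
// A shardSize <= 0 (or >= len(instances)) models shuffle-sharding disabled:
// the tenant is spread over every instance.
func shardFor(tenant string, instances []string, shardSize int) []string {
	if shardSize <= 0 || shardSize >= len(instances) {
		return append([]string(nil), instances...)
	}
	// Seed a private RNG from the tenant ID so the choice is stable.
	h := fnv.New64a()
	h.Write([]byte(tenant))
	rng := rand.New(rand.NewSource(int64(h.Sum64())))

	shard := make([]string, 0, shardSize)
	for _, idx := range rng.Perm(len(instances))[:shardSize] {
		shard = append(shard, instances[idx])
	}
	sort.Strings(shard)
	return shard
}

// tenantsPerInstance counts how many tenants, and therefore how many
// per-tenant registries, each instance ends up with.
func tenantsPerInstance(tenants, instances []string, shardSize int) map[string]int {
	counts := make(map[string]int)
	for _, t := range tenants {
		for _, inst := range shardFor(t, instances, shardSize) {
			counts[inst]++
		}
	}
	return counts
}

func main() {
	// Illustrative sizes only: 10 instances, 1000 tenants, shard size 3.
	var instances, tenants []string
	for i := 0; i < 10; i++ {
		instances = append(instances, fmt.Sprintf("store-gateway-%d", i))
	}
	for i := 0; i < 1000; i++ {
		tenants = append(tenants, fmt.Sprintf("tenant-%d", i))
	}

	without := tenantsPerInstance(tenants, instances, 0)
	with := tenantsPerInstance(tenants, instances, 3)
	for _, inst := range instances {
		fmt.Printf("%s: %d tenants without sharding, %d with\n", inst, without[inst], with[inst])
	}
}
```

In this model the number of registries per instance drops from the total tenant count to roughly tenants times shard size divided by the number of instances, and the metrics collection cost drops with it.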

Submitted by: pracucci

pracucci commented 3 years ago

More data points from a store-gateway loading blocks from 13k tenants.

CPU

Screenshot 2021-08-11 at 17 05 17

Memory allocations (bytes)

Screenshot 2021-08-11 at 17 06 08