grafana / mimir

Grafana Mimir provides horizontally scalable, highly available, multi-tenant, long-term storage for Prometheus.
https://grafana.com/oss/mimir/
GNU Affero General Public License v3.0

Jsonnet: Support deploying queriers in multiple zones #2294

Open stevesg opened 2 years ago

stevesg commented 2 years ago

Is your feature request related to a problem? Please describe.

Using querier_allow_multiple_replicas_on_same_node can be useful for increasing node utilization, but it introduces the risk that all queriers end up scheduled on a single node. If that node becomes unresponsive, the read path is down until the pods are rescheduled onto other nodes.
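To make the risk concrete, here is a minimal sketch (not part of Mimir) that takes a hypothetical mapping of querier pod names to node names and reports how much of the read path a single node failure would take out. The pod and node names are made up for illustration.

```python
from collections import Counter


def worst_case_querier_loss(placement: dict[str, str]) -> float:
    """Fraction of querier pods lost if the busiest node fails.

    `placement` maps a (hypothetical) querier pod name to its node name.
    """
    if not placement:
        return 0.0
    per_node = Counter(placement.values())
    return max(per_node.values()) / len(placement)


def read_path_has_single_node_spof(placement: dict[str, str]) -> bool:
    # True when every querier runs on the same node, so losing that
    # node leaves no querier available to serve reads.
    return bool(placement) and len(set(placement.values())) == 1


packed = {"querier-0": "node-a", "querier-1": "node-a", "querier-2": "node-a"}
spread = {"querier-0": "node-a", "querier-1": "node-a", "querier-2": "node-b"}

print(worst_case_querier_loss(packed))         # 1.0
print(read_path_has_single_node_spof(packed))  # True
print(read_path_has_single_node_spof(spread))  # False
```

Even the "spread" placement loses two of three queriers if node-a fails, which is why packing replicas densely trades availability for utilization.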

Describe the solution you'd like

Support deploying queriers in multiple zones in the Jsonnet. This would allow better node utilization, whilst preserving the availability of queriers.

Arguably, the same logic should apply to the query-frontend and query-scheduler.
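As a rough illustration of what a multi-zone split could look like, the sketch below divides a total querier replica count across zones as evenly as possible and computes how many replicas survive the loss of the largest zone. The `querier-zone-<x>` naming and the zone list are assumptions for illustration only, and the availability numbers only hold if each zone's pods are actually kept on separate nodes.

```python
def split_replicas_across_zones(total: int, zones: list[str]) -> dict[str, int]:
    """Spread `total` querier replicas across zones as evenly as possible.

    The zone names and the hypothetical "querier-<zone>" deployment
    naming are illustrative, not taken from the Mimir Jsonnet.
    """
    if not zones:
        raise ValueError("at least one zone is required")
    base, extra = divmod(total, len(zones))
    # The first `extra` zones each get one additional replica.
    return {
        f"querier-{zone}": base + (1 if i < extra else 0)
        for i, zone in enumerate(zones)
    }


def replicas_left_after_zone_outage(plan: dict[str, int]) -> int:
    # Worst case: the zone holding the most replicas goes down.
    return sum(plan.values()) - max(plan.values())


plan = split_replicas_across_zones(10, ["zone-a", "zone-b", "zone-c"])
print(plan)  # {'querier-zone-a': 4, 'querier-zone-b': 3, 'querier-zone-c': 3}
print(replicas_left_after_zone_outage(plan))  # 6
```

One consequence visible here is that any autoscaler would have to manage several per-zone replica counts instead of one.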

pracucci commented 2 years ago

I understand the use case, but I think using multi-zone is an overcomplicated solution, which will also make autoscaling more complicated.

I think what you want is just a preferred (not required) anti-affinity rule using the node as the topology key, so that a querier pod is scheduled on a node where no other querier is running whenever possible, and otherwise on a node where another querier is already running.
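A minimal sketch of such a rule, built as a Python dict that mirrors the Kubernetes pod `affinity` API fields (`podAntiAffinity`, `preferredDuringSchedulingIgnoredDuringExecution`, `weight`, `podAffinityTerm`, `topologyKey`). It assumes querier pods carry the label `name: querier`; adjust the selector to whatever labels your deployment actually uses.

```python
import json


def preferred_node_anti_affinity(app_label: str, weight: int = 100) -> dict:
    """Kubernetes `affinity` block that prefers, but does not require,
    placing pods sharing the same label on different nodes."""
    if not 1 <= weight <= 100:
        raise ValueError("Kubernetes requires weight in the range 1-100")
    return {
        "podAntiAffinity": {
            "preferredDuringSchedulingIgnoredDuringExecution": [
                {
                    "weight": weight,
                    "podAffinityTerm": {
                        # Assumption: querier pods are labelled name=<app_label>.
                        "labelSelector": {"matchLabels": {"name": app_label}},
                        # One "domain" per node. Because the rule is only
                        # preferred, the scheduler still places the pod on an
                        # occupied node when no querier-free node is available.
                        "topologyKey": "kubernetes.io/hostname",
                    },
                }
            ]
        }
    }


print(json.dumps({"affinity": preferred_node_anti_affinity("querier")}, indent=2))
```

Using `requiredDuringSchedulingIgnoredDuringExecution` instead would leave pods Pending once every node already runs a querier, which is the behaviour querier_allow_multiple_replicas_on_same_node is meant to avoid.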