Open stevesg opened 2 years ago
I understand the use case, but I think a multi-zone deployment is an overly complex solution here, and it would also make autoscaling harder.
I think what you want is just a preferred (not required) pod anti-affinity rule on the node topology key: if the pod can be scheduled on a node where no other querier is running, it will be; otherwise, it will be scheduled on a node where another querier is already running.
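For reference, a preferred rule of this kind maps to the Kubernetes `podAntiAffinity.preferredDuringSchedulingIgnoredDuringExecution` field with `topologyKey: kubernetes.io/hostname`. The sketch below only builds that manifest fragment as a Python dict; it is not how the deployment tooling discussed in this issue generates its manifests, and the `name=querier` pod label is an assumption about how the querier pods are labelled.

```python
import json


def preferred_node_anti_affinity(label_key: str, label_value: str, weight: int = 100) -> dict:
    """Build a Kubernetes `affinity` stanza with a *preferred* pod anti-affinity rule.

    Pods matching label_key=label_value are spread across nodes
    (topologyKey kubernetes.io/hostname) when possible, but the scheduler
    may still co-locate them when no other node fits.
    """
    # Kubernetes requires preferred-term weights to be in the range 1-100.
    if not 1 <= weight <= 100:
        raise ValueError("weight must be in the range 1-100")
    return {
        "podAntiAffinity": {
            "preferredDuringSchedulingIgnoredDuringExecution": [
                {
                    "weight": weight,
                    "podAffinityTerm": {
                        "labelSelector": {"matchLabels": {label_key: label_value}},
                        "topologyKey": "kubernetes.io/hostname",
                    },
                }
            ]
        }
    }


if __name__ == "__main__":
    # Hypothetical label: assumes querier pods carry name=querier.
    print(json.dumps({"affinity": preferred_node_anti_affinity("name", "querier")}, indent=2))
```

Using `requiredDuringSchedulingIgnoredDuringExecution` instead (whose entries are bare pod affinity terms without a `weight`) would give the hard rule, which leaves extra replicas Pending once every node already runs a querier.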
Is your feature request related to a problem? Please describe.
Using `querier_allow_multiple_replicas_on_same_node` can improve node utilization, but it introduces the risk that all queriers end up on a single node. If that node becomes unresponsive, the read path is down until the pods are rescheduled onto other nodes.
Describe the solution you'd like
`querier` and `ruler-querier`
This will allow better node utilization, whilst preserving availability of queriers.
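A toy model can make the utilization/availability trade-off concrete. This is not the Kubernetes scheduler, which scores nodes on many weighted criteria; it only assumes that each replica prefers the feasible node currently running the fewest queriers (approximating a preferred anti-affinity rule) and then reports how many replicas survive the loss of the busiest node. Node names and capacities are hypothetical.

```python
from collections import Counter


def place_replicas(replicas: int, capacity: dict[str, int]) -> Counter:
    """Toy model of a *preferred* anti-affinity rule.

    Each replica goes to the node with free capacity that runs the fewest
    queriers, so empty nodes are used first and shared nodes are a fallback
    (a *required* rule would leave the replica Pending instead).
    """
    placement: Counter = Counter()
    free = dict(capacity)
    for _ in range(replicas):
        feasible = [node for node, slots in free.items() if slots > 0]
        if not feasible:
            raise RuntimeError("no node has capacity left")
        # Fewest queriers first; ties broken by node name for determinism.
        node = min(feasible, key=lambda n: (placement[n], n))
        placement[node] += 1
        free[node] -= 1
    return placement


def surviving_after_worst_node_failure(placement: dict[str, int]) -> int:
    """Replicas still running if the node hosting the most replicas fails."""
    if not placement:
        return 0
    return sum(placement.values()) - max(placement.values())


if __name__ == "__main__":
    spread = place_replicas(4, {"node-a": 4, "node-b": 1})
    print(dict(spread), surviving_after_worst_node_failure(spread))
```

With four replicas and capacities `node-a: 4, node-b: 1`, the model places three replicas on `node-a` and one on `node-b`, so one querier survives the loss of `node-a`. A required rule would schedule only two replicas here and leave the other two Pending.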
Arguably, the same logic should apply to `query-frontend` and `query-scheduler`.