grafana / mimir

Grafana Mimir provides horizontally scalable, highly available, multi-tenant, long-term storage for Prometheus.
https://grafana.com/oss/mimir/
GNU Affero General Public License v3.0

Idea: retry label values and names requests in the query-frontend #10037

Open dimitarvdimitrov opened 4 days ago

dimitarvdimitrov commented 4 days ago

What is the problem you are trying to solve?

A transient failure of a label names or label values query is returned directly to the user instead of being retried.

Which solution do you envision (roughly)?

Extend the middleware machinery so that the retry middleware can be reused for labels requests too. Currently it can't be reused directly because middlewares only work with MetricsQueryRequest.

Maybe we should make retry generic so that it works with LabelsQueryRequest too.

Have you considered any alternatives?

Implement a retry round-tripper and wrap the labels round-tripper. This will duplicate code and will require us to keep the two implementations in sync. I think we were also trying to steer away from round-trippers. See https://github.com/grafana/mimir/pull/7536, https://github.com/grafana/mimir-squad/issues/1938

Any additional context to share?

No response

How long do you think this would take to be developed?

Small (<= 1 month dev)

What are the documentation dependencies?

No response

Proposer?

No response

francoposa commented 2 hours ago

related: https://github.com/grafana/mimir-squad/issues/2625

Whether things end up being RoundTrippers or middlewares, I think we will keep getting tripped up unless we have a vision for how each query-frontend layer should support its own subset of the request types (remote read, range, instant, labels, cardinality, active series, active native histogram series, series, metadata, and query exemplars).