grafana / cortex-jsonnet

Deprecated: see https://github.com/grafana/mimir/tree/main/operations/mimir instead
Apache License 2.0

Added CortexKVStoreFailure alert #406

Closed pracucci closed 2 years ago

pracucci commented 2 years ago

What this PR does: In this PR I propose to add the CortexKVStoreFailure alert. The idea is to get an alert when a Cortex instance is failing to talk to Consul (e.g. Consul is down or there's a network partition).

It's a warning alert for now, so we can build confidence in it, but the idea is to move it to a critical alert if it turns out to work fine.

Which issue(s) this PR fixes: N/A

pracucci commented 2 years ago

LGTM. Should we add similar ones for Etcd? Or, preferably, a similar alert for any KV backend?

@pstibrany Definitely. I've updated the alert to use the generic metric we have for KV store operations. WDYT?
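
For readers without the diff at hand, a backend-agnostic rule along these lines might look roughly like the sketch below. It is illustrative only and not the rule from this PR: it assumes the generic metric is the `cortex_kv_request_duration_seconds` histogram with `status_code` and `kv_name` labels, and the aggregation labels, the `1m` rate window, the `5m` hold period, and the message wording are all placeholders.

```jsonnet
// Illustrative sketch only; not the rule from this PR.
// Assumption: the generic KV metric is cortex_kv_request_duration_seconds,
// labelled with status_code and kv_name.
local aggregationLabels = 'cluster, namespace';  // hypothetical aggregation labels

{
  alert: 'CortexKVStoreFailure',
  // Fires only when every KV store request from a pod fails (failure ratio == 1),
  // so occasional errors do not trigger it.
  expr: |||
    (
      sum by(%(labels)s, pod, kv_name) (rate(cortex_kv_request_duration_seconds_count{status_code!~"2.+"}[1m]))
      /
      sum by(%(labels)s, pod, kv_name) (rate(cortex_kv_request_duration_seconds_count[1m]))
    )
    == 1
  ||| % { labels: aggregationLabels },
  'for': '5m',  // assumed hold period
  labels: {
    severity: 'warning',  // warning for now, as described in the PR
  },
  annotations: {
    message: 'Cortex {{ $labels.pod }} in {{ $labels.namespace }} is failing to talk to the KV store {{ $labels.kv_name }}.',
  },
}
```

Grouping by `kv_name` rather than matching a Consul-specific metric is what would let a single rule cover Consul, Etcd, and any other backend.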