Open bjorns163 opened 4 months ago
Additionally, kube_statefulset_replicas sometimes shows up as none:
mimir-ingester-zone-a = none
mimir-ingester-zone-b = none
mimir-ingester-zone-c = none
mimir-store-gateway-zone-a = none
mimir-store-gateway-zone-b = none
mimir-store-gateway-zone-c = none
Describe the bug
It seems that the monitoring alert MimirRolloutStuck is sometimes triggered incorrectly. I followed the investigation steps in the included runbook and found no issues.
Here are the outputs:
Here is the StatefulSet of the store-gateway:
Here is the StatefulSet of the ingester:
Yet the alerts are being triggered:
If I dig into the metrics and check all values, kube_statefulset_status_update_revision seems to be the issue. The two affected StatefulSets have a value of None instead of 1, which seems to trigger the alert.
Looking into the issue a bit, I found this topic, which explains that this is down to the update strategy being OnDelete. The value is not always None, but from time to time, after patching the cluster or updating values for the StatefulSet, it becomes None.
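
For anyone who wants to check the same condition directly on the Kubernetes objects rather than through the metrics, here is a rough Python sketch. It assumes the input is the JSON produced by `kubectl get statefulsets -o json`. The function names and the state labels are made up for illustration, and it does not reproduce the actual alert expression; it only flags StatefulSets whose `status.updateRevision` is missing and reports their update strategy.

```python
import json


def classify_statefulset(sts: dict) -> tuple:
    """Return (update strategy, state) for a single StatefulSet object.

    The state labels are hypothetical names used only in this sketch.
    """
    spec = sts.get("spec", {})
    status = sts.get("status", {})

    # Kubernetes defaults StatefulSets to RollingUpdate when no strategy is set.
    strategy = spec.get("updateStrategy", {}).get("type", "RollingUpdate")
    current = status.get("currentRevision")
    update = status.get("updateRevision")

    if not update:
        # The situation described in this issue: no update revision reported.
        return (strategy, "update-revision-missing")
    if current != update:
        return (strategy, "revisions-differ")
    return (strategy, "in-sync")


def classify_all(doc: dict) -> dict:
    """Classify every StatefulSet in a kubectl List document (or a single object)."""
    items = doc.get("items", [doc])
    return {
        item.get("metadata", {}).get("name", "<unnamed>"): classify_statefulset(item)
        for item in items
    }


if __name__ == "__main__":
    # Hypothetical sample data; revision hashes are invented for the example.
    sample = json.loads("""
    {"items": [
      {"metadata": {"name": "mimir-ingester-zone-a"},
       "spec": {"updateStrategy": {"type": "OnDelete"}},
       "status": {"currentRevision": "mimir-ingester-zone-a-abc"}},
      {"metadata": {"name": "mimir-compactor"},
       "spec": {},
       "status": {"currentRevision": "mimir-compactor-123",
                  "updateRevision": "mimir-compactor-123"}}
    ]}
    """)
    for name, result in classify_all(sample).items():
        print(name, result)
```

If the StatefulSets flagged as `update-revision-missing` are the same ones that use OnDelete, that would match the explanation in the linked topic.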
Expected behavior
No alert in this case.
Environment