Closed 7840vz closed 2 weeks ago
This PR has been automatically marked as stale because it has not had any activity in the past 30 days. The next time this stale check runs, the stale label will be removed if there is new activity. The issue will be closed in 7 days if there is no new activity. Thank you for your contributions!
commenting to keep this open a little longer, looks like a genuine issue
@7840vz would you have a moment to address the conflicts?
resolved conflicts
could you check this one as well pls? https://github.com/kubernetes-monitoring/kubernetes-mixin/pull/942
It would be nice to have also the cluster indicated in the alert description (here, and here):
description: 'A client certificate used to authenticate to kubernetes apiserver is expiring in less than %s on cluster {{ $labels.%s }}.' % [(utils.humanizeSeconds($._config.certExpirationWarningSeconds)), $._config.clusterLabel],
@lorenzofelletti Having the cluster
labels in the description is definitely desirable but not everyone has the cluster
label, so might need some logic in the descriptions to check if they have multiCluster
enabled or not (I think some alerts do this iirc). I think that can be addressed as a separate issue as there's probably a whole bunch of alerts that would benefit from that.
1) Fix aggregation for
on(job)
to become(job, cluster, instance)
. Otherwise, It would be enough to have just single instance with certificate expiration problem, and it would set all apiservers to 'firing' (false positive!).2) Also, change aggregation
by (le)
towithout(service,endpoint...)
, dropping only useless labels, but keeping external labels (like environment etc) intact. Otherwise they get dropped.3) Change order of metrics in expression:
apiserver_client_certificate_expiration_seconds_bucket
metric comes first so actual expiration date is shown as result in Grafana->Explore queries, notapiserver_client_certificate_expiration_seconds_count
value (which is quite useless). This make it easier to troubleshoot.