Open 7840vz opened 2 months ago
This PR has been automatically marked as stale because it has not had any activity in the past 30 days. The next time this stale check runs, the stale label will be removed if there is new activity. The issue will be closed in 7 days if there is no new activity. Thank you for your contributions!
commenting to keep this open a little longer, looks like a genuine issue
1) Fix aggregation for
on(job)
to become(job, cluster, instance)
. Otherwise, It would be enough to have just single instance with certificate expiration problem, and it would set all apiservers to 'firing' (false positive!).2) Also, change aggregation
by (le)
towithout(service,endpoint...)
, dropping only useless labels, but keeping external labels (like environment etc) intact. Otherwise they get dropped.3) Change order of metrics in expression:
apiserver_client_certificate_expiration_seconds_bucket
metric comes first so actual expiration date is shown as result in Grafana->Explore queries, notapiserver_client_certificate_expiration_seconds_count
value (which is quite useless). This make it easier to troubleshoot.