openshift / openshift-tools

A public repository of scripts used by OpenShift Operations for various purposes
Apache License 2.0

Adding KubeAPILatency alerts for v3 #4331

Closed ravitri closed 4 years ago

ravitri commented 4 years ago

Adding KubeAPILatency alerts as per https://issues.redhat.com/browse/OSD-3037.

For now, the alerts are added with "warn" and "avg" severities. They will be reviewed across clusters for some time, after which an appropriate severity will be set.

Right now, the logic is as follows:

  1. WARN - Average API latency for 99th-percentile pods is above the threshold of 1000ms (1 second) for the last 10 minutes.
  2. AVG - Average API latency for 90th-percentile pods is above the threshold of 1000ms (1 second) for the last 20 minutes.
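
To make the two conditions concrete, here is a minimal Python sketch of the logic described above. It is not the trigger definition from this PR: the names `LatencyRule`, `RULES`, and `rule_fires` are hypothetical, and it assumes one latency sample per minute and that "average ... for last N minutes" means the mean over that window.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence

# Threshold from the PR description: 1000ms (1 second).
THRESHOLD_MS = 1000.0


@dataclass(frozen=True)
class LatencyRule:
    """Hypothetical container for one alert condition."""
    severity: str        # "WARN" or "AVG", as named in the PR
    percentile: int      # which latency percentile the samples represent
    window_minutes: int  # how far back the average looks


RULES = (
    LatencyRule(severity="WARN", percentile=99, window_minutes=10),
    LatencyRule(severity="AVG", percentile=90, window_minutes=20),
)


def rule_fires(rule: LatencyRule, samples_ms: Sequence[float]) -> bool:
    """Return True when the windowed average exceeds the threshold.

    samples_ms holds latency samples for rule.percentile, oldest first,
    assumed to be collected once per minute (an assumption of this sketch).
    """
    window = samples_ms[-rule.window_minutes:]
    if len(window) < rule.window_minutes:
        # Not enough history yet to evaluate the full window.
        return False
    return mean(window) > THRESHOLD_MS
```

Under these assumptions, `rule_fires(RULES[0], samples)` evaluates the WARN condition on 99th-percentile samples and `rule_fires(RULES[1], samples)` evaluates the AVG condition on 90th-percentile samples. A short spike inside a 20-minute window may not fire the AVG rule if the rest of the window is well below the threshold.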

TODO: The intention is to identify genuine API latency alerts that are actionable by SRE. The severity will be adjusted after the above alerts have been reviewed for a month.

SOP: https://github.com/openshift/ops-sop/blob/master/v3/alerts/openshift_master.asciidoc#210-openshift-master-api-call-latency

openshift-ops-bot commented 4 years ago

All tests passed!

gsleeman commented 4 years ago

/lgtm