integr8ly / application-monitoring-operator

Operator for installing the Application Monitoring Stack on OpenShift (Prometheus, AlertManager, Grafana)
Apache License 2.0

INTLY-1100 Make inhibition rules more explicit for CronJob alerts #28

Closed aidenkeating closed 5 years ago

aidenkeating commented 5 years ago

Currently the inhibition rules for CronJob alerts are vague and also match other alerts. We only want to address the case where JobRunningTimeExceeded fires both a warning and a critical alert for the same job.

This change restricts the inhibit_rule so that it only applies to JobRunningTimeExceeded alerts.
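For illustration, a restricted rule of this kind might look roughly like the sketch below. It is not the exact rule from this PR: the `job_name` and `namespace` labels in `equal` are assumptions about which labels identify the job, and the `source_match`/`target_match` keys are the Alertmanager syntax in use at the time of this change.

```yaml
# Sketch of an Alertmanager inhibit rule limited to one alert name.
inhibit_rules:
  - source_match:
      # The critical alert is the one that does the inhibiting.
      alertname: JobRunningTimeExceeded
      severity: critical
    target_match:
      # Only the warning for the same alert name can be inhibited.
      alertname: JobRunningTimeExceeded
      severity: warning
    # Inhibit only when both alerts refer to the same job.
    # "namespace" and "job_name" are assumed label names.
    equal: ['namespace', 'job_name']
```

Matching on `alertname` in both the source and the target is what keeps the rule from silencing warnings that belong to unrelated alerts.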

Verification:

aidenkeating commented 5 years ago

After the Change

All critical alerts are shown, and the corresponding warnings are inhibited.

The warning that has no corresponding critical alert is still shown.

Screen Shot 2019-03-20 at 15 07 42

aidenkeating commented 5 years ago

@david-martin @pb82 I'm going to provision these changes onto my cluster and do the first half of the verification steps myself, as the verification steps are quite long.

aidenkeating commented 5 years ago

This is now provisioned on the akeating-4ecc cluster.

I've set one CronJob (enmasse-pv-backup) to debug mode, so its Job never finishes and eventually triggers the severity=critical alert. I've suspended the rest of the CronJobs.
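As a rough sketch of what suspending a CronJob involves, the fragment below could be applied as a patch to each CronJob that should stay idle. The name `example-cronjob` is a placeholder, not one of the CronJobs on this cluster, and the `apiVersion` is assumed from the Kubernetes versions current at the time.

```yaml
# Patch fragment: stops the CronJob from creating new Jobs.
apiVersion: batch/v1beta1   # assumed API version
kind: CronJob
metadata:
  name: example-cronjob     # placeholder name
spec:
  # Jobs that are already running are not affected by this flag.
  suspend: true
```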

So the verification steps remaining would be: