xchem / xchem_it

Issues for XChem IT work

Setup Prometheus alerts #2

Open tdudgeon opened 3 years ago

tdudgeon commented 3 years ago

Set up alerts to notify us when there are problems with the clusters, such as nodes or services failing.

Needs #1 to be complete first.

tdudgeon commented 3 years ago

Basic cluster alerts have been set up for the dev cluster. They are being sent to Slack. We'll monitor this for a few days. So far no alerts!

tdudgeon commented 3 years ago

As Prometheus needed to be re-installed (see #1), the alerting had to be set up again in the dev cluster. The same Slack channel is used. Currently only the built-in alerts are used.

tdudgeon commented 3 years ago

The new Prometheus setup (on the dev cluster) seems to be firing lots of false alerts. I'm not sending them to the Slack channel as there would be too much noise.

I asked on the Rancher Slack channel but got no response, so I raised an issue against the repo: https://github.com/rancher/charts/issues/1191