stormshift / support

This repo should serve as a central source for reporting issues with stormshift
GNU General Public License v3.0

Configure prometheus alert to github issue #1

Open rbo opened 3 years ago

rbo commented 3 years ago

For example:

github-actions[bot] commented 3 years ago

Heads up @cluster/ocp4-admin - the "cluster/ocp4" label was applied to this issue.

rbo commented 3 years ago

Tested both:

alertmanager-to-github

alertmanager-github-receiver

rbo commented 3 years ago

alertmanager-github-receiver prepared:

Container Image: quay.io/stormshift/alertmanager-github-receiver:master
GitHub Repo: https://github.com/stormshift/alertmanager-github-receiver

rbo commented 3 years ago

Tried alertmanager-github-receiver: it creates a new GitHub issue for every alert, and I hit the GitHub rate limit:

2021/06/17 13:22:24 Failed to list open github issues: GET https://api.github.com/search/issues?q=is:issue+in:title+is:open+org:rbo+label:%22prometheus%2Falert%22: 403 API rate limit of 10 still exceeded until 2021-06-17 13:23:20 +0000 UTC, not making remote request. [rate reset in 55s]
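The log line above reports when the rate limit resets. As a minimal sketch (not how alertmanager-github-receiver handles it), a client could work out how long to back off from the `X-RateLimit-Remaining` and `X-RateLimit-Reset` response headers that the GitHub REST API returns. The function name `seconds_until_reset` and the plain-dict header access are assumptions made for illustration.

```python
from datetime import datetime, timezone


def seconds_until_reset(headers, now_epoch):
    """Return how many whole seconds to wait before the next API call.

    `headers` is a plain dict here (an assumption for this sketch); real
    HTTP libraries usually expose case-insensitive header mappings.
    """
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        # Budget left in the current window: no need to wait.
        return 0
    # X-RateLimit-Reset is the reset time in UTC epoch seconds.
    reset_epoch = int(headers["X-RateLimit-Reset"])
    return max(0, reset_epoch - int(now_epoch))


# Timestamps taken from the log line above, used as sample input.
now = datetime(2021, 6, 17, 13, 22, 24, tzinfo=timezone.utc).timestamp()
reset = datetime(2021, 6, 17, 13, 23, 20, tzinfo=timezone.utc).timestamp()
sample_headers = {
    "X-RateLimit-Remaining": "0",
    "X-RateLimit-Reset": str(int(reset)),
}
wait_seconds = seconds_until_reset(sample_headers, now)
```

With whole-second timestamps the sketch yields a wait close to the "rate reset in 55s" reported in the log; the small difference comes from sub-second rounding.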

I had changed group_interval and repeat_interval to very low values, which might be what caused me to hit the rate limit!

Default values:

group_interval: 5m 
repeat_interval: 12h
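To get a feel for why a low repeat_interval matters, here is a rough back-of-the-envelope sketch. It only handles single-unit durations such as `5m` or `12h`, and it ignores how Alertmanager aligns repeats with group_interval, so treat the numbers as an approximate upper bound per alert group. The helper names and the `1m` value are hypothetical; the actual low values used in the test are not stated above.

```python
import re

# Seconds per unit for simple, single-unit durations like "5m" or "12h".
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_duration(text):
    """Parse a single-unit duration string into seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def repeats_per_day(repeat_interval):
    """Approximate number of repeat notifications per day for one firing group."""
    return 86400 // parse_duration(repeat_interval)


default_repeats = repeats_per_day("12h")  # the default listed above
aggressive_repeats = repeats_per_day("1m")  # hypothetical "very low" value
```

Each repeat notification makes the receiver talk to the GitHub API again, so many firing groups combined with a short interval can add up quickly.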

Settings from operate first: https://github.com/operate-first/apps/blob/master/kfdefs/base/monitoring/alertmanager-config.yaml
Deployment from operate first: https://github.com/operate-first/apps/tree/master/alertreceiver

rbo commented 3 years ago

The group_by setting is important for getting a proper issue title: https://github.com/operate-first/apps/blob/0ffc7519991f0507bceedc88046da1fba816c87b/cluster-scope/overlays/prod/moc/common/alertmanager-main-secret.yaml#L80
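As a simplified model of what group_by does (not Alertmanager's actual implementation), alerts that share the same values for the listed labels end up in one group, and each group produces one notification. The alert names and labels below are made up for illustration.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Group alert label sets by the values of the labels named in group_by."""
    groups = defaultdict(list)
    for labels in alerts:
        # Missing labels are treated as empty strings in this sketch.
        key = tuple(labels.get(name, "") for name in group_by)
        groups[key].append(labels)
    return dict(groups)


# Hypothetical sample alerts, each represented by its label set.
sample_alerts = [
    {"alertname": "KubePodCrashLooping", "namespace": "demo", "pod": "a"},
    {"alertname": "KubePodCrashLooping", "namespace": "demo", "pod": "b"},
    {"alertname": "NodeDown", "namespace": "infra"},
]

by_name = group_alerts(sample_alerts, ["alertname"])
by_name_and_pod = group_alerts(sample_alerts, ["alertname", "pod"])
```

Grouping only by `alertname` gives two groups here, while adding `pod` gives three, so the choice of labels directly changes how many notifications (and potentially issues) are produced.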

github-actions[bot] commented 3 years ago

Heads up @cluster/rhacm-admin - the "cluster/rhacm" label was applied to this issue.

rbo commented 3 years ago

Our Stormshift API account has been locked by GitHub. Support ticket created.

View or update your ticket here: Ticket 1231019

rbo commented 3 years ago

The Stormshift API account is NOT locked anymore.

DanielFroehlich commented 2 years ago

I like the idea, but I am also afraid that we will get swamped by issues. Can we get a label "AutoGenerated", so we can easily filter and delete them?

And: what is missing to set this up, e.g. in OCP4, to test and learn about it?

DanielFroehlich commented 2 years ago

Ah, I see it is set up in RHACM, but something is failing: "Alertmanager openshift-monitoring/alertmanager-main-1 failed to send 100% of notifications to webhook."

@rbo Would you mind either rolling this forward and making it run, or rolling it back and removing it, so we can get rid of the warning?