influxdata / kapacitor

Open source framework for processing, monitoring, and alerting on time series data
MIT License

Ability to define custom handlers #1692

Open ahsanali opened 6 years ago

ahsanali commented 6 years ago

Topics add a nice interface for decoupling alerting logic from tasks, but it would be great to be able to define custom handlers in topics. One use case is OpsGenie. Kapacitor has an OpsGenie handler, but it has an issue: when an alert goes from warning to critical, OpsGenie doesn't show the change, because the critical and warning alerts share the same 'alias' and are merged by OpsGenie's deduplication engine.
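To illustrate the deduplication problem described above, here is a minimal sketch. It uses a simplified, assumed model of alias-based deduplication (a dictionary keyed by alias), not OpsGenie's actual implementation; `open_alerts` and `create_alert` are hypothetical names.

```python
# Simplified model of alias-based deduplication. This is an assumption for
# illustration only, not how OpsGenie is actually implemented.
open_alerts = {}  # alias -> alert record


def create_alert(alias, level, message):
    """Create an alert, or fold it into an open alert with the same alias."""
    existing = open_alerts.get(alias)
    if existing is not None:
        # Same alias as an open alert: only the counter changes,
        # so the new level never becomes visible.
        existing["count"] += 1
        return existing
    alert = {"alias": alias, "level": level, "message": message, "count": 1}
    open_alerts[alias] = alert
    return alert


create_alert("cpu/host=a", "WARNING", "cpu is high")
alert = create_alert("cpu/host=a", "CRITICAL", "cpu is very high")
print(alert["level"], alert["count"])
```

In this model the second call is absorbed by the first alert, so the record still reads as a warning even though the condition has escalated.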

If we had the ability to define custom handlers, we could put our own logic in them. Currently we are forced to maintain our own fork of Kapacitor, which isn't nice for obvious reasons.

phemmer commented 6 years ago

A bit of a hack, but we have a somewhat related issue with PagerDuty, and our workaround might work for you too. The idea is to use separate alert nodes; once you do this, you can generate an "id" that is different for warnings and criticals. E.g.:

var data = foo
data|alert()
  .warn(lambda: warning_condition AND !(critical_condition))
  .id('{{ .Name }}/{{ .Group }}/warning')
data|alert()
  .crit(lambda: critical_condition)
  .id('{{ .Name }}/{{ .Group }}/critical')

This does make the alerting section of the script a bit verbose, so dunno if this would work well for you. Our scripts are stored as jinja templates, so the alert definition is abstracted away into a single-line template function call.

nathanielc commented 6 years ago

@ahsanali @phemmer I don't quite follow the issue. Why does the change in level need a different alert ID?

@ahsanali Can you share the patch you have applied to Kapacitor? That will help me understand what it is you are after. Thanks.

ranjithruban commented 6 years ago

@nathanielc the issue we had is that OpsGenie's alert deduplication combines warning and critical alerts that share the same alias, so we miss either the critical or the warning alert, depending on which one happens later. As a result, we added the level to the alias (alias + previous level) and try to close the previous alert. The patch we added is here: patch

With the new opsgenie 1.5 this patch will stop working. Judging from @phemmer's update, it looks like PagerDuty has the same problem.
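The approach described above (a level-specific alias, plus closing the alert for the previous level) could be sketched roughly as follows. This is one reading of the description, not the actual patch: `FakeClient`, `handle_event`, and the alias format are hypothetical, and the in-memory client stands in for a real alerting API.

```python
class FakeClient:
    """Hypothetical in-memory stand-in for an alerting service API."""

    def __init__(self):
        self.open = {}    # alias -> message
        self.closed = []  # aliases closed so far, in order

    def create(self, alias, message):
        self.open[alias] = message

    def close(self, alias):
        if alias in self.open:
            del self.open[alias]
            self.closed.append(alias)


def handle_event(client, alert_id, level, previous_level, message):
    """Close the alert for the previous level, then open one for the new level."""
    if previous_level != level and previous_level != "OK":
        client.close(f"{alert_id}/{previous_level}")
    if level != "OK":
        # The level is part of the alias, so a warning and a critical
        # are never deduplicated into the same alert.
        client.create(f"{alert_id}/{level}", message)


client = FakeClient()
handle_event(client, "cpu/host=a", "WARNING", "OK", "cpu is high")
handle_event(client, "cpu/host=a", "CRITICAL", "WARNING", "cpu is very high")
print(client.closed, list(client.open))
```

After the escalation, the warning alert is closed and only the critical alert remains open, which is the behavior the shared alias prevents.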