influxdata / kapacitor

Open source framework for processing, monitoring, and alerting on time series data
MIT License

sensu alerts format #1460

Open lcaflc opened 7 years ago

lcaflc commented 7 years ago

Hi,

I'm working on an integration between Kapacitor and Sensu, and it seems I've hit a bug in the alert format check. I saw in #240 that you check that the output matches the expected Sensu format before submitting the request. However, since the introduction of #1299, a handler ID is added to the alert ID, and the result does not match the regex:

[chronograf-v1-80112a77-e1eb-4c6c-a41b-81b3d96b3ca5:alert3] 2017/06/29 17:24:10 E! failed to send event to Sensu invalid name "load:nil" for sensu alert. Must match ^[\w\.-]+$

Is this a bug or a misunderstanding on my part?

Thanks.

nathanielc commented 7 years ago

@lcaflc Can you share your relevant configuration? The TICKscript, handler definition, sensu config section?

How are you defining the handler to be load:nil?

lcaflc commented 7 years ago

Here is my relevant kapacitor.conf, using version 1.3.1-1 on Ubuntu Xenial:

[sensu]
  # Configure Sensu.
  enabled = true
  # The Sensu Client host:port address.
  addr = "sensu-master-1.net.kosc:3030"
  # Default JIT source.
  source = "Kapacitor"

The TICKscript is the following:

ID: chronograf-v1-80112a77-e1eb-4c6c-a41b-81b3d96b3ca5
Error: 
Template: 
Type: stream
Status: enabled
Executing: true
Created: 29 Jun 17 17:23 CEST
Modified: 29 Jun 17 17:44 CEST
LastEnabled: 29 Jun 17 17:44 CEST
Databases Retention Policies: ["telegraf"."autogen"]
TICKscript:
var db = 'telegraf'

var rp = 'autogen'

var measurement = 'system'

var groupBy = ['host']

var whereFilter = lambda: TRUE

var name = 'load'

var idVar = name + ':{{.Group}}'

var message = ''

var idTag = 'alertID'

var levelTag = 'level'

var messageField = 'message'

var durationField = 'duration'

var outputDB = 'chronograf'

var outputRP = 'autogen'

var outputMeasurement = 'alerts'

var triggerType = 'threshold'

var crit = 0.2

var data = stream
    |from()
        .database(db)
        .retentionPolicy(rp)
        .measurement(measurement)
        .groupBy(groupBy)
        .where(whereFilter)
    |eval(lambda: "load1")
        .as('value')

var trigger = data
    |alert()
        .crit(lambda: "value" > crit)
        .stateChangesOnly()
        .message(message)
        .id(idVar)
        .idTag(idTag)
        .levelTag(levelTag)
        .messageField(messageField)
        .durationField(durationField)
        .sensu()

trigger
    |influxDBOut()
        .create()
        .database(outputDB)
        .retentionPolicy(outputRP)
        .measurement(outputMeasurement)
        .tag('alertName', name)
        .tag('triggerType', triggerType)

trigger
    |httpOut('output')

DOT:
digraph chronograf-v1-80112a77-e1eb-4c6c-a41b-81b3d96b3ca5 {
graph [throughput="2.00 points/s"];

stream0 [avg_exec_time_ns="0s" errors="0" working_cardinality="0" ];
stream0 -> from1 [processed="150"];

from1 [avg_exec_time_ns="9.955µs" errors="0" working_cardinality="0" ];
from1 -> eval2 [processed="150"];

eval2 [avg_exec_time_ns="177.604µs" errors="75" working_cardinality="2" ];
eval2 -> alert3 [processed="75"];

alert3 [alerts_triggered="3" avg_exec_time_ns="1.963793ms" crits_triggered="1" errors="0" infos_triggered="0" oks_triggered="2" warns_triggered="0" working_cardinality="2" ];
alert3 -> http_out5 [processed="3"];
alert3 -> influxdb_out4 [processed="3"];

http_out5 [avg_exec_time_ns="0s" errors="0" working_cardinality="1" ];

influxdb_out4 [avg_exec_time_ns="0s" errors="0" points_written="3" working_cardinality="0" write_errors="0" ];
}

I don't have any specific handler definition. In any case, the message does not reach Sensu at the moment.

lcaflc commented 7 years ago

With this exact TICKscript I get this error:

E! failed to send event to Sensu invalid name "load:host=testbastion2" for sensu alert. Must match ^[\w\.-]+$

I now get a handler value, but it still does not match the regex.
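The regex quoted in the error message can be checked against the two IDs reported so far. The following is a minimal sketch in Python, not Kapacitor's own code (Kapacitor is written in Go); the helper name `is_valid_sensu_name` is hypothetical, and `re.ASCII` is an assumption made so that `\w` covers only letters, digits, and underscore.

```python
import re

# Pattern copied verbatim from the Kapacitor error message.
# re.ASCII restricts \w to [A-Za-z0-9_]; this is an assumption about
# how the original check behaves, not something stated in the thread.
SENSU_NAME = re.compile(r"^[\w\.-]+$", re.ASCII)


def is_valid_sensu_name(name: str) -> bool:
    """Return True if name would pass the regex from the error message."""
    return SENSU_NAME.match(name) is not None


# The first two IDs come from the error logs above; the last two are
# made-up examples of names without ':' or '='.
for candidate in ["load:nil", "load:host=testbastion2", "load", "load.testbastion2"]:
    print(candidate, is_valid_sensu_name(candidate))
```

Both reported IDs fail because of the `:` separator, and the second also contains `=`, which the pattern does not allow either.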

lcaflc commented 7 years ago

Hi all,

I've done some more digging on this and simplified the problem as much as possible. Here is what I have.

My TICKscript is the following:

stream
    // Select just the cpu measurement from our example database.
    |from()
        .measurement('cpu')
    |alert()
        .crit(lambda: "usage_idle" <  10)
        // send to sensu.
        .sensu()

This is the sample script with just a Sensu handler instead of logging. Here is the error I get with it:

[cpu_alert:alert2] 2017/07/06 15:41:33 D! CRITICAL alert triggered id:cpu:nil msg:cpu:nil is CRITICAL data:&{cpu map[host:testbastion cpu:cpu0] [time usage_guest usage_guest_nice usage_idle usage_iowait usage_irq usage_nice usage_softirq usage_steal usage_system usage_user] [[2017-07-06 13:41:33 +0000 UTC 0 0 0 0 0 0 0 0 0.09990009989991609 99.90009990000695]]}
[cpu_alert:alert2] 2017/07/06 15:41:33 E! failed to send event to Sensu invalid name "cpu:nil" for sensu alert. Must match ^[\w\.-]+$

And my Sensu configuration, as simple as it can be:

[sensu]
  # Configure Sensu.
  enabled = true
  # The Sensu Client host:port address.
  addr = "XXXXX:3030"
  # Default JIT source.
  source = "Kapacitor"

lcaflc commented 7 years ago

I fixed my issue by replacing the default ID format in the TICKscript:

stream
    // Select just the cpu measurement from our example database.
    |from()
        .measurement('cpu')
    |alert()
        .crit(lambda: "usage_idle" <  10)
        // Override the default ID so that it matches the Sensu name regex.
        .id('{{ .Name }}')
        .sensu()

I'll let you decide whether to change this default behaviour or at least document it in the Sensu alert handler docs: https://docs.influxdata.com/kapacitor/v1.3//nodes/alert_node/#sensu

Hopefully this saves someone else from losing some hair over it :)

Thanks.

lcaflc commented 7 years ago

To add to this, please note that because you cannot change the .id() when defining alerts in the Chronograf UI, you basically can't define Sensu alerts with it.

scrichar commented 5 years ago

Hey folks, we are using Sensu with Kapacitor and running into this same bug. Kapacitor claims integration with Sensu, but any "group by" clause built through the Chronograf UI will send illegal characters that Sensu won't accept for alertName. Any chance you can strip the illegal characters for the Sensu integration? Or provide another fix?

scrichar commented 5 years ago

Hand-editing the TICKscript after creating an alert through Chronograf, and changing the default line var idVar = name + ':{{.Group}}' to something static, will mask alerts if more than one item in the group is alerting. Is there a way within the TICKscript to strip unwanted characters from {{.Group}}? Any help is appreciated.
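To make the request concrete, here is a rough sketch of the kind of stripping being asked for, written in Python purely as an illustration. It is not an existing Kapacitor or Chronograf feature; the function name `sanitize_sensu_name` and the choice of `-` as the replacement character are assumptions.

```python
import re

# Any character outside the set allowed by ^[\w\.-]+$ is treated as invalid.
# re.ASCII limits \w to [A-Za-z0-9_] (an assumption, as above in the thread
# only the pattern itself is given).
INVALID_CHARS = re.compile(r"[^\w\.-]", re.ASCII)


def sanitize_sensu_name(raw: str, replacement: str = "-") -> str:
    """Replace every character Sensu would reject with the replacement."""
    return INVALID_CHARS.sub(replacement, raw)


# IDs as they would be produced by name + ':{{.Group}}' with groupBy host.
print(sanitize_sensu_name("load:host=testbastion2"))
print(sanitize_sensu_name("load:host=testbastion"))
```

Because the group values are kept rather than replaced by a static string, alerts for different hosts still end up with different IDs and do not mask each other.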

ghost commented 5 years ago

It seems like this isn't getting attention, but I have just run into the same issue with Kapacitor 1.5.2.

This is what I manually edited the line to be and it works for me.

var idVar = '{{ index .Tags "host"}}-CPU'

(Apart from letters, digits, and underscores, it seems only "-" and "." are accepted.)

The only frustrating part is the Windows case: disk tags there end with a colon. I'm not sure how to get around that.

ghost commented 5 years ago

Okay, I think I managed to perform my own substitution. I hope this works for anyone else who runs into this issue. What I did here is add a lambda expression that replaces ':' with '-', so the result matches the required regex. I then make a new tag from that value and use it in the ID.

var data = stream
    |from()
        .database(db)
        .retentionPolicy(rp)
        .measurement(measurement)
        .groupBy(groupBy)
        .where(whereFilter)
    |eval(lambda: "used_percent", lambda: strReplace("device", ':', '-', -1))
        .as('value','disk_replace')
        .tags('disk_replace')
        .keep()

var trigger = data
    |alert()
        .crit(lambda: "value" > crit)
        .message(message)
        .id('{{ index .Tags "host"}}-Disk-{{ index .Tags "disk_replace" }}')
        .idTag(idTag)
        .levelTag(levelTag)
        .messageField(messageField)
        .durationField(durationField)
        .sensu()
        .source('Test-{{ index .Tags "host"}}')
        .handlers('default')
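For readers who want to check what ID this workaround yields, the sketch below mimics it in Python. It assumes that strReplace with a count of -1 replaces every occurrence (as Go's strings.Replace does); the host name "WIN01", the device value "C:", and the function name `build_disk_alert_id` are made up for illustration.

```python
import re

# Regex from the Kapacitor error message; re.ASCII is an assumption.
SENSU_NAME = re.compile(r"^[\w\.-]+$", re.ASCII)


def build_disk_alert_id(host: str, device: str) -> str:
    """Mimic the eval + id template from the TICKscript above."""
    # Stands in for strReplace("device", ':', '-', -1):
    # every ':' in the device tag becomes '-'.
    disk_replace = device.replace(":", "-")
    # Stands in for '{{ index .Tags "host"}}-Disk-{{ index .Tags "disk_replace" }}'.
    return f"{host}-Disk-{disk_replace}"


alert_id = build_disk_alert_id("WIN01", "C:")
print(alert_id, SENSU_NAME.match(alert_id) is not None)
```

With these inputs the ID contains only letters, digits, and hyphens, so it passes the name check that rejected the default ID.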