Open zargex opened 7 years ago
@zargex Can you share the first part of the TICKscript as well, the part before the deadman?
stream
    |from()
        .measurement('procstat')
        .where(lambda: "pidfile" == '/tmp/svc-sendgrid-subscriber.pid')
        .where(lambda: "host" == 'localhost')
    |eval(lambda: float("memory_rss") / (1024.0*1024.0))
        .as('memory_rss')
    |alert()
        .id('sendgrid-subscriber MEM USAGE Alert')
        .message('The sendgrid-subscriber mem usage is {{.Level}} on host: {{ index .Tags "host" }}, the sendgrid-subscriber is using this amount of ram : {{ index .Fields "memory_rss" }} MB')
        .crit(lambda: "memory_rss" > 512)
        .warn(lambda: "memory_rss" > 256)
        .info(lambda: "memory_rss" > 100)
        .slack()
        .channel('#alerts-staging')
        .stateChangesOnly(10m)
I put the deadman after the from section and before the eval section.
@zargex Could you please try the script below?
stream
    |from()
        .database('telegraf')
        .retentionPolicy('autogen')
        .measurement('procstat')
        .where(lambda: "pidfile" == '/tmp/svc-sendgrid-subscriber.pid')
        .where(lambda: "host" == 'localhost')
    |deadman(1.0, 10s)
        .slack()
        .channel('#alerts-staging')
    |eval(lambda: float("memory_rss") / (1024.0*1024.0))
        .as('memory_rss')
    |alert()
        .id('sendgrid-subscriber MEM USAGE Alert')
        .message('The sendgrid-subscriber mem usage is {{.Level}} on host: {{ index .Tags "host" }}, the sendgrid-subscriber is using this amount of ram : {{ index .Fields "memory_rss" }} MB')
        .crit(lambda: "memory_rss" > 512)
        .warn(lambda: "memory_rss" > 256)
        .info(lambda: "memory_rss" > 100)
        .slack()
        .channel('#alerts-staging')
        .stateChangesOnly(10m)
@adityacs I tried what you proposed, but I only got notifications saying the alert is dead. My understanding is that my throughput is very low, so the deadman switch is triggered.
If I run Kapacitor's show command on that task, I get:
graph [throughput="0.00 points/s"];
but sometimes I get:
graph [throughput="18.00 points/s"];
Telegraf is using the default interval of 10 seconds for all plugins. Maybe if I reduce this interval, Kapacitor will work as I expected.
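One possible explanation, offered as an assumption rather than a confirmed diagnosis: the Kapacitor documentation describes deadman(threshold, interval) as alerting when the number of points emitted per interval is less than or equal to the threshold. If procstat delivers only one point per 10-second Telegraf interval for this process, deadman(1.0, 10s) would report a healthy stream as dead. A minimal sketch with a zero threshold and a check window longer than the collection interval (the 30s value is illustrative, not taken from this thread):

```
stream
    |from()
        .measurement('procstat')
        .where(lambda: "pidfile" == '/tmp/svc-sendgrid-subscriber.pid')
        .where(lambda: "host" == 'localhost')
    // Assumption: roughly one point arrives every 10s, so check over 30s
    // and alert only when no points at all were seen in that window.
    |deadman(0.0, 30s)
        .slack()
        .channel('#alerts-staging')
```

Lengthening the deadman interval, rather than shortening the Telegraf interval, keeps the collection load unchanged while still tolerating one late or dropped point.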
The script proposed above didn't work with Kapacitor 1.5, but this works:
var data = stream
    |from()
        .measurement('cpu')
        .groupBy(*)

data
    |alert()
        .crit(lambda: "usage_idle" < 10)
        .topic('cpu')

data
    |deadman(threshold, interval)
From https://stackoverflow.com/questions/45556226/how-to-add-a-deadmans-switch-to-an-existing-alert
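Applied to the procstat task from earlier in this thread, the same branching pattern might look like the sketch below. The threshold of 0.0, the 30s interval, and the bare stateChangesOnly() call on the deadman branch are assumptions chosen for illustration; the measurement, filters, and Slack channel are copied from the scripts above. It relies on deadman returning an alert node, so alert properties can be chained onto it.

```
// Shared source: both branches read the same filtered stream.
var data = stream
    |from()
        .database('telegraf')
        .retentionPolicy('autogen')
        .measurement('procstat')
        .where(lambda: "pidfile" == '/tmp/svc-sendgrid-subscriber.pid')
        .where(lambda: "host" == 'localhost')

// Branch 1: the memory usage alert, unchanged in logic.
data
    |eval(lambda: float("memory_rss") / (1024.0*1024.0))
        .as('memory_rss')
    |alert()
        .id('sendgrid-subscriber MEM USAGE Alert')
        .crit(lambda: "memory_rss" > 512)
        .warn(lambda: "memory_rss" > 256)
        .info(lambda: "memory_rss" > 100)
        .slack()
        .channel('#alerts-staging')
        .stateChangesOnly(10m)

// Branch 2: the deadman switch on its own branch.
// Assumed values: alert when no points arrive within 30s.
data
    |deadman(0.0, 30s)
        .stateChangesOnly()
        .slack()
        .channel('#alerts-staging')
```

Keeping the deadman on a separate branch means its alert output is not passed on to the eval node, which would otherwise receive deadman alert data instead of procstat points.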
Hi, I'm using Telegraf to supervise some processes, so I thought it would be useful if Kapacitor could notify me when any process dies.
I think this can be done with Kapacitor's deadman switch (if a process dies, Telegraf's procstat input can't send data to InfluxDB). But I'm having trouble with this: if I kill the process, Kapacitor sends the alert, but when I restart the process, Kapacitor keeps sending the alert.
I'm trying something like this:
If no point has arrived in 10s, send the alert (I think it works like that).
But when I run kapacitor show on the task, I see graph [throughput="0.00 points/s"]; I don't know whether this throughput matters.
I'm using Kapacitor 1.3.1-1 on Debian 8.
Thanks.
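For readers wondering what the deadman helper in the post above actually checks: the Kapacitor documentation presents it as shorthand for a stats, derivative, and alert chain. The sketch below is an approximation of that expanded form, written from the documented pattern rather than copied from Kapacitor's source; the id string and the 1.0 threshold are illustrative.

```
var data = stream
    |from()
        .measurement('procstat')

data
    // Emit the node's internal counters every 10s, aligned to the interval.
    |stats(10s)
        .align()
    // Turn the cumulative "emitted" counter into points per 10s.
    |derivative('emitted')
        .unit(10s)
        .nonNegative()
    |alert()
        .id('procstat deadman')
        .message('{{ .ID }} is {{ if eq .Level "OK" }}alive{{ else }}dead{{ end }}: {{ index .Fields "emitted" | printf "%0.3f" }} points/10s.')
        // Note the "less than or equal": a rate equal to the threshold still alerts.
        .crit(lambda: "emitted" <= 1.0)
```

Read this way, the alert should return to OK once the rate rises above the threshold again, so a deadman that stays "dead" after the process restarts suggests the observed rate never exceeds the configured threshold.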