influxdata / kapacitor

Open source framework for processing, monitoring, and alerting on time series data
MIT License

Deadman alert produces errors in derivative node #1984

Open Rhine25 opened 6 years ago

Rhine25 commented 6 years ago

I have created a TICKscript that uses the deadman node to trigger an alert when no point is received on the ping measurement for 30s. The problem, visible in the output of 'kapacitor show' for the task, is that the derivative node inside the deadman node reports errors:

var data = stream
    |from()
        .measurement('ping')
        .groupBy('url')
    |deadman(0.0, 30s)
        .message('Ping: {{ index .Tags "url" }} is {{ if eq .Level "OK" }}up{{ else }}down{{end}}')
        .details('{{ if eq .Level "OK" }}{{ index .Tags "url" }} is back online after at least {{ .Duration }} without responding to pings.{{ else }} No ping data received for ' + string(period) + ' from {{ index .Tags "url" }}{{ end }}')
        .log('/tmp/deadman.log')
        .stateChangesOnly()
        .topic('deadman')

DOT:
digraph ping-deadman-adeo {
graph [throughput="0.00 points/s"];

stream0 [avg_exec_time_ns="0s" errors="0" working_cardinality="0" ];
stream0 -> from1 [processed="5080"];

from1 [avg_exec_time_ns="17.127µs" errors="0" working_cardinality="0" ];
from1 -> noop3 [processed="5080"];

noop3 [avg_exec_time_ns="0s" errors="0" working_cardinality="0" ];

stats2 [avg_exec_time_ns="80.236µs" errors="0" working_cardinality="0" ];
stats2 -> derivative4 [processed="1162"];

derivative4 [avg_exec_time_ns="32.133µs" errors="28" working_cardinality="7" ];
derivative4 -> alert5 [processed="1127"];

alert5 [alerts_inhibited="0" alerts_triggered="0" avg_exec_time_ns="80.894µs" crits_triggered="0" errors="0" infos_triggered="0" oks_triggered="0" warns_triggered="0" working_cardinality="7" ];
}

Also, alerts trigger correctly when grouping by host, whereas when grouping by url as above, no alerts are triggered even though a machine has stopped producing points.

Rhine25 commented 6 years ago

In the Kapacitor logs, the error is:

lvl=error msg="cannot perform derivative" service=kapacitor task_master=main task=ping-deadman-adeo node=derivative4 err="elaspsed time was 0"