I have created a TICKscript that uses the deadman node to trigger an alert when no point is received for 30s on the ping measurement.
The issue, as shown by running 'kapacitor show' on the task, is that the derivative node contained in the deadman node produces errors:
// Assumed definition: period is used in .details() but was not defined in the posted script.
var period = 30s

var data = stream
    |from()
        .measurement('ping')
        .groupBy('url')
    |deadman(0.0, period)
        .message('Ping: {{ index .Tags "url" }} is {{ if eq .Level "OK" }}up{{ else }}down{{end}}')
        .details('{{ if eq .Level "OK" }}{{ index .Tags "url" }} is back online after at least {{ .Duration }} without responding to pings.{{ else }} No ping data received for ' + string(period) + ' from {{ index .Tags "url" }}{{ end }}')
        .log('/tmp/deadman.log')
        .stateChangesOnly()
        .topic('deadman')
Also, alerts trigger fine when grouping by host, whereas when grouping by url as above, no alerts are triggered even though a machine has stopped creating points.
In the Kapacitor logs, the error is:
lvl=error msg="cannot perform derivative" service=kapacitor task_master=main task=ping-deadman-adeo node=derivative4 err="elaspsed time was 0"
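My reading of this error is that a derivative divides the change in a value by the time elapsed between two consecutive points, so two points in the same group carrying the same timestamp would leave nothing to divide by. I have not confirmed that this is what happens here. Below is a minimal Python sketch of that assumption only; it is not Kapacitor's code, and `derivative`, its arguments, and the sample values are hypothetical names chosen for illustration.

```python
from datetime import datetime, timedelta


def derivative(prev, curr, unit=timedelta(seconds=30)):
    """Rate of change between two (timestamp, value) points, per `unit`.

    Hypothetical helper for illustration; not Kapacitor's implementation.
    """
    (t0, v0), (t1, v1) = prev, curr
    elapsed = t1 - t0
    if elapsed == timedelta(0):
        # Two points share a timestamp, so the rate is undefined.
        raise ValueError("elapsed time was 0")
    # elapsed / unit is the number of units that passed between the points.
    return (v1 - v0) / (elapsed / unit)


t = datetime(2020, 1, 1, 12, 0, 0)

# A counter that grew from 10 to 15 over one 30s unit.
rate = derivative((t, 10), (t + timedelta(seconds=30), 15))

# Same timestamp on both points: the sketch refuses to compute a rate.
try:
    derivative((t, 10), (t, 15))
    failed = False
except ValueError:
    failed = True
```

If something like this applies, it might explain why the derivative node drops some points before they reach the alert node, but that is a guess on my part.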
DOT:
digraph ping-deadman-adeo {
graph [throughput="0.00 points/s"];
stream0 [avg_exec_time_ns="0s" errors="0" working_cardinality="0" ];
stream0 -> from1 [processed="5080"];
from1 [avg_exec_time_ns="17.127µs" errors="0" working_cardinality="0" ];
from1 -> noop3 [processed="5080"];
noop3 [avg_exec_time_ns="0s" errors="0" working_cardinality="0" ];
stats2 [avg_exec_time_ns="80.236µs" errors="0" working_cardinality="0" ];
stats2 -> derivative4 [processed="1162"];
derivative4 [avg_exec_time_ns="32.133µs" errors="28" working_cardinality="7" ];
derivative4 -> alert5 [processed="1127"];
alert5 [alerts_inhibited="0" alerts_triggered="0" avg_exec_time_ns="80.894µs" crits_triggered="0" errors="0" infos_triggered="0" oks_triggered="0" warns_triggered="0" working_cardinality="7" ];
}