Only 1 OK event is fired after reloading a task to force N active events to OK status

Hi,

We have been working with Kapacitor to generate alerts based on simple metric thresholds, on the following setup:

OS: RHEL 7.4

Kapacitor: Kapacitor OSS 1.5.0 (git: HEAD 4f10efc41b4dcac070495cf95ba2c41cfcc2aa3a)

Overview

We have some TICKscripts that fire N events, where N is the working cardinality of the alert node, so each of the N events can change its own state based on the threshold.

The problem seems to appear when we change the TICKscript and reload the task, forcing the N events to OK.

Actual behaviour

After reloading the task with new thresholds to force the N events to OK, only 1 OK event is fired. The other N-1 events seem to be 'lost': they are considered OK, but no OK event is fired for them.

Expected behaviour

After reloading the task with new thresholds to force the N events to OK, an OK event is fired for each of the N events.

Detailed case

To help you reproduce the case, I have written a TICKscript and a brief table with the actions and the events fired:

TICKSCRIPT

var ID = 'ticks_cpu'
// Telegraf's cpu input reports this field as usage_idle (underscore)
var FIELD = 'usage_idle'
var FIELD_DEFAULT = 0.0
var TH_CRIT_DEF = 0.0
var TH_WARN_DEF = 0.0
var TH_INFO_DEF = 0.0

// TICKSCRIPT:
// ================
// var data = stream
stream
    |from()
        .database('telegraf')
        .retentionPolicy('autogen')
        .measurement('cpu')
        .groupBy(*)
    |default()
        .field(FIELD, FIELD_DEFAULT)
    |eval()
        .keep(FIELD)
    |window()
        .period(1m)
        .every(10s)
        .align()
    |mean(FIELD)
        .as('value')
    |alert()
        .crit(lambda: float("value") < TH_CRIT_DEF)
        .warn(lambda: float("value") < TH_WARN_DEF)
        .info(lambda: float("value") < TH_INFO_DEF)
        .id(ID)
        .log('/tmp/test-cpu.log')

Actions and results

The following table shows the actions and the resulting events.

As shown, after forcing OK on N events that are already CRIT, only a single OK event is fired.

Step	Action	#Cores	#Actual Events	Matches expected result	Example
1	Start CPU stress on host. TICKscript is not modified	2+1 (cpu-total)	3	`OK`	`CRIT: Series – cpu0/myhost`, `CRIT: Series – cpu1/myhost`, `CRIT: Series – cpu-total/myhost`
2	Stop CPU stress on host. TICKscript is not modified	2+1 (cpu-total)	3	`OK`	`OK: Series – cpu0/myhost`, `OK: Series – cpu1/myhost`, `OK: Series – cpu-total/myhost`
3	Modify TICKscript, set thresholds to fire CRITs	2+1 (cpu-total)	3	`OK`	`CRIT: Series – cpu0/myhost`, `CRIT: Series – cpu1/myhost`, `CRIT: Series – cpu-total/myhost`
4	Modify TICKscript, set thresholds to fire OK after CRITs	2+1 (cpu-total)	1	`NOOK`	`OK: Series – (RANDOM?)/myhost`
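
As a rough sketch of steps 3 and 4, only the threshold variable at the top of the TICKscript needs to change between reloads. The concrete values here are assumptions chosen for illustration, not necessarily the ones used in the original test: a CPU idle percentage lies between 0 and 100, so a CRIT threshold above 100 should make every series match the CRIT condition, and restoring 0.0 should make none of them match.

```tickscript
// Step 3 (assumed value): an idle percentage is always below 101,
// so the crit lambda should be true for every series.
var TH_CRIT_DEF = 101.0

// Step 4 (assumed value): replace the line above with the original
// threshold. An idle percentage is never below 0, so every series
// should leave CRIT and an OK event is expected for each of them.
// var TH_CRIT_DEF = 0.0
```

After each change the task is redefined and reloaded as described above. With 3 series, step 4 would be expected to write 3 OK events to the log file.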
