stefanhorning closed this issue 5 years ago
I also want to know whether these "0s" are a problem or not. After creating alert rules in Chronograf they show up here and there in the TICKscripts.
@russorat As you are usually quite quick in responding to tickets, I believe this one fell through the cracks. Could you please leave a comment on how to deal with this issue further, as it currently blocks us from adding email alerts. Thanks!
@stefanhorning Sorry for missing this. I can't seem to recreate your issue. Could you describe the steps to reproduce?
Thanks for your quick reply.
Ok, so it took me a bit myself this time. It seems a combination of several things is leading to this.
First my preconditions / environment:
To reproduce, go to the Alerts/Tasks page and create a new alert rule (editing an existing one should also work) using the Alert rule builder:
Open the same rule in the TICK editor and you should now find the 0s line in there, which will cause the alerts to go crazy (once the alert threshold is crossed).
For easier debugging, here is the entire TICK script I created today (through the GUI) following the above steps:
var db = 'telegraf'
var rp = 'metrics'
var measurement = 'disk'
var groupBy = ['host_role']
var whereFilter = lambda: TRUE
var period = 1m
0s
var every = 30s
var name = 'Test bug'
var idVar = name + ':{{.Group}}'
var message = 'Test alerting bug.'
var idTag = 'alertID'
var levelTag = 'level'
var messageField = 'message'
var durationField = 'duration'
var outputDB = 'chronograf'
var outputRP = 'autogen'
var outputMeasurement = 'alerts'
var triggerType = 'threshold'
var crit = 90
var data = stream
    |from()
        .database(db)
        .retentionPolicy(rp)
        .measurement(measurement)
        .groupBy(groupBy)
        .where(whereFilter)
    |window()
        .period(period)
        .every(every)
        .align()
    |min('used_percent')
        .as('value')

var trigger = data
    |alert()
        .crit(lambda: "value" > crit)
        .message(message)
        .id(idVar)
        .idTag(idTag)
        .levelTag(levelTag)
        .messageField(messageField)
        .durationField(durationField)
        .email()
        .to('foo@bar.com')
        .slack()
        .channel('#operations')

trigger
    |eval(lambda: float("value"))
        .as('value')
        .keep()
    |influxDBOut()
        .create()
        .database(outputDB)
        .retentionPolicy(outputRP)
        .measurement(outputMeasurement)
        .tag('alertName', name)
        .tag('triggerType', triggerType)

trigger
    |httpOut('output')
Hope this helps!
@stefanhorning I think the alert going crazy might be more related to the fact that we are not adding a "stateChangesOnly" option to the alert trigger. We've fixed an issue related to that before, but I wonder if it has been re-introduced.
The 0s should also be on the line above, which is strange as well, but it shouldn't be detrimental to the script execution IMO, although I haven't verified that.
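If so, the builder presumably meant to emit the two values as a single duration on the period line, along these lines (just a sketch of the assumed intended output, not verified against the Chronograf generator or the Kapacitor parser):

var period = 1m0s
var every = 30s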
Ok, I will try adding stateChangesOnly to see if it fixes the issue. Will get back to you once I have some results.
Yes, you are right. I compared with alert rules that have only one alert handler and they all have the .stateChangesOnly() method right before the handler. After adding it manually to the rule with the two handlers, the issue with too many alerts seems to be resolved.
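For reference, the patched trigger section now looks roughly like this (same as the generated script above, with only the .stateChangesOnly() call added before the first handler):

var trigger = data
    |alert()
        .crit(lambda: "value" > crit)
        .message(message)
        .id(idVar)
        .idTag(idTag)
        .levelTag(levelTag)
        .messageField(messageField)
        .durationField(durationField)
        .stateChangesOnly()
        .email()
        .to('foo@bar.com')
        .slack()
        .channel('#operations')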
So I guess we can close this ticket and derive two new ones from it for the Chronograf TICK-generating logic:
- the 0s ending up after a newline (less critical)
- stateChangesOnly() not always being placed before the first alert handler (missing when creating a rule with multiple handlers)
So feel free to close this issue if those issues have been addressed.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had recent activity. Feel free to reopen if this issue is still important to you. Thank you for your contributions.
When adding an email handler to an alert rule using Chronograf the resulting TICK script is buggy and causes the alert message to be sent out every second or so.
Upon closer inspection I noticed the line 0s in the TICK script, which is only being added when an email handler is added through the Chronograf GUI. With more context, the beginning of the TICK script looks somewhat like this:
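The relevant part is presumably the same as in the full script shown earlier in this thread, with the stray 0s sitting on its own line right after the period:

var db = 'telegraf'
var rp = 'metrics'
var measurement = 'disk'
var groupBy = ['host_role']
var whereFilter = lambda: TRUE
var period = 1m
0s
var every = 30s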
Somehow I didn't manage to repair the TICK script by just removing the 0s line, as Kapacitor would just continue to spit out alerts (at least to the Slack channel we also had as a second handler). Even disabling/enabling the rule and restarting Kapacitor didn't help. But when creating a fresh rule without an email handler everything seems fine again. Let me know if I should rather report this problem to the Kapacitor project, but to me it looks like the bug is in the way the TICK script is generated by Chronograf.