Closed: jeremyj closed this issue 4 years ago.
OK, since I had been experimenting with various notification filters, I reset my backend (rm -rf /var/lib/sensu/sensu-backend/etcd/) and re-imported the above configurations.
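(For anyone wanting to reproduce the reset, a rough sketch assuming a systemd-managed backend and resource definitions kept in a local YAML file; the file name is just an example:)

# stop the backend before wiping its embedded etcd store
sudo systemctl stop sensu-backend
sudo rm -rf /var/lib/sensu/sensu-backend/etcd/
sudo systemctl start sensu-backend

# on 5.16+ a wiped store has to be re-initialized before sensuctl can log in
export SENSU_BACKEND_CLUSTER_ADMIN_USERNAME=admin
export SENSU_BACKEND_CLUSTER_ADMIN_PASSWORD='use-a-real-password'
sensu-backend init

# re-import the saved definitions (checks, handlers, filters, assets)
sensuctl create -f my-sensu-resources.yaml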
Now I am getting a notification every 120s and am very confused.
Can anyone point me in the right direction?
Hi @jeremyj, I tried reproducing this issue, but I am seeing notifications happen as expected. Here is what I tried.
I created a debug handler; it simply drops event JSON into a file for any notifications that make it past the filter(s).
type: Handler
api_version: core/v2
metadata:
  name: debug
  namespace: default
spec:
  command: jq . >> /tmp/events.out
  env_vars: null
  filters:
  - is_incident
  - not_silenced
  - fatigue_check
  handlers: null
  runtime_assets: null
  timeout: 0
  type: pipe
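(In case it helps, this is roughly how the handler above can be registered; the file name is just an example. Note that a pipe handler runs on the backend, so jq has to be installed there.)

# save the definition above as debug-handler.yaml, then:
sensuctl create -f debug-handler.yaml

# confirm it was registered
sensuctl handler info debug --format yaml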
I assigned this handler to a check that I could easily inject failures into:
type: CheckConfig
api_version: core/v2
metadata:
  name: http
  namespace: default
  annotations:
    fatigue_check/occurrences: "1"
    fatigue_check/interval: "60"
    fatigue_check/allow_resolution: "true"
spec:
  check_hooks: null
  command: check-http.rb -u http://agent -q 'Welcome to CentOS'
  env_vars: null
  handlers:
  - debug
  high_flap_threshold: 0
  interval: 10
  low_flap_threshold: 0
  output_metric_format: nagios_perfdata
  output_metric_handlers: null
  proxy_entity_name: ""
  publish: true
  round_robin: false
  runtime_assets:
  - sensu/sensu-ruby-runtime
  - sensu-plugins/sensu-plugins-http
  stdin: false
  subdue: null
  subscriptions:
  - linux
  timeout: 10
  ttl: 0
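(One simple way to inject a failure for a check like this, assuming the agent host serves the default CentOS welcome page via Apache httpd as in this example:)

# stop the web server so check-http.rb no longer finds 'Welcome to CentOS'
sudo systemctl stop httpd

# start it again later to trigger the resolution event
sudo systemctl start httpd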
I then injected my failure. Based on the above configuration, I should see events in my /tmp/events.out file that align with the first occurrence, each subsequent occurrence at 60-second intervals, and finally a resolution event. And that is what I saw when running the following against the debug output.
$ jq '{timestamp: .timestamp, state: .check.state, interval: .check.interval, watermark: .check.occurrences_watermark}' /tmp/events.out
{
  "timestamp": 1578951985,
  "state": "failing",
  "interval": 10,
  "watermark": 1
}
{
  "timestamp": 1578952035,
  "state": "failing",
  "interval": 10,
  "watermark": 6
}
{
  "timestamp": 1578952095,
  "state": "failing",
  "interval": 10,
  "watermark": 12
}
{
  "timestamp": 1578952155,
  "state": "failing",
  "interval": 10,
  "watermark": 18
}
{
  "timestamp": 1578952165,
  "state": "passing",
  "interval": 10,
  "watermark": 18
}
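A quick way to sanity-check the spacing between those notifications is to diff the timestamps directly:

$ jq -s '[.[].timestamp] | . as $t | [range(1; length)] | map($t[.] - $t[. - 1])' /tmp/events.out
[
  50,
  60,
  60,
  10
]

The first gap is 50 seconds rather than 60 because the initial alert fires on occurrence 1 and the next on occurrence 6 (60s fatigue interval / 10s check interval = every 6 occurrences); the final 10-second gap is the resolution event.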
Can you try a similar debug handler configuration for your check?
Hi @nixwiz
Thanks for your answer.
Turns out I was performing this check against 2 entities and the check's round_robin value was set to true, so for each entity I was only notified once for every 2 times the check was performed.
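(For completeness: with round_robin: true and two matching entities, each scheduled execution lands on only one of them, so any single entity accumulates occurrences only every other run and its alert spacing doubles. A quick way to confirm what the backend actually has, using the example check name from above:)

# inspect the effective check configuration on the backend
sensuctl check info http --format yaml | grep -E 'round_robin|interval'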
Hello,
- sensu-go-agent 5.16.1-8521
- sensu-go-backend 5.16.1-8521
- fatigue-check-filter 0.3.2 (//assets.bonsai.sensu.io/.../sensu-go-fatigue-check-filter_0.3.2.tar.gz)
I have a check configured as follows:
This is the slack handler:
The filter:
The entity:
So I should be getting an alert every 60 minutes, but instead I'm getting a notification every 120 minutes.
What else should I check to find out where the problem is? Thanks
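(One way to see how often a check is actually being scheduled against a given entity; the entity and check names here are placeholders:)

# show when the check last ran against an entity and how occurrences accumulate
sensuctl event info my-entity my-check --format json \
  | jq '{executed: .check.executed, occurrences: .check.occurrences, interval: .check.interval}'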