prometheus / blackbox_exporter

Blackbox prober exporter
https://prometheus.io
Apache License 2.0

Inconsistent alerts triggered by the Prometheus alert manager #1225

Closed: varampati6 closed this issue 3 months ago

varampati6 commented 3 months ago

Host operating system: output of uname -a

blackbox_exporter version: output of blackbox_exporter --version

blackbox_exporter, version 0.20.0 (branch: HEAD, revision: 91372eba6cdef09f6d8453752cf47011bf32cb7a)
  build user: root@d6d8976bddf4
  build date: 20220316-17:42:45
  go version: go1.17.8
  platform: linux/amd64

What is the blackbox.yml module config.

modules:
  https_2xx:
    prober: http
    timeout: 5s
    http:
      valid_http_versions: ["HTTP/1.0", "HTTP/1.1", "HTTP/2.0"]
      method: GET
      preferred_ip_protocol: "ip4"
      valid_status_codes: [200, 403, 404, 502]  # An empty list defaults to 2xx
      fail_if_ssl: false
      fail_if_not_ssl: true
      tls_config:
        insecure_skip_verify: true

What is the prometheus.yml scrape config.

global:
  scrape_interval: 60s
  evaluation_interval: 10m
scrape_configs:

What logging output did you get from adding &debug=true to the probe URL?

What did you do that produced an error?

The configuration as mentioned above

What did you expect to see?

"The Prometheus query probe_success{job="blackbox"} == 0 should return the applications that are in a down state for a 10-minute interval, as specified in the configuration mentioned above. In my case, the Jira application mentioned above is down."

What did you see instead?

"Instead of producing a single, consistent alert, probe_success{job="blackbox"} == 0 returns inconsistent results. For instance, the Jira application may be reported as down for only a couple of minutes, and sometimes the Jenkins application is reported as down even though it is up when I run the probe_success query."
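
For reference, one common way to express "down for 10 minutes" in Prometheus is an alerting rule with a `for` clause, rather than relying on the evaluation interval alone. The fragment below is only a sketch under assumptions: the group name, alert name, severity label, and annotation text are hypothetical and not taken from this issue; only the expression comes from the query above.

```yaml
groups:
  - name: blackbox-probes            # hypothetical group name
    rules:
      - alert: ProbeTargetDown       # hypothetical alert name
        # Same expression as the query in this report.
        expr: probe_success{job="blackbox"} == 0
        # The alert stays pending until the expression has been true at
        # every rule evaluation for 10 minutes, and only then fires.
        for: 10m
        labels:
          severity: critical         # hypothetical label
        annotations:
          summary: "Probe failed for {{ $labels.instance }}"
```

Note that with `evaluation_interval: 10m`, as in the configuration above, rules are evaluated only once every 10 minutes, and each evaluation looks at the most recent sample. A shorter evaluation interval would give the `for` clause more evaluations to confirm that a target really stayed down.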