prometheus / blackbox_exporter

Blackbox prober exporter
https://prometheus.io
Apache License 2.0

High system load after Blackbox Exporter update to 0.19.0 #793

Open karlism opened 3 years ago

karlism commented 3 years ago

Host operating system: output of uname -a

Linux hostname 4.18.0-240.22.1.el8_3.x86_64 #1 SMP Thu Apr 8 19:01:30 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

blackbox_exporter version: output of blackbox_exporter --version

blackbox_exporter, version 0.19.0 (branch: HEAD, revision: 5d575b88eb12c65720862e8ad2c5890ba33d1ed0)
  build user:       root@2b0258d5a55a
  build date:       20210510-12:56:44
  go version:       go1.16.4
  platform:         linux/amd64
$ ps aux | grep blackbox
prometh+  901327  2.6  1.2 717592 22868 ?        Ssl  08:11   1:14 /usr/bin/blackbox_exporter --config.file=/etc/prometheus/blackbox.yml --web.config.file=/etc/prometheus/blackbox_tls.yml --log.level=debug

What is the blackbox.yml module config?

I don't think this is relevant, because in this case the Blackbox Exporter is idle.

What is the prometheus.yml scrape config?

- job_name: blackbox_exporter
  honor_timestamps: true
  scrape_interval: 1m
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: https
  follow_redirects: true
  relabel_configs:
  - source_labels: [__address__]
    separator: ;
    regex: ^(.*):[0-9]{2,5}$
    target_label: hostname
    replacement: $1
    action: replace
  - source_labels: [__address__]
    separator: ;
    regex: ^(.*):[0-9]{2,5}$
    target_label: source
    replacement: $1
    action: replace
  - source_labels: [source]
    separator: ;
    regex: (.*)
    target_label: __address__
    replacement: ${1}.example.com:9115
    action: replace
  static_configs:
  - targets:
    - hostname1a:9115
    - hostname1b:9115
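To make the relabeling above easier to follow, here is a minimal Go sketch that applies the same three `replace` rules to the two static targets. It is a simplification, not Prometheus's own code: it assumes a single source label per rule (so `separator` plays no role) and mirrors Prometheus's behavior of anchoring each regex as `^(?:regex)$`. The helper name `relabelReplace` is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// relabelReplace is a hypothetical helper that approximates a Prometheus
// "replace" relabel action on a single source label value. The regex is
// anchored on both ends, and on a match the replacement template is
// expanded with the capture groups. It reports whether the regex matched.
func relabelReplace(value, regex, replacement string) (string, bool) {
	re := regexp.MustCompile("^(?:" + regex + ")$")
	idx := re.FindStringSubmatchIndex(value)
	if idx == nil {
		return "", false
	}
	return string(re.ExpandString(nil, replacement, value, idx)), true
}

func main() {
	for _, addr := range []string{"hostname1a:9115", "hostname1b:9115"} {
		// Rules 1 and 2: __address__ -> hostname and source (port stripped).
		source, ok := relabelReplace(addr, `^(.*):[0-9]{2,5}$`, "$1")
		if !ok {
			fmt.Println(addr, "did not match")
			continue
		}
		// Rule 3: source -> __address__ (domain and port appended).
		newAddr, _ := relabelReplace(source, `(.*)`, "${1}.example.com:9115")
		fmt.Printf("%s -> hostname=%s, __address__=%s\n", addr, source, newAddr)
	}
}
```

Under these assumptions, `hostname1a:9115` ends up with `hostname=hostname1a` and `__address__=hostname1a.example.com:9115` (likewise for `hostname1b`). In other words, this job only scrapes the exporters' own `/metrics` endpoints on their fully qualified names and does not trigger any probes.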

What logging output did you get from adding &debug=true to the probe URL?

Debug logging is enabled, but apart from the standard startup messages, the logs have been empty since the Blackbox Exporter was started.

What did you do that produced an error?

Upgraded to 0.19.0. Our setup consists of two Blackbox Exporter instances in each location. They share a virtual IP address managed by keepalived, and the backup nodes normally do not serve any requests apart from /metrics. For some reason, since the update, system load (node_load1) rises to 2 every 2 hours, stays there for about 15 minutes, and then drops back to 0 on both the active and the backup Blackbox Exporter nodes.

What did you expect to see?

I would expect system load to stay low; it was around 0 before the upgrade from Blackbox Exporter 0.18.0 to 0.19.0.

What did you see instead?

System load rises to around 2 roughly every 2 hours, stays there for about 15 minutes, and then drops back to 0. Stopping the Blackbox Exporter also brings system load down to 0.

karlism commented 3 years ago

This graph shows system load on the backup Blackbox Exporter node:

[image: graph of system load on the backup Blackbox Exporter node]

We see the same behavior at all sites; it is not limited to this node.