inovex / mqtt_blackbox_exporter

Prometheus Exporter for MQTT monitoring
Apache License 2.0

Exporter does not survive an unreachable MQTT broker without being restarted #26

Closed: frittentheke closed this issue 5 years ago

frittentheke commented 5 years ago

After the broker and the mqtt_blackbox_exporter had been working flawlessly for days and weeks, the exporter stopped probing the target, leaving only these lines in its output/log:

journalctl --unit=mqtt_blackbox_exporter.service
-- Logs begin at Fr 2018-08-31 08:22:24 UTC, end at Do 2018-09-06 11:38:27 UTC. --
Aug 31 08:22:38 cvm01317 systemd[1]: Started MQTT Blackbox Exporter.
Aug 31 08:22:38 cvm01317 systemd[1]: Starting MQTT Blackbox Exporter...
Aug 31 08:22:38 cvm01317 mqtt_blackbox_exporter[864]: 08:22:38.419237 main.go:225: Starting mqtt_blackbox_exporter (build: 0.2.0-20170811-130735+b2b0651)
Sep 06 05:20:59 cvm01317 mqtt_blackbox_exporter[864]: 05:20:59.478407 main.go:153: Network Error : %!s(<nil>)
Sep 06 05:21:30 cvm01317 mqtt_blackbox_exporter[864]: 05:21:30.057167 main.go:153: Network Error : %!s(<nil>)
Sep 06 05:22:00 cvm01317 mqtt_blackbox_exporter[864]: 05:22:00.672315 main.go:153: Network Error : %!s(<nil>)
Sep 06 05:22:32 cvm01317 mqtt_blackbox_exporter[864]: 05:22:32.669914 main.go:153: Network Error : %!s(<nil>)
Sep 06 05:23:02 cvm01317 mqtt_blackbox_exporter[864]: 05:23:02.868430 main.go:153: Network Error : %!s(<nil>)
Sep 06 05:23:33 cvm01317 mqtt_blackbox_exporter[864]: 05:23:33.161793 main.go:153: Network Error : %!s(<nil>)
Sep 06 05:24:03 cvm01317 mqtt_blackbox_exporter[864]: 05:24:03.515896 main.go:153: Network Error : %!s(<nil>)
Sep 06 05:24:34 cvm01317 mqtt_blackbox_exporter[864]: 05:24:34.306594 main.go:153: Network Error : %!s(<nil>)
Sep 06 05:25:04 cvm01317 mqtt_blackbox_exporter[864]: 05:25:04.567119 main.go:153: Network Error : %!s(<nil>)
Sep 06 05:25:35 cvm01317 mqtt_blackbox_exporter[864]: 05:25:35.482858 main.go:153: Network Error : %!s(<nil>)
Sep 06 05:26:05 cvm01317 mqtt_blackbox_exporter[864]: 05:26:05.837598 main.go:153: Network Error : %!s(<nil>)
Sep 06 05:26:36 cvm01317 mqtt_blackbox_exporter[864]: 05:26:36.276451 main.go:153: Network Error : %!s(<nil>)
Sep 06 05:27:07 cvm01317 mqtt_blackbox_exporter[864]: 05:27:07.408797 main.go:153: Network Error : %!s(<nil>)
Sep 06 05:27:38 cvm01317 mqtt_blackbox_exporter[864]: 05:27:38.204040 main.go:153: Network Error : %!s(<nil>)
Sep 06 05:28:09 cvm01317 mqtt_blackbox_exporter[864]: 05:28:09.468270 main.go:153: Network Error : %!s(<nil>)
Sep 06 05:28:40 cvm01317 mqtt_blackbox_exporter[864]: 05:28:40.608349 main.go:153: Network Error : %!s(<nil>)
Sep 06 05:29:10 cvm01317 mqtt_blackbox_exporter[864]: 05:29:10.846582 main.go:153: Network Error : %!s(<nil>)

The exporter itself was still running and happily serving the /metrics endpoint, but the metrics were no longer updated. The probe_mqtt_completed_total and probe_mqtt_started_total counters stayed static; interestingly, probe_mqtt_started_total was exactly one higher than probe_mqtt_completed_total.

I believe a probe simply got stuck somehow: no timeout seems to be applied, so the exporter never recovered and retried in the next probing interval.

hikhvar commented 5 years ago

When did you restart the exporter? At 05:30:00?

hikhvar commented 5 years ago

What kind of "unreachable" was it? Was the broker restarted?

frittentheke commented 5 years ago

@hikhvar I am now running both of your PRs, #27 and the newer vendored dependencies from #29, and this seems to have fixed the issue.