Closed hikhvar closed 5 years ago
@hikhvar thanks for looking into the issue I reported! I've been testing the blackbox exporter with your pull request included. It's been running for the last two days in three instances without issues.
One of the instances ran into a similar issue as reported before now though. The metrics stopped updating an _probe_mqtt_startedtotal was again +1 of the value _probe_mqtt_completedtotal see:
probe_mqtt_completed_total{broker="ssl://mqtt.example.com:1883",name="broker via LAN"} 4329 probe_mqtt_duration_seconds_bucket{broker="ssl://mqtt.example.com:1883",name="broker via LAN",le="0.005"} 0 probe_mqtt_duration_seconds_bucket{broker="ssl://mqtt.example.com:1883",name="broker via LAN",le="0.01"} 0 probe_mqtt_duration_seconds_bucket{broker="ssl://mqtt.example.com:1883",name="broker via LAN",le="0.025"} 0 probe_mqtt_duration_seconds_bucket{broker="ssl://mqtt.example.com:1883",name="broker via LAN",le="0.05"} 0 probe_mqtt_duration_seconds_bucket{broker="ssl://mqtt.example.com:1883",name="broker via LAN",le="0.1"} 3687 probe_mqtt_duration_seconds_bucket{broker="ssl://mqtt.example.com:1883",name="broker via LAN",le="0.25"} 4193 probe_mqtt_duration_seconds_bucket{broker="ssl://mqtt.example.com:1883",name="broker via LAN",le="0.5"} 4258 probe_mqtt_duration_seconds_bucket{broker="ssl://mqtt.example.com:1883",name="broker via LAN",le="1"} 4274 probe_mqtt_duration_seconds_bucket{broker="ssl://mqtt.example.com:1883",name="broker via LAN",le="2.5"} 4325 probe_mqtt_duration_seconds_bucket{broker="ssl://mqtt.example.com:1883",name="broker via LAN",le="5"} 4329 probe_mqtt_duration_seconds_bucket{broker="ssl://mqtt.example.com:1883",name="broker via LAN",le="10"} 4329 probe_mqtt_duration_seconds_bucket{broker="ssl://mqtt.example.com:1883",name="broker via LAN",le="+Inf"} 4329 probe_mqtt_duration_seconds_sum{broker="ssl://mqtt.example.com:1883",name="broker via LAN"} 514.0778760399995 probe_mqtt_duration_seconds_count{broker="ssl://mqtt.example.com:1883",name="broker via LAN"} 4329 probe_mqtt_errors_total{broker="ssl://mqtt.example.com:1883",name="broker via LAN"} 0 probe_mqtt_messages_published_total{broker="ssl://mqtt.example.com:1883",name="broker via LAN"} 43300 probe_mqtt_messages_received_total{broker="ssl://mqtt.example.com:1883",name="broker via LAN"} 43300 probe_mqtt_started_total{broker="ssl://mqtt.example.com:1883",name="broker via LAN"} 4330 probe_mqtt_timeouts_total{broker="ssl://mqtt.example.com:1883",name="broker via LAN"} 0
This time there was absolutely no log output after the exporter reported successful startup.
Maybe adding a few lines of debug logging code could help to finally narrow down the last points where things could get locked up or lost waiting for things forever?
Hey @frittentheke, I added the option -trace.enable
. This method should show up all actions taken by the mqtt library. Moreover, the blackbox exporter will now log all mqtt library errors and criticals by default.
@dirkcjelli @theinrichs FYI ^
@hikhvar Thanks a bunch. I rolled this new version out just now. As soon as it locks up again we should then have some more indication on where this happens in the code.
@arnoldbechtoldt PTAL . According to @frittentheke this code runs smoothly.
@hikhvar would you be so kind and cherry-pick the dep commits into a dedicated PR? So we can focus on the initial issue in this PR? :)
According to @frittentheke this code runs smoothly.
Is this personal feedback you've received? Does it mean that the initial issue seems to be fixed?
If we cherry-pick the dep commits, this PR will not build in the CI. Yes, this is personal feedback I recieved.
Devided this PR into Dep related changes #29 and the timeout related changes #30
Shoud fix #26