Open reefland opened 2 years ago
I see in the code it tries to "connect forever" to the Broker, but I still get left with this in the logs:
2022/08/17 16:19:54 Error: Connection to tcp://mosquitto-mqtt.mosquitto:1883 lost: EOF
The broker is up and running, and I can connect to it fine with a client. If I restart the mosquitto-exporter container manually, it is able to connect again. Because of this forever loop, I don't see a way to automate the restart when it is unable to connect, since it never errors out and so never triggers a container restart policy.
I also don't see a way to monitor that it's unable to connect, since it keeps publishing stale metrics even though it cannot reach the broker. The metrics it publishes appear to be frozen in time. They should drop to zero or become unavailable after some point to allow alerting.
The only thing I could think of was to detect when the rate of published messages is stuck at zero, then generate an alert:
- alert: MosquittoPublishedMessagedAtZeroError
  annotations:
    description: Mosquitto MQTT published message rate is at zero for more than 1 minute.
    summary: Mosquitto MQTT published message rate is at zero for more than 1 minute.
  expr: rate(broker_publish_messages_sent[1m]) == 0
  for: 1m
  labels:
    issue: Mosquitto MQTT published message rate is at zero for more than 1 minute.
    severity: critical
I at least have an alert now. When mosquitto-exporter stops updating metrics and I check its logs, it's not connected, but I can't automate a restart. Zigbee2MQTT, HomeAssistant, Frigate, etc. all connect fine and maintain their connections. As far as I can tell, only this exporter has the problem.
Came here to check the same issue 😂 I wonder if I could somehow trigger a remediation based on an alert to restart the pod.
Could it be that the client needs to be re-instantiated after the connection fails, i.e. that client := mqtt.NewClient(opts) should be moved into the for loop?
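To make the suggestion above concrete, here is a minimal sketch of a connect loop that builds a fresh client for every attempt instead of reusing one that has already failed. It is not the exporter's actual code, and whether this resolves the reported behaviour is untested. `Connector`, `connectFresh`, `fakeClient` and `flakyFactory` are hypothetical names introduced only for illustration. With the real library the factory would wrap mqtt.NewClient(opts), and its token-based Connect result would need adapting to a plain error.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Connector is a hypothetical stand-in for the one client method this
// sketch needs; it is not the MQTT library's interface.
type Connector interface {
	Connect() error
}

// connectFresh creates a new client for every attempt and loops until one
// of them connects. It returns that client and the number of attempts used.
func connectFresh(newClient func() Connector, delay time.Duration) (Connector, int) {
	for attempt := 1; ; attempt++ {
		c := newClient() // instantiation happens inside the loop
		err := c.Connect()
		if err == nil {
			return c, attempt
		}
		fmt.Printf("attempt %d failed: %v\n", attempt, err)
		time.Sleep(delay)
	}
}

// fakeClient simulates a client that either can or cannot reach the broker.
type fakeClient struct{ ok bool }

func (f fakeClient) Connect() error {
	if f.ok {
		return nil
	}
	return errors.New("EOF")
}

// flakyFactory returns a factory whose first `failures` clients fail to
// connect; every client built after that succeeds.
func flakyFactory(failures int) func() Connector {
	built := 0
	return func() Connector {
		built++
		return fakeClient{ok: built > failures}
	}
}

func main() {
	_, attempts := connectFresh(flakyFactory(2), 10*time.Millisecond)
	fmt.Println("connected after", attempts, "attempts")
}
```

Note that this only covers the initial connect. A connection that drops later, as in the "Connection ... lost: EOF" log line, would still need something to send the program back into this loop.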
Is there a way to configure connection retries? I had to bounce the broker and the mosquitto-exporter log ended with:
There is no sign of a retry, and the program doesn't exit, so no container restart policy is triggered.
I manually restarted mosquitto-exporter and it connected fine.
If the exporter can't connect, it should retry a set number of times and then exit so that the restart policy can kick in. Then it becomes a condition that can be monitored and fixed.
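The behaviour requested above could look roughly like the following sketch: retry a bounded number of times, then exit with a non-zero status so that a container restart policy can take over. This is an illustration under assumptions, not the exporter's implementation. `retryConnect` and `failNTimes` are hypothetical helpers, and the attempt count and delay are arbitrary values chosen for the demo.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"time"
)

// retryConnect calls connect up to maxAttempts times, sleeping for delay
// between failed attempts. It returns nil on the first success, otherwise
// the last error wrapped with the attempt count.
func retryConnect(connect func() error, maxAttempts int, delay time.Duration) error {
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = connect(); err == nil {
			return nil
		}
		fmt.Fprintf(os.Stderr, "connect attempt %d/%d failed: %v\n", attempt, maxAttempts, err)
		if attempt < maxAttempts {
			time.Sleep(delay)
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", maxAttempts, err)
}

// failNTimes returns a stand-in connect function that fails n times and
// succeeds afterwards. It exists only to exercise the sketch.
func failNTimes(n int) func() error {
	calls := 0
	return func() error {
		calls++
		if calls <= n {
			return errors.New("EOF")
		}
		return nil
	}
}

func main() {
	if err := retryConnect(failNTimes(2), 5, 10*time.Millisecond); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1) // non-zero exit lets the restart policy restart the container
	}
	fmt.Println("connected")
}
```

Exiting rather than looping forever turns a silent failure into a visible one: the container restart count rises, which can be alerted on, and the stale metrics disappear while the process is down.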