Closed brackw closed 1 year ago
Interesting. I have five units, all on the same firmware. Two of them will not show up in HA, even after rebooting the units. I wonder if this would fix that? The web UI on both works fine.
@sean-leach
Rebooting the unit will reset lastmqttretry and millis back down to 0 which makes the original logic valid again, so if those units are not reconnecting after a reboot it shouldn't be this particular bug unless there is more to it.
Off topic for this ticket, but out of curiosity, on the unconnected units status page, do you also get MQTT status of -4 or something else? That may provide more clues for your particular issue.
-4 : MQTT_CONNECTION_TIMEOUT - the server didn't respond within the keepalive time
-3 : MQTT_CONNECTION_LOST - the network connection was broken
-2 : MQTT_CONNECT_FAILED - the network connection failed
-1 : MQTT_DISCONNECTED - the client is disconnected cleanly
0 : MQTT_CONNECTED - the client is connected
1 : MQTT_CONNECT_BAD_PROTOCOL - the server doesn't support the requested version of MQTT
2 : MQTT_CONNECT_BAD_CLIENT_ID - the server rejected the client identifier
3 : MQTT_CONNECT_UNAVAILABLE - the server was unable to accept the connection
4 : MQTT_CONNECT_BAD_CREDENTIALS - the username/password were rejected
5 : MQTT_CONNECT_UNAUTHORIZED - the client was not authorized to connect
@brackw Nope, signing into that device directly, the Status page shows: "MQTT Status ==> CONNECTED ( 0 )"
It's so strange.
tl;dr at the bottom, but I believe I determined the cause of the issue while writing this ticket, the fix is simple, but if anyone thinks there might be another issue at play, or that I'm simply wrong please chime in.
Occasionally the code will not reconnect to MQTT broker after broker was taken off line.
My MQTT broker was offline for a bit of time to facilitate a homeassistant update, after the broker was brought back online, 2 out of 4 units (all running the same firmware) failed to ever reconnect to the broker with the listed reason of:
-4 : MQTT_CONNECTION_TIMEOUT
The 2 units that failed to reconnect still operated as expected with web control, rebooting them was the only way to get them connected again.
Per the code, it should retry every 10ms 5 times every MQTT_RETRY_INTERVAL_MS, however I think that the two units that have failed to recover have been online for a longer period of time than the two that did recover.
I believe it's likely that millis() overflowed and lastMqttRetry is a higher value than current millis(), so this statement will only be true again after millis() has a chance to overtake the last retry time, so it's simply not trying to reconnect.
tl;dr
(millis() > (lastMqttRetry + MQTT_RETRY_INTERVAL_MS)
on line 1797 needs to change to(millis() - lastMqttRetry > MQTT_RETRY_INTERVAL_MS)
for the MQTT reconnect code to function properly if a unit has been online for more than 50 days and millis() rolls back over to 0