steadramon / ESPGeiger

☢️ ESP8266 / ESP32 Firmware for collecting and reporting statistics from Geiger counters
GNU General Public License v3.0

Loss of connection to MQTT broker causes lockups and reboots #11

Closed Mr-Blinky closed 2 months ago

Mr-Blinky commented 4 months ago

When the ESP8266 loses connection to the MQTT broker, the MCU locks up/lags and then reboots after what appears to be an arbitrary length of time. While this is happening the status web page becomes very slow (see the timestamps on the chart in the attached image). Something similar can be seen with the OLED and red LED on the ESPGeiger HW: every so often the OLED will update and the red LED will light for longer than the regular count blink. The log window on the status web page seems unaffected.

This affects v0.5.3 and v0.5.4. Attached is a log text file copied from the status page log window and a screen grab of the status web page. v0.5.3 was in use at first, then updated to v0.5.4.

Hardware/software: ESPGeiger HW (v0.5.3/064bca6 & v0.5.4/3066e12), Mosquitto MQTT Broker running on Windows 10 Pro

Steps to reproduce:

  1. Set up/configure an MQTT broker (Mosquitto in my case)
  2. Configure ESPGeiger to use the MQTT server
  3. Observe normal operation
  4. Break the connection to the MQTT broker (in my case I stopped the Mosquitto service on Windows)
  5. Observe the ESPGeiger slow down/lockup and after some time (around 10-15 minutes in my tests) the unit will reboot

If the MQTT broker connection is restored, normal operation resumes; however, if the connection is not restored the MCU will reboot. The first reboot occurred around 17 minutes after the connection was lost. After restoring the connection normal operation resumed. I then updated to v0.5.4, observed normal operation, and then broke the MQTT connection again. The reboot occurred after 13 minutes and then again 12 minutes later. This can be seen in the attached log file.

Please let me know if you would like any further information or testing.

Thanks.

ESPGeiger-4e4aee.txt ESPGeiger-4e4aee - MQTT issue

steadramon commented 4 months ago

Hi - thanks for the report.

I've been thinking over a few solutions to this. Ultimately it is because the MQTT library in use uses blocking methods during reconnection.

The reboot of the device happens due to code within the ESPGeiger MQTT_Client code. I think I will initially re-code this to have the retry back off so that it's not aggressively retrying all the time.

This is something I'm aware of re: the blocking library. There are Async clients based upon the AsyncTCP libraries which are already in use in ESPGeiger for the submissions to third parties.

I'll implement a backoff for the retry mechanism and then start investigating Async libraries for MQTT.
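
For illustration only, a retry backoff of this kind could look roughly like the sketch below. This is not the ESPGeiger implementation: the class name `ReconnectBackoff`, its methods, and the delay values are all hypothetical. It only decides when the next reconnect attempt is allowed, so the main loop never has to wait inside the check itself, and the wait doubles after each failure up to a cap.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper, not taken from the ESPGeiger source.
// Tracks when the next MQTT reconnect attempt is allowed and doubles
// the wait after every failed attempt, up to a fixed maximum.
class ReconnectBackoff {
public:
  ReconnectBackoff(uint32_t initialMs, uint32_t maxMs)
      : initialMs_(initialMs), maxMs_(maxMs), delayMs_(initialMs) {}

  // Returns true when a reconnect attempt may be made at time nowMs.
  // This only compares timestamps, so it never blocks the caller.
  bool shouldAttempt(uint32_t nowMs) const {
    // Unsigned subtraction stays correct if the millisecond counter wraps.
    return !waiting_ || (nowMs - lastAttemptMs_) >= delayMs_;
  }

  // Record a failed attempt: the first failure waits initialMs,
  // each later failure doubles the wait, capped at maxMs.
  void onFailure(uint32_t nowMs) {
    lastAttemptMs_ = nowMs;
    if (waiting_) {
      const uint64_t doubled = static_cast<uint64_t>(delayMs_) * 2;
      delayMs_ = static_cast<uint32_t>(std::min<uint64_t>(doubled, maxMs_));
    } else {
      delayMs_ = initialMs_;
      waiting_ = true;
    }
  }

  // Record a successful connection: the next outage starts from scratch.
  void onSuccess() {
    waiting_ = false;
    delayMs_ = initialMs_;
  }

  uint32_t currentDelayMs() const { return delayMs_; }

private:
  uint32_t initialMs_;
  uint32_t maxMs_;
  uint32_t delayMs_;
  uint32_t lastAttemptMs_ = 0;
  bool waiting_ = false;
};
```

In an Arduino-style `loop()`, the idea would be to call `shouldAttempt(millis())` on each pass and only then try to connect, reporting the result through `onFailure` or `onSuccess`. A backoff like this reduces how often a blocking connect call stalls the loop; it does not make the connect call itself non-blocking.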

Thanks!

steadramon commented 4 months ago

Good progress has been made with the asynchronous MQTT solution 😄 This is currently in testing, but it is available in the main repository and seems fine from everything I've checked so far. I want to check the longevity of the connection before calling this closed.

steadramon commented 3 months ago

Release v0.6.1 has reimplemented MQTT using the async MQTT client. This should cover the problems you were seeing when the MQTT service is offline.

Mr-Blinky commented 2 months ago

I have updated to 0.6.1 and tested by stopping Mosquitto; the ESPGeiger carried on working with no issues. When Mosquitto was restarted, ESPGeiger reconnected properly to MQTT and carried on working properly. I can confirm the issue has been fixed.

Nice work in implementing the async solution. Cheers!