peterhinch / micropython-mqtt

A 'resilient' asynchronous MQTT driver. Recovers from WiFi and broker outages.
MIT License
549 stars, 116 forks

Sporadic resets - not entirely clear why #118

Open gmrza opened 11 months ago

gmrza commented 11 months ago

I am using mqtt_as on a Raspberry Pi Pico W to report on a BME280 (temperature, pressure and humidity) sensor and a flow meter that uses a Hall Effect sensor. I originally wrote a MicroPython script that wrote directly to an InfluxDB database, but I am busy migrating to MQTT (with Mosquitto as the broker) and mqtt_as as the client.

In my test setup the controller resets sporadically. I am not seeing any exceptions before the reset, and when I interrogate machine.reset_cause() on restart, it always returns 1 (machine.PWRON_RESET). To check that the value does change, I caused a watchdog timer reset before starting my script, which set reset_cause() to 3 before my script started.

The Hall Effect sensor is read using a hard IRQ, and my IRQ handler is as simple as I can make it:

```python
def irq_handler(pin):
    global pulses, maxpulses
    pulses = (pulses + 1) % maxpulses
```

"maxpulses" is currently set up so that my "pulses" counter acts as a 24 bit circular counter.

I do have the emergency exception buffer set up as well. At this point I can't specifically tell what is causing the resets, but I have only seen this behaviour since using mqtt_as.
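Because `pulses` wraps at `maxpulses` (2**24 for a 24-bit counter), code that samples it periodically, for instance to publish a flow rate over MQTT, has to take the difference modulo the same value. A minimal sketch; `pulse_delta` is a hypothetical helper, not part of mqtt_as, and it assumes fewer than 2**24 pulses occur between two samples:

```python
MAXPULSES = 1 << 24  # assumed: 24-bit circular counter, as described above


def pulse_delta(previous, current, modulus=MAXPULSES):
    """Pulses counted between two samples of a wrapping counter.

    Only correct if fewer than `modulus` pulses occur between samples.
    """
    # % with a positive modulus always gives a non-negative result in
    # Python and MicroPython, so a wrap past the top is handled.
    return (current - previous) % modulus


# pulse_delta(16777210, 5)  # → 11 (counter wrapped past 2**24)
```

A 24-bit modulus also keeps `pulses` well inside MicroPython's small-int range on the RP2040, which should mean the hard IRQ handler never needs to allocate heap memory (hard IRQ handlers must not allocate).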

I'd be open to suggestions on how to diagnose this. I have also tried a soft IRQ, and I have seen resets at times when no IRQs are occurring. I have two test setups, both using a second Pi Pico to generate pulses in place of the Hall Effect sensor: one pulses the input continuously at up to 200 Hz, and the other pulses for a while and then pauses. In the second test I seem to get resets even when there are no IRQs. As a further test I will eliminate the I2C access to see whether that has an effect.
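One way to gather evidence when reset_cause() always reports a power-on reset is to log every boot and periodically record the uptime to flash, so that after a reset you can see how often it happens and how long the previous run lasted. A minimal sketch, assuming ordinary file I/O on the Pico's flash; the file names and all three helper functions are hypothetical:

```python
# Hypothetical diagnostic helpers, not part of mqtt_as.
# Plain file I/O, so this runs on MicroPython and on CPython.
BOOT_LOG = "boot_log.txt"
HEARTBEAT = "heartbeat.txt"


def record_boot(reset_cause, path=BOOT_LOG):
    """Append one line per boot and return the running boot count."""
    try:
        with open(path) as f:
            count = sum(1 for _ in f)  # one line per previous boot
    except OSError:  # first boot: the log does not exist yet
        count = 0
    count += 1
    with open(path, "a") as f:
        f.write("boot {} reset_cause {}\n".format(count, reset_cause))
    return count


def heartbeat(uptime_s, path=HEARTBEAT):
    """Overwrite the last known uptime (call sparingly to limit flash wear)."""
    with open(path, "w") as f:
        f.write(str(uptime_s))


def last_uptime(path=HEARTBEAT):
    """Uptime written by the previous run, or None if unavailable."""
    try:
        with open(path) as f:
            return int(f.read())
    except (OSError, ValueError):
        return None
```

On the Pico W this might be used by reading `last_uptime()` and calling `record_boot(machine.reset_cause())` at the top of main.py, then running an asyncio task that calls `heartbeat(time.ticks_ms() // 1000)` every minute or so. Read the old heartbeat before the new run overwrites it.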

peterhinch commented 11 months ago

How is the Pico W being powered?

gmrza commented 11 months ago

Either via micro USB, or with 5V fed to VSYS through a Schottky diode. The problem with powering via USB from another computer is that when that computer sleeps and goes into power-save mode, the USB power drops, so I prefer to feed power via VSYS.

peterhinch commented 11 months ago

The reason I ask is that many USB "wall warts" are basically crap. They are designed for recharging devices with a Li-ion battery rather than for continuously powering an electronic device, which is a much more critical application. For long-term tests the official adaptors for the Raspberry Pi are excellent.

The reset cause does hint at a power glitch. The other aspect is that the ESP32 can draw substantial pulses of power when using the WiFi radio, which can cause problems with some adaptors. We (@kevinkk525 and I) did a great deal of long-term testing of mqtt_as, including runs lasting months. We found that the quality of the power supply was crucial.

gmrza commented 11 months ago

That makes a lot of sense. I've got an official RPi adaptor, so I'm giving that a go to see whether it makes a difference. In "production" I'm using a totally different power supply: a buck converter fed from a rectified 24 VAC supply. That environment has seen uptimes in excess of 40 days. Hopefully it is just a power glitch.

gmrza commented 11 months ago

The RPI power supply doesn't seem to be helping. I think as a next step I need to go back to my legacy code base that doesn't use MQTT and confirm that I don't have a hardware issue.

peterhinch commented 11 months ago

As general background, mqtt_as uses standard Python code. I can't see any reason why IRQs would affect it. I haven't done a long-running test on the Pico W, but I would be very surprised if it had problems. I've used Pico hardware quite heavily and never had a hint of trouble.

I would offer to set up a long term test of a Pico W running one of the demos (range_ex.py), but I'm testing another project which involves a lot of fiddling with the WiFi setup - it wouldn't be a fair test. You might want to try this yourself to build up confidence in the platform (or identify any issues).

gmrza commented 11 months ago

As a next step, I've reverted to my legacy code, and I'm running a test on that again. If I have the same issue, then it is probably a hardware issue. If that doesn't cause any problems, then as a next test I'll run one of the demos.

beetlegigg commented 11 months ago

Just a bit of info, for what it's worth, as I've been using mqtt_as long term on 3 rpi picoW boards. They've been running rock solid for at least 6 months (and one for 10 months).

One is attached to 2 sensors for outside/inside temperature and humidity readings from an outbuilding, transmitted every 15 minutes by MQTT. Another picoW receives these MQTT messages and displays the data on a small screen.

The third picoW receives MQTT messages every minute relating to various house and surroundings sensor data (collated on an RPi) and displays the data on the biggest screen that Peter's nano-gui can accommodate. This necessitated freezing the mqtt_as and nano-gui code to get it to fit.

Anyway, this is just to indicate that for me at least the use of mqtt_as on the picoW has proved completely reliable. They are all powered via some quality wall warts with USB ports into which one plugs one's own cable.

ebolisa commented 11 months ago

“… picoW has proved completely reliable.” I second that!

peterhinch commented 11 months ago

@beetlegigg Thanks for that. Saved me a job! :+1:

gmrza commented 11 months ago

At the moment it looks increasingly like one of two issues. One possibility is still that it is power-related. The other may be related to the fact that I check whether power is being delivered via USB, mainly to decide whether to produce debug output. The problem is that this check reads GPIO2 on the WiFi chip, and the datasheet notes that it should only be read when no WiFi transmission is in progress. With my old code, where everything ran in one loop, that was easy to control. With asyncio the programming paradigm is different, and it is harder to control. The easiest option is to live without those checks.

peterhinch commented 11 months ago

I don't think it could be done without modifying mqtt_as.py because transmissions occur regardless of the operation of the application - namely MQTT Ping packets. There is also background activity in running the qos==1 handshake. You'd need a Lock object to pause output to the socket while the GPIO2 test was in progress.
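The Lock approach can be sketched with `asyncio.Lock`. This shows only the pattern: `radio_lock`, `guarded_write` and `guarded_read` are hypothetical names, nothing here is part of mqtt_as, and applying it would mean routing every write inside mqtt_as.py (including pings and qos==1 retransmissions) through the same lock. Even then it only serialises writes that go through the lock; it presumably cannot stop the WiFi chip receiving or its driver doing its own background work.

```python
import asyncio

# Hypothetical: one lock shared by every code path that transmits.
# mqtt_as would need modifying so its own socket writes acquire it too.
radio_lock = asyncio.Lock()


async def guarded_write(write, data):
    """Send `data` via the async callable `write` while holding the lock."""
    async with radio_lock:
        await write(data)


async def guarded_read(read):
    """Call `read()` (e.g. a WL_GPIO2 check) while no guarded write runs."""
    async with radio_lock:
        return read()
```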

An option might be to unconditionally direct debug output to a UART, on the basis that a UART neither knows nor cares whether anyone is listening.
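A minimal sketch of unconditional debug output. `make_debug` is a hypothetical helper that accepts any callable taking bytes, such as the `write` method of a `machine.UART`; the UART number, pins and baud rate in the usage comment are assumptions:

```python
def make_debug(write):
    """Return a debug() function that sends one text line to `write`.

    `write` is any callable accepting bytes. Output is unconditional:
    nothing checks whether a listener is attached.
    """
    def debug(*args):
        line = " ".join(str(a) for a in args) + "\r\n"
        write(line.encode())
    return debug


# On a Pico W this might be (UART0 on GP0/GP1 is an assumption):
#   from machine import UART, Pin
#   debug = make_debug(UART(0, baudrate=115200, tx=Pin(0), rx=Pin(1)).write)
```

Because the sink is just a callable, the same `debug()` calls can be pointed at a list or a file when testing off the board.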

gmrza commented 11 months ago

A better solution is probably to check whether USB is connected before I bring up the network, and remember the result. It is unlikely that I would connect or disconnect USB while things are running. In other words: check whether USB power is connected and, if so, turn on debug, on the assumption that in production power is provided via VSYS only.
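The check-once approach might look like this. A minimal sketch; `usb_powered`, `latch_debug` and `DEBUG` are hypothetical names, and the `"WL_GPIO2"` pin name is an assumption about how the Pico W MicroPython port exposes VBUS sense, so verify it on your firmware:

```python
def usb_powered():
    """Return True if USB (VBUS) power is present on a Pico W.

    Assumption: the firmware exposes the WiFi chip's GPIO2 (VBUS sense)
    as Pin("WL_GPIO2"). Returns False when `machine` is unavailable.
    """
    try:
        from machine import Pin
    except ImportError:  # not on MicroPython, e.g. testing under CPython
        return False
    return bool(Pin("WL_GPIO2", Pin.IN).value())


def latch_debug(read_vbus=usb_powered):
    """Read USB presence exactly once; call before the network comes up."""
    return bool(read_vbus())


# Evaluated once at import. Provided this module is imported before WiFi
# and mqtt_as start, WL_GPIO2 is never read while the radio is in use.
DEBUG = latch_debug()
```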

gmrza commented 10 months ago

I'm starting to think that my suspicions about interaction with the WiFi interface may be correct. The code that doesn't access WL_GPIO2 while WiFi is up seems to be stable. I wonder whether the fact that mqtt_as talks on the network a lot more than my older solution did triggers issues with the stability of the interface to the WiFi adaptor on the Pi Pico W. There is no real need to touch WL_GPIO2 after startup, so changing my code in that way has no real impact.