pimoroni / enviro

MIT License

v0.0.8 hard crashes in Thonny #78

Closed · helgibbons closed this issue 1 year ago

helgibbons commented 1 year ago

When running main.py through Thonny to debug, I'm now encountering frequent hard crashes that require a board reset or an unplug/replug of the USB cable. Here's an example from just now: on this occasion it keeled over just after waking up, but I've seen it crash at different points in the program.

I'm seeing this across multiple boards, USB cables, PCs, Thonny versions and endpoints.

main.py output:

```
>>> %Run -c $EDITOR_CONTENT
       ___            ___            ___          ___          ___            ___
      /  /\          /__/\          /__/\        /  /\        /  /\          /  /\
     /  /:/_         \  \:\         \  \:\      /  /:/       /  /::\        /  /::\
    /  /:/ /\         \  \:\         \  \:\    /  /:/       /  /:/\:\      /  /:/\:\
   /  /:/ /:/_    _____\__\:\    ___  \  \:\  /__/::\      /  /:/~/:/     /  /:/  \:\
  /__/:/ /:/ /\  /__/::::::::\  /___\  \__\:\ \__\/\:\__  /__/:/ /:/___  /__/:/ \__\:\
  \  \:\/:/ /:/  \  \:\~~~__\/  \  \:\ |  |:|    \  \:\/\ \  \:\/:::::/  \  \:\ /  /:/
   \  \::/ /:/    \  \:\         \  \:\|  |:|     \__\::/  \  \::/~~~`    \  \:\  /:/
    \  \:\/:/      \  \:\         \  \:\__|:|     /  /:/    \  \:\         \  \:\/:/
     \  \::/        \  \:\         \  \::::/     /__/:/      \  \:\         \  \::/
      \__\/          \__\/          `~~~~~`      \__\/        \__\/          \__\/

    -  --  ---- -----=--==--===  hey enviro, let's go!  ===--==--=----- ----  --  -

2022-09-07 10:51:47 [debug    / 122kB] > performing startup
2022-09-07 10:51:47 [info     / 121kB]   - wake reason: external_trigger
```

And from log.txt:

```
2022-09-07 10:51:47 [debug    / 122kB] > performing startup
```

peter-mount commented 1 year ago

I've had this with v0.0.7 on an Urban board. Every so often it would crash, and I would have to power it down for 5 minutes before it would reboot.

I put that down to powering it via USB, as that stopped the board from fully going to sleep. So yesterday I ran it with a 5V supply going directly to the battery connector instead of USB. It ran fine until this afternoon, when I moved it back to its outdoor location (still on the 5V supply) and it got stuck on boot again.

I only got it back up and running about 30 minutes ago, when I managed to force a reset.

I'm going to leave it running like this for a while to see whether the issue recurs and, if it does, get the logs to check whether it's still hanging at the same step in the startup.

tbbuck commented 1 year ago

I've had a couple of crashes early in the morning. My gut feeling is that it's hanging indefinitely (with no timeout) while POSTing readings to my HTTP endpoint: the crashes coincide with the time when the server it posts to is busy running backups and maintenance scripts.

I've added timeout=30 to the HTTP destination code and will see if that magically fixes things. I worry that it might be hanging while connecting, which AFAICT would not be affected by setting the timeout here. No idea how to address that without writing a custom HTTP request library!
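To illustrate the connect-time concern, here is a minimal raw-socket sketch in plain Python that should also run on MicroPython, assuming your port's `socket.settimeout()` applies to `connect()` as it does in CPython. `post_with_timeout` is a hypothetical helper, not part of enviro or urequests. The key choice is calling `settimeout()` *before* `connect()`, so the one timeout bounds the connection attempt as well as every later send and receive. The DNS lookup in `getaddrinfo()` is not covered by it.

```python
import socket


def post_with_timeout(host, port, path, body, timeout=30):
    """POST `body` (bytes) to http://host:port/path and return the HTTP status code.

    Hypothetical helper. Any stalled step raises OSError (CPython's socket
    timeout is a subclass; MicroPython raises OSError with ETIMEDOUT).
    NOTE: the DNS lookup in getaddrinfo() is NOT bounded by this timeout.
    """
    family, type_, proto, _, addr = socket.getaddrinfo(
        host, port, 0, socket.SOCK_STREAM)[0]
    s = socket.socket(family, type_, proto)
    s.settimeout(timeout)  # set before connect() so the connect is bounded too
    try:
        s.connect(addr)
        header = (
            "POST {} HTTP/1.0\r\n"
            "Host: {}\r\n"
            "Content-Type: application/json\r\n"
            "Content-Length: {}\r\n"
            "\r\n"
        ).format(path, host, len(body))
        s.sendall(header.encode() + body)
        # Read only until the status line is complete, e.g. b"HTTP/1.0 200 OK".
        buf = b""
        while b"\r\n" not in buf:
            chunk = s.recv(64)
            if not chunk:
                raise OSError("connection closed before status line")
            buf += chunk
        return int(buf.split(b"\r\n", 1)[0].split()[1])
    finally:
        s.close()  # always release the socket, even after a timeout
```

A stalled server then surfaces as an `OSError` after roughly `timeout` seconds rather than a hang, which the caller can log before going back to sleep.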

peter-mount commented 1 year ago

Yesterday afternoon my Urban stopped responding and, even after multiple power off/on cycles, just would not come back, although I could see it connecting to WiFi from its DHCP requests.

This morning I finally got out the step ladder to check it physically, and the red LED was on. Reset didn't work until after several power cycles, but now it's back.

So I could have a similar issue, except that I use MQTT: it takes the sensor readings, connects to WiFi, then fails to connect to MQTT, which causes it to hang.

I need to get at the logs, but with it attached to the wall outside it's hard to get a USB connection to it.
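Since the question is which step of a fixed sequence (sensors, WiFi, MQTT) hangs, and a USB connection to an outdoor board is awkward, one option is to write a marker to a file before and after each step and read it back after the next power cycle. A minimal sketch in plain Python that should also run on MicroPython; `StepTracker`, `last_unfinished` and the step names are hypothetical, not part of enviro. Reopening the file for every line favours getting each marker written out over minimising flash writes, which seems a reasonable trade for a handful of steps per wake.

```python
import time

try:  # MicroPython provides millisecond ticks
    from time import ticks_ms, ticks_diff
except ImportError:  # CPython fallback so the sketch can be tested on a PC
    def ticks_ms():
        return int(time.monotonic() * 1000)

    def ticks_diff(end, start):
        return end - start


class StepTracker:
    """Append 'start' before each step and 'done'/'failed' after it (hypothetical).

    The file is reopened and closed for every line, so if the board hangs the
    file ends on the 'start' line of the step that never finished.
    """

    def __init__(self, path):
        self.path = path

    def write(self, line):
        with open(self.path, "a") as f:
            f.write(line + "\n")

    def step(self, name):
        return _Step(self, name)


class _Step:
    def __init__(self, tracker, name):
        self.tracker = tracker
        self.name = name

    def __enter__(self):
        self.t0 = ticks_ms()
        self.tracker.write("start\t" + self.name)
        return self

    def __exit__(self, exc_type, exc, tb):
        elapsed = ticks_diff(ticks_ms(), self.t0)
        status = "done" if exc_type is None else "failed"
        self.tracker.write("{}\t{}\t{}ms".format(status, self.name, elapsed))
        return False  # never swallow an exception raised inside the step


def last_unfinished(path):
    """Return the most recent step with a 'start' but no matching end, or None."""
    open_steps = []
    with open(path) as f:
        for line in f:
            fields = line.rstrip("\n").split("\t")
            if fields[0] == "start":
                open_steps.append(fields[1])
            elif open_steps and fields[1] == open_steps[-1]:
                open_steps.pop()
    return open_steps[-1] if open_steps else None
```

Wrapping each stage as `with tracker.step("mqtt"):` and calling `last_unfinished()` on the trace file at the next boot would name the stage where the previous run stopped.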

tbbuck commented 1 year ago

Great info! It looks like the MQTT code sets a sensible default 30-second timeout on the socket, so I could be barking up the wrong tree:

https://github.com/pimoroni/enviro/blob/35a9036a9c56e359052b76f9dc059277d4289056/enviro/mqttsimple.py#L67

I wonder if it's a WiFi thing? I can't see anything obvious, though I'm by no means a Python expert. I guess running out of memory is possible too. I need to get better at pulling logs off as soon as I notice an issue :)
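On the memory theory: the log lines in the original report already show free memory next to the level (`[debug    / 122kB]`), so a steadily falling kB figure across wake cycles would hint at a leak. Below is a sketch of a formatter producing that layout, assuming the figure is free heap floor-divided by 1024; the firmware's real calculation may differ, and `format_log_line` and `free_memory` are hypothetical helpers. `gc.mem_free()` is MicroPython's standard call for free heap; on CPython the helper returns None.

```python
import gc


def free_memory():
    """Free heap in bytes via MicroPython's gc.mem_free(), or None where it's missing."""
    gc.collect()  # collect first so the figure reflects memory that is really in use
    return gc.mem_free() if hasattr(gc, "mem_free") else None


def format_log_line(timestamp, level, free_bytes, message):
    """Format a line like '2022-09-07 10:51:47 [debug    / 122kB] > performing startup'.

    The level is left-justified to 8 characters; free memory is shown in whole
    kB (floor division is an assumption), or '?' when unknown.
    """
    kb = "?" if free_bytes is None else free_bytes // 1024
    return "{} [{:<8} / {}kB] {}".format(timestamp, level, kb, message)
```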

RedDogUK commented 1 year ago

The Enviro Weather board doesn't like firmware v0.0.8: it constantly crashes. I'm not sure what the cause is. I've tried fresh copies of the firmware and run it on two boards, and the outcome was the same.

peter-mount commented 1 year ago

My Urban board crashed again in the early hours of Sunday (UK time), and I only just got around to restarting it.

I've manually added the change in #84 to see whether it's related, as I use MQTT.

Somehow I don't think so, as when I put it back on its dedicated power (rather than USB) it was stuck until I pressed reset a couple of times.

tbbuck commented 1 year ago

Interesting. I haven't had any issues since adding a timeout for the HTTP endpoint, though I think it may be a coincidence.

I'm starting to wonder whether it needs a debug-heavy mode that can be switched on to dump nearly every operation. Obviously it's a complex piece of kit with a tonne of edge cases to be figured out :)

MrDrem commented 1 year ago

I've been adding logging in my personal fork, following my request in #68. I've done it for the main.py and weather.py files so far; they're working but not stable yet. I'm happy to look at the others if people would like to test them once I've added the extra bits; it's not that hard to do.

peter-mount commented 1 year ago

OK, including that MQTT change let it last almost exactly 24 hours before it crashed. A bit of progress.

ZodiusInfuser commented 1 year ago

There is a new release that may fix this issue: https://github.com/pimoroni/enviro/releases/tag/v0.0.9. I'll therefore close this, but please reopen it or raise a new issue if you update and experience the problem again. Thanks!