pimoroni / enviro

MIT License
110 stars 86 forks source link

0.0.9 - Enviro Urban - Exception while uploading - Caused board to hang fro 2hrs #119

Open dave-ct opened 2 years ago

dave-ct commented 2 years ago

@ZodiusInfuser have tried 0.0.9 on the Urban today (also on 2 x enviro indoors which dont seem to have any issues so far). On the Urban it worked for about 4hrs then stopped. Looking at the logs an error occured while uploading vi mqtt which then tried to go back to sleep but looks like it failed (last entry at 15:20) as the red LED was flashing probably for about two hours and this was at about 17:00. When I bought it in took a picture before disconnecting the battery as also seems the white LED is partially on.

Log Files

2022-11-26 15:05:16 [info     / 118kB] > uploading cached readings to MQTT broker: atmos.local
2022-11-26 15:05:16 [info     / 110kB]   - uploaded 2022-11-26T14_55_09Z.json
2022-11-26 15:05:17 [info     / 103kB]   - uploaded 2022-11-26T15_00_09Z.json
2022-11-26 15:05:17 [info     /  97kB]   - uploaded 2022-11-26T15_05_09Z.json
2022-11-26 15:05:17 [info     /  95kB] > going to sleep
2022-11-26 15:05:17 [debug    /  93kB]   - clearing and disabling previous alarm
2022-11-26 15:05:17 [info     /  90kB]   - setting alarm to wake at 15:10pm
2022-11-26 15:05:17 [info     /  88kB]   - shutting down
2022-11-26 15:10:03 [debug    / 115kB] > performing startup
2022-11-26 15:10:03 [info     / 123kB]   - wake reason: rtc_alarm
2022-11-26 15:10:03 [debug    / 121kB]   - turn on activity led
2022-11-26 15:10:03 [debug    / 119kB] > 102 blocks free out of 212
2022-11-26 15:10:03 [debug    / 116kB] > taking new reading
2022-11-26 15:10:03 [info     / 112kB]   - seconds since last reading: 300
2022-11-26 15:10:03 [debug    / 109kB]   - starting sensor
2022-11-26 15:10:03 [debug    / 107kB]   - wait 5 seconds for airflow
2022-11-26 15:10:08 [debug    /  88kB]   - taking pms5003i reading
2022-11-26 15:10:08 [debug    / 121kB]   - taking microphone reading
2022-11-26 15:10:09 [debug    / 122kB] > caching reading for upload
2022-11-26 15:10:09 [info     / 117kB] > 1 cache file(s) not being uploaded. Waiting until there are 3 file(s)
2022-11-26 15:10:09 [info     / 115kB] > going to sleep
2022-11-26 15:10:09 [debug    / 112kB]   - clearing and disabling previous alarm
2022-11-26 15:10:09 [info     / 110kB]   - setting alarm to wake at 15:15pm
2022-11-26 15:10:10 [info     /  94kB]   - shutting down
2022-11-26 15:15:03 [debug    / 115kB] > performing startup
2022-11-26 15:15:03 [info     / 123kB]   - wake reason: rtc_alarm
2022-11-26 15:15:03 [debug    / 121kB]   - turn on activity led
2022-11-26 15:15:03 [debug    / 119kB] > 103 blocks free out of 212
2022-11-26 15:15:03 [debug    / 116kB] > taking new reading
2022-11-26 15:15:03 [info     / 112kB]   - seconds since last reading: 300
2022-11-26 15:15:03 [debug    / 109kB]   - starting sensor
2022-11-26 15:15:03 [debug    / 107kB]   - wait 5 seconds for airflow
2022-11-26 15:15:09 [debug    /  88kB]   - taking pms5003i reading
2022-11-26 15:15:09 [debug    / 121kB]   - taking microphone reading
2022-11-26 15:15:09 [debug    / 120kB] > caching reading for upload
2022-11-26 15:15:09 [info     / 115kB] > 2 cache file(s) not being uploaded. Waiting until there are 3 file(s)
2022-11-26 15:15:09 [info     / 113kB] > going to sleep
2022-11-26 15:15:09 [debug    / 111kB]   - clearing and disabling previous alarm
2022-11-26 15:15:10 [info     / 109kB]   - setting alarm to wake at 15:20pm
2022-11-26 15:15:10 [info     / 107kB]   - shutting down
2022-11-26 15:20:03 [debug    / 115kB] > performing startup
2022-11-26 15:20:03 [info     / 123kB]   - wake reason: rtc_alarm
2022-11-26 15:20:03 [debug    / 121kB]   - turn on activity led
2022-11-26 15:20:03 [debug    / 119kB] > 102 blocks free out of 212
2022-11-26 15:20:03 [debug    / 116kB] > taking new reading
2022-11-26 15:20:03 [info     / 112kB]   - seconds since last reading: 300
2022-11-26 15:20:03 [debug    / 109kB]   - starting sensor
2022-11-26 15:20:03 [debug    / 107kB]   - wait 5 seconds for airflow
2022-11-26 15:20:08 [debug    /  88kB]   - taking pms5003i reading
2022-11-26 15:20:08 [debug    / 121kB]   - taking microphone reading
2022-11-26 15:20:09 [debug    / 121kB] > caching reading for upload
2022-11-26 15:20:09 [info     / 116kB] > 3 cache file(s) need uploading
2022-11-26 15:20:09 [info     / 113kB] > connecting to wifi network 'VM2878914'
2022-11-26 15:20:15 [info     /  91kB]   - ip address:  192.168.0.73
2022-11-26 15:20:16 [info     / 118kB] > uploading cached readings to MQTT broker: atmos.local
2022-11-26 15:20:23 [debug    /  88kB]   - an exception occurred when uploading
2022-11-26 15:20:23 [error    /  86kB]   ! failed to upload '2022-11-26T15_10_09Z.json' to mqtt
2022-11-26 15:20:23 [error    /  84kB] ! reading upload failed
2022-11-26 15:20:23 [info     / 121kB] > going to sleep
2000-01-01 00:00:04 [debug    / 115kB] > performing startup
2000-01-01 00:00:04 [info     / 123kB]   - wake reason: usb_powered
2000-01-01 00:00:04 [debug    / 121kB]   - turn on activity led
2000-01-01 00:00:04 [info     / 119kB] > clock not set, synchronise from ntp server

Photo of board LEDs when I bought it in (red LED was flashing and white LED was part lit).

IMG_3167

ZodiusInfuser commented 2 years ago

Thank you for raising this. It looks like it got stuck during the 2022-11-26 15:20:23 [info / 121kB] > going to sleep phase. As such, those two LEDs stayed on which probably drained your battery, hence the logs once you plugged USB in having the default time.

Why it got stuck during that phase, I do not know yet. I'll fire my own Enviro Urban up and see if I can reproduce.

dave-ct commented 2 years ago

Ah, the logs time reset to zero as I unplugged the battery before connecting to USB to view the logs.

ZodiusInfuser commented 2 years ago

I had my own Enviro Urban lock up overnight (solid white light, but no red light). Unfortunately I needed it to be logging again so just reset it back to life. Hopefully the issue with appear again.

ZodiusInfuser commented 1 year ago

Just to update that I am still investigating this. It's a hard one to diagnose, as it's so infrequent, even when logging every minute.

Best I can gather though, it's not an issue with the board going to sleep, but rather the board waking up. We do some shenanigans during bootup before Micropython kicks in, and somewhere in that is the issue. Likely in this file, if you're curious https://github.com/pimoroni/pimoroni-pico/blob/main/micropython/_board/picow_enviro/wakeup_gpio.patch

TechWilk commented 1 year ago

I too had a solid white light on enviro indoor a day or two back, but after power cycling several hours later (having spotted the issue) the board has been working fine since. I didn't get around to saving the logs before they rolled over, but sounds related.

I'm using a custom webhook rather than mqtt, if that matters.

ZodiusInfuser commented 1 year ago

@TechWilk Thanks for the confirmation. I have not experienced this in Enviro Indoor myself yet, which has been running fine for over a week. Only issue I had was the RTC slowly going out of sync, causing an issue with Adafruit.io uploads.

This makes me wonder if it's the underlying micropython version at fault, as my Enviro Indoor and Weather are using v1.19.7 as their base, with the v0.0.9 enviro firmware onboard. Whereas my Enviro Grow and Urban are v1.19.10 with v0.0.9. 🤔

dave-ct commented 1 year ago

Have been to a little experiment (2 x urbans and 2 indoors) all running at readings every 2 minutes and uploading evey 3 readings. I wanted to see how the readings varied from them when they are all in the same conditions. After about 3 hrs one urban has experienced the issue.

IMG-3183

Again has the red light and white LED on. Note this is a differnet urban to the original issue that was raised. To get it to work again had to remove power and the seems OK again. Will continue to monitor.

IMG-3192

They all running indentical 0.0.9 https://github.com/pimoroni/enviro/releases/download/v0.0.9/pimoroni-picow_enviro-v1.19.10-micropython-v0.0.9.uf2

The only thing differnet to v0.0.9 is I am sending over secure mqtt #122 on these ones but that should not have any impact.

Logs File see events at 14:48:

2022-12-03 14:42:03 [debug    / 115kB] > performing startup
2022-12-03 14:42:03 [info     / 123kB]   - wake reason: rtc_alarm
2022-12-03 14:42:03 [debug    / 121kB]   - turn on activity led
2022-12-03 14:42:03 [debug    / 118kB] > 99 blocks free out of 212
2022-12-03 14:42:03 [debug    / 116kB] > taking new reading
2022-12-03 14:42:03 [info     / 112kB]   - seconds since last reading: 120
2022-12-03 14:42:03 [debug    / 109kB]   - starting sensor
2022-12-03 14:42:03 [debug    / 107kB]   - wait 5 seconds for airflow
2022-12-03 14:42:08 [debug    / 122kB]   - taking pms5003i reading
2022-12-03 14:42:09 [debug    / 120kB]   - taking microphone reading
2022-12-03 14:42:09 [debug    / 119kB] > caching reading for upload
2022-12-03 14:42:09 [info     / 114kB] > 3 cache file(s) need uploading
2022-12-03 14:42:09 [info     / 112kB] > connecting to wifi network '<removed>'
2022-12-03 14:42:14 [info     /  94kB]   - ip address:  192.168.0.238
2022-12-03 14:42:14 [info     / 118kB] > uploading cached readings to MQTT broker: <removed>
2022-12-03 14:42:23 [info     / 103kB]   - uploaded 2022-12-03T14_38_09Z.json
2022-12-03 14:42:32 [info     / 113kB]   - uploaded 2022-12-03T14_40_09Z.json
2022-12-03 14:42:42 [info     / 102kB]   - uploaded 2022-12-03T14_42_09Z.json
2022-12-03 14:42:42 [info     / 100kB] > going to sleep
2022-12-03 14:42:42 [debug    /  97kB]   - clearing and disabling previous alarm
2022-12-03 14:42:42 [info     /  95kB]   - setting alarm to wake at 14:44pm
2022-12-03 14:42:42 [info     /  93kB]   - shutting down
2022-12-03 14:44:03 [debug    / 115kB] > performing startup
2022-12-03 14:44:03 [info     / 123kB]   - wake reason: rtc_alarm
2022-12-03 14:44:03 [debug    / 121kB]   - turn on activity led
2022-12-03 14:44:03 [debug    / 118kB] > 100 blocks free out of 212
2022-12-03 14:44:03 [debug    / 116kB] > taking new reading
2022-12-03 14:44:03 [info     / 112kB]   - seconds since last reading: 120
2022-12-03 14:44:03 [debug    / 109kB]   - starting sensor
2022-12-03 14:44:03 [debug    / 107kB]   - wait 5 seconds for airflow
2022-12-03 14:44:08 [debug    / 122kB]   - taking pms5003i reading
2022-12-03 14:44:09 [debug    / 120kB]   - taking microphone reading
2022-12-03 14:44:09 [debug    / 119kB] > caching reading for upload
2022-12-03 14:44:09 [info     / 114kB] > 1 cache file(s) not being uploaded. Waiting until there are 3 file(s)
2022-12-03 14:44:09 [info     / 112kB] > going to sleep
2022-12-03 14:44:09 [debug    / 110kB]   - clearing and disabling previous alarm
2022-12-03 14:44:09 [info     / 107kB]   - setting alarm to wake at 14:46pm
2022-12-03 14:44:10 [info     / 105kB]   - shutting down
2022-12-03 14:46:03 [debug    / 115kB] > performing startup
2022-12-03 14:46:03 [info     / 123kB]   - wake reason: rtc_alarm
2022-12-03 14:46:03 [debug    / 121kB]   - turn on activity led
2022-12-03 14:46:03 [debug    / 118kB] > 99 blocks free out of 212
2022-12-03 14:46:03 [debug    / 116kB] > taking new reading
2022-12-03 14:46:03 [info     / 112kB]   - seconds since last reading: 120
2022-12-03 14:46:03 [debug    / 109kB]   - starting sensor
2022-12-03 14:46:03 [debug    / 107kB]   - wait 5 seconds for airflow
2022-12-03 14:46:08 [debug    / 123kB]   - taking pms5003i reading
2022-12-03 14:46:08 [debug    / 120kB]   - taking microphone reading
2022-12-03 14:46:09 [debug    / 119kB] > caching reading for upload
2022-12-03 14:46:09 [info     / 114kB] > 2 cache file(s) not being uploaded. Waiting until there are 3 file(s)
2022-12-03 14:46:09 [info     / 112kB] > going to sleep
2022-12-03 14:46:09 [debug    / 110kB]   - clearing and disabling previous alarm
2022-12-03 14:46:09 [info     / 108kB]   - setting alarm to wake at 14:48pm
2022-12-03 14:46:09 [info     / 105kB]   - shutting down
2022-12-03 14:48:03 [debug    / 115kB] > performing startup
2022-12-03 14:48:03 [info     / 123kB]   - wake reason: rtc_alarm
2022-12-03 14:48:03 [debug    / 121kB]   - turn on activity led
2022-12-03 14:48:03 [debug    / 118kB] > 99 blocks free out of 212
2022-12-03 14:48:03 [debug    / 116kB] > taking new reading
2022-12-03 14:48:03 [info     / 112kB]   - seconds since last reading: 120
2022-12-03 14:48:03 [debug    / 109kB]   - starting sensor
2022-12-03 14:48:03 [debug    / 107kB]   - wait 5 seconds for airflow
2022-12-03 14:48:08 [debug    / 123kB]   - taking pms5003i reading
2022-12-03 14:48:08 [debug    / 120kB]   - taking microphone reading
2022-12-03 14:48:09 [debug    / 119kB] > caching reading for upload
2022-12-03 14:48:09 [info     / 114kB] > 3 cache file(s) need uploading
2022-12-03 14:48:09 [info     / 112kB] > connecting to wifi network '<removed>'
2022-12-03 14:48:15 [info     /  90kB]   - ip address:  192.168.0.238
2022-12-03 14:48:16 [info     / 117kB] > uploading cached readings to MQTT broker: <removed>
2022-12-03 14:48:34 [debug    /  75kB]   - an exception occurred when uploading
2022-12-03 14:48:35 [error    /  73kB]   ! failed to upload '2022-12-03T14_44_09Z.json' to mqtt
2022-12-03 14:48:35 [error    /  71kB] ! reading upload failed
2022-12-03 14:48:35 [info     /  69kB] > going to sleep
dave-ct commented 1 year ago

This moring had 1 x indoor and 1 x urban in a failed state with the red and white light on.

IMG-3195

For the Urban it looks like once one of the uploads failed it stopped, from the logs doe snot look like it even tried to go back to sleep, see entries at 09:08. AT 10:25 was when I removed the power and plugged in to USB to get logs:

 2022-12-04 09:06:09 [info     / 112kB] > going to sleep
2022-12-04 09:06:09 [debug    / 110kB]   - clearing and disabling previous alarm
2022-12-04 09:06:09 [info     / 107kB]   - setting alarm to wake at 09:08am
2022-12-04 09:06:10 [info     / 105kB]   - shutting down
2022-12-04 09:08:03 [debug    / 115kB] > performing startup
2022-12-04 09:08:03 [info     / 123kB]   - wake reason: rtc_alarm
2022-12-04 09:08:03 [debug    / 121kB]   - turn on activity led
2022-12-04 09:08:03 [debug    / 118kB] > 99 blocks free out of 212
2022-12-04 09:08:03 [debug    / 116kB] > taking new reading
2022-12-04 09:08:03 [info     / 112kB]   - seconds since last reading: 120
2022-12-04 09:08:03 [debug    / 109kB]   - starting sensor
2022-12-04 09:08:03 [debug    / 107kB]   - wait 5 seconds for airflow
2022-12-04 09:08:08 [debug    / 123kB]   - taking pms5003i reading
2022-12-04 09:08:08 [debug    / 120kB]   - taking microphone reading
2022-12-04 09:08:09 [debug    / 119kB] > caching reading for upload
2022-12-04 09:08:09 [info     / 114kB] > 3 cache file(s) need uploading
2022-12-04 09:08:09 [info     / 112kB] > connecting to wifi network '<removed>'
2022-12-04 09:08:15 [info     /  90kB]   - ip address:  192.168.0.238
2022-12-04 09:08:16 [info     / 117kB] > uploading cached readings to MQTT broker: <removed>
2022-12-04 09:08:21 [info     / 109kB]   - uploaded 2022-12-04T09_04_09Z.json
2022-12-04 09:08:44 [info     / 106kB]   - uploaded 2022-12-04T09_06_09Z.json
2022-12-04 09:08:58 [debug    /  96kB]   - an exception occurred when uploading
2022-12-04 09:08:58 [error    /  94kB]   ! failed to upload '2022-12-04T09_08_09Z.json' to mqtt
2022-12-04 09:08:58 [error    /  92kB] ! reading upload failed
2022-12-04 10:25:33 [debug    / 115kB] > performing startup
2022-12-04 10:25:33 [info     / 123kB]   - wake reason: usb_powered
2022-12-04 10:25:33 [debug    / 121kB]   - turn on activity led
2022-12-04 10:25:33 [debug    / 118kB] > 99 blocks free out of 212 

For the Indoor the logs are slightly differnet but again had to be powered off to get the logs and kick it back into life. See time 06:00 but this one did appear to try to go to sleep after a uplaod failure, the entries after that are when I powered off and plugged back into USB to get the logs.

2022-12-04 05:58:03 [debug    / 115kB] > performing startup
2022-12-04 05:58:03 [info     / 123kB]   - wake reason: rtc_alarm
2022-12-04 05:58:03 [debug    / 121kB]   - turn on activity led
2022-12-04 05:58:03 [debug    / 119kB] > 99 blocks free out of 212
2022-12-04 05:58:03 [debug    / 117kB] > taking new reading
2022-12-04 05:58:03 [info     / 112kB]   - seconds since last reading: 120
2022-12-04 05:58:04 [debug    / 107kB] > caching reading for upload
2022-12-04 05:58:04 [info     / 102kB] > 2 cache file(s) not being uploaded. Waiting until there are 3 file(s)
2022-12-04 05:58:04 [info     / 100kB] > going to sleep
2022-12-04 05:58:04 [debug    /  98kB]   - clearing and disabling previous alarm
2022-12-04 05:58:04 [info     /  95kB]   - setting alarm to wake at 06:00am
2022-12-04 05:58:04 [info     /  93kB]   - shutting down
2022-12-04 06:00:03 [debug    / 115kB] > performing startup
2022-12-04 06:00:03 [info     / 123kB]   - wake reason: rtc_alarm
2022-12-04 06:00:03 [debug    / 121kB]   - turn on activity led
2022-12-04 06:00:03 [debug    / 119kB] > 99 blocks free out of 212
2022-12-04 06:00:03 [debug    / 117kB] > taking new reading
2022-12-04 06:00:03 [info     / 112kB]   - seconds since last reading: 120
2022-12-04 06:00:04 [debug    / 107kB] > caching reading for upload
2022-12-04 06:00:04 [info     / 102kB] > 3 cache file(s) need uploading
2022-12-04 06:00:04 [info     / 100kB] > connecting to wifi network '<removed>'
2022-12-04 06:00:10 [info     /  96kB]   - ip address:  192.168.0.181
2022-12-04 06:00:11 [info     / 119kB] > uploading cached readings to MQTT broker: <removed>
2022-12-04 06:00:18 [debug    /  88kB]   - an exception occurred when uploading
2022-12-04 06:00:18 [error    /  86kB]   ! failed to upload '2022-12-04T05_56_04Z.json' to mqtt
2022-12-04 06:00:18 [error    /  84kB] ! reading upload failed
2022-12-04 06:00:18 [info     /  82kB] > going to sleep
2022-12-04 06:00:18 [debug    /  80kB]   - clearing and disabling previous alarm
2022-12-04 06:00:18 [info     /  78kB]   - setting alarm to wake at 06:02am
2000-01-01 00:00:05 [debug    / 115kB] > performing startup
2000-01-01 00:00:05 [info     / 123kB]   - wake reason: usb_powered
2000-01-01 00:00:05 [debug    / 121kB]   - turn on activity led

The common theme seems to be that it only occurs after an upload failure. So perhaps might be able to simulate by disabling the remote destination (turn mqtt f) and see if it forces the error. On the logs for both also never shows a "shutting down" down entry so so not sure if it due to not waking up or it is just not shutting down properly after a upload failure.

Note that until these failures they had been uploading fine taking readings evey 2 minutes and uploading every 3 readings.

dave-ct commented 1 year ago

Indoor & urban had the same issue again today:

IMG-3207

For the Indoor this may be down to the battery level this time which was showing about 4% with my power reader but volatage should still be in tolerance. Urban had 84% battery so dont think be a reason this time,.

IMG-3208

Indoor Logs, see 05:55. Last log entry is when i disconnected battery and reconnected on USB to get logs.

2022-12-05 05:52:03 [debug    / 115kB] > performing startup
2022-12-05 05:52:03 [info     / 123kB]   - wake reason: rtc_alarm
2022-12-05 05:52:03 [debug    / 121kB]   - turn on activity led
2022-12-05 05:52:03 [debug    / 119kB] > 99 blocks free out of 212
2022-12-05 05:52:03 [debug    / 117kB] > taking new reading
2022-12-05 05:52:03 [info     / 112kB]   - seconds since last reading: 120
2022-12-05 05:52:04 [debug    / 107kB] > caching reading for upload
2022-12-05 05:52:04 [info     / 102kB] > 2 cache file(s) not being uploaded. Waiting until there are 3 file(s)
2022-12-05 05:52:04 [info     / 100kB] > going to sleep
2022-12-05 05:52:04 [debug    /  98kB]   - clearing and disabling previous alarm
2022-12-05 05:52:04 [info     /  96kB]   - setting alarm to wake at 05:54am
2022-12-05 05:52:04 [info     /  94kB]   - shutting down
2022-12-05 05:54:03 [debug    / 115kB] > performing startup
2022-12-05 05:54:03 [info     / 108kB]   - wake reason: rtc_alarm
2022-12-05 05:54:03 [debug    / 106kB]   - turn on activity led
2022-12-05 05:54:03 [debug    / 104kB] > 100 blocks free out of 212
2022-12-05 05:54:03 [debug    / 102kB] > taking new reading
2022-12-05 05:54:04 [info     /  97kB]   - seconds since last reading: 121
2022-12-05 05:54:04 [debug    /  92kB] > caching reading for upload
2022-12-05 05:54:04 [info     / 122kB] > 3 cache file(s) need uploading
2022-12-05 05:54:04 [info     / 120kB] > connecting to wifi network <removed>
2022-12-05 05:54:10 [info     /  97kB]   - ip address:  192.168.0.181
2022-12-05 05:54:11 [info     / 119kB] > uploading cached readings to MQTT broker: <removed>
2022-12-05 05:54:16 [info     / 109kB]   - uploaded 2022-12-05T05_50_04Z.json
2022-12-05 05:55:48 [debug    /  50kB]   - an exception occurred when uploading
2022-12-05 05:55:48 [error    /  48kB]   ! failed to upload '2022-12-05T05_52_04Z.json' to mqtt
2022-12-05 05:55:48 [error    /  46kB] ! reading upload failed
2022-12-05 05:55:48 [info     /  44kB] > going to sleep
2022-12-05 05:55:48 [debug    /  42kB]   - clearing and disabling previous alarm
2000-01-01 00:00:02 [debug    / 115kB] > performing startup

Urban logs, see 06:48. Have included the previosu logs so can see it worked before. Last log entry is when i disconnected battery and reconnected on USB to get logs.

2022-12-05 06:42:03 [debug    / 115kB] > performing startup
2022-12-05 06:42:03 [info     / 123kB]   - wake reason: rtc_alarm
2022-12-05 06:42:03 [debug    / 121kB]   - turn on activity led
2022-12-05 06:42:03 [debug    / 118kB] > 99 blocks free out of 212
2022-12-05 06:42:03 [debug    / 116kB] > taking new reading
2022-12-05 06:42:03 [info     / 112kB]   - seconds since last reading: 120
2022-12-05 06:42:03 [debug    / 109kB]   - starting sensor
2022-12-05 06:42:03 [debug    / 107kB]   - wait 5 seconds for airflow
2022-12-05 06:42:08 [debug    / 122kB]   - taking pms5003i reading
2022-12-05 06:42:09 [debug    / 120kB]   - taking microphone reading
2022-12-05 06:42:09 [debug    / 119kB] > caching reading for upload
2022-12-05 06:42:09 [info     / 114kB] > 3 cache file(s) need uploading
2022-12-05 06:42:09 [info     / 112kB] > connecting to wifi network <removed>
2022-12-05 06:42:15 [info     /  89kB]   - ip address:  192.168.0.236
2022-12-05 06:42:16 [info     / 117kB] > uploading cached readings to MQTT broker: <removed>
2022-12-05 06:42:24 [info     / 101kB]   - uploaded 2022-12-05T06_38_09Z.json
2022-12-05 06:42:29 [info     / 111kB]   - uploaded 2022-12-05T06_40_10Z.json
2022-12-05 06:42:33 [info     / 109kB]   - uploaded 2022-12-05T06_42_09Z.json
2022-12-05 06:42:33 [info     / 107kB] > going to sleep
2022-12-05 06:42:33 [debug    / 105kB]   - clearing and disabling previous alarm
2022-12-05 06:42:34 [info     / 102kB]   - setting alarm to wake at 06:44am
2022-12-05 06:42:34 [info     / 100kB]   - shutting down
2022-12-05 06:44:03 [debug    / 115kB] > performing startup
2022-12-05 06:44:03 [info     / 123kB]   - wake reason: rtc_alarm
2022-12-05 06:44:03 [debug    / 121kB]   - turn on activity led
2022-12-05 06:44:03 [debug    / 118kB] > 99 blocks free out of 212
2022-12-05 06:44:03 [debug    / 116kB] > taking new reading
2022-12-05 06:44:03 [info     / 111kB]   - seconds since last reading: 120
2022-12-05 06:44:03 [debug    / 109kB]   - starting sensor
2022-12-05 06:44:03 [debug    / 107kB]   - wait 5 seconds for airflow
2022-12-05 06:44:09 [debug    / 122kB]   - taking pms5003i reading
2022-12-05 06:44:09 [debug    / 120kB]   - taking microphone reading
2022-12-05 06:44:09 [debug    / 119kB] > caching reading for upload
2022-12-05 06:44:09 [info     / 114kB] > 1 cache file(s) not being uploaded. Waiting until there are 3 file(s)
2022-12-05 06:44:09 [info     / 112kB] > going to sleep
2022-12-05 06:44:09 [debug    / 110kB]   - clearing and disabling previous alarm
2022-12-05 06:44:09 [info     / 108kB]   - setting alarm to wake at 06:46am
2022-12-05 06:44:10 [info     / 105kB]   - shutting down
2022-12-05 06:46:03 [debug    / 115kB] > performing startup
2022-12-05 06:46:03 [info     / 123kB]   - wake reason: rtc_alarm
2022-12-05 06:46:03 [debug    / 121kB]   - turn on activity led
2022-12-05 06:46:03 [debug    / 118kB] > 99 blocks free out of 212
2022-12-05 06:46:03 [debug    / 116kB] > taking new reading
2022-12-05 06:46:03 [info     / 112kB]   - seconds since last reading: 120
2022-12-05 06:46:03 [debug    / 109kB]   - starting sensor
2022-12-05 06:46:04 [debug    /  92kB]   - wait 5 seconds for airflow
2022-12-05 06:46:09 [debug    / 106kB]   - taking pms5003i reading
2022-12-05 06:46:09 [debug    / 104kB]   - taking microphone reading
2022-12-05 06:46:09 [debug    / 102kB] > caching reading for upload
2022-12-05 06:46:09 [info     /  97kB] > 2 cache file(s) not being uploaded. Waiting until there are 3 file(s)
2022-12-05 06:46:10 [info     /  95kB] > going to sleep
2022-12-05 06:46:10 [debug    /  93kB]   - clearing and disabling previous alarm
2022-12-05 06:46:10 [info     /  90kB]   - setting alarm to wake at 06:48am
2022-12-05 06:46:10 [info     /  88kB]   - shutting down
2022-12-05 06:48:03 [debug    / 115kB] > performing startup
2022-12-05 06:48:03 [info     / 123kB]   - wake reason: rtc_alarm
2022-12-05 06:48:03 [debug    / 121kB]   - turn on activity led
2022-12-05 06:48:03 [debug    / 118kB] > 100 blocks free out of 212
2022-12-05 06:48:03 [debug    / 116kB] > taking new reading
2022-12-05 06:48:03 [info     / 111kB]   - seconds since last reading: 120
2022-12-05 06:48:03 [debug    / 109kB]   - starting sensor
2022-12-05 06:48:03 [debug    / 106kB]   - wait 5 seconds for airflow
2022-12-05 06:48:09 [debug    / 122kB]   - taking pms5003i reading
2022-12-05 06:48:09 [debug    / 120kB]   - taking microphone reading
2022-12-05 06:48:09 [debug    / 119kB] > caching reading for upload
2022-12-05 06:48:09 [info     / 114kB] > 3 cache file(s) need uploading
2022-12-05 06:48:09 [info     / 112kB] > connecting to wifi network '<removed>'
2022-12-05 06:48:15 [info     /  89kB]   - ip address:  192.168.0.236
2022-12-05 06:48:16 [info     / 117kB] > uploading cached readings to MQTT broker: <removed>
2022-12-05 06:48:35 [debug    /  74kB]   - an exception occurred when uploading
2022-12-05 06:48:35 [error    /  72kB]   ! failed to upload '2022-12-05T06_44_09Z.json' to mqtt
2022-12-05 06:48:35 [error    /  70kB] ! reading upload failed
2000-01-01 00:00:04 [debug    / 115kB] > performing startup
ZodiusInfuser commented 1 year ago

Hi @dave-ct, Thank you so much for this thorough testing!

I can clearly see there being some correlation between the mqtt upload failing and the boards locking up, and it does seem to be during shutdown rather than startup (meaning this may be is a different issue to the one I and others have reported, which is definitely during startup).

I will have a look in the code and see if there is something specific to do with uploads failing that would cause this kind of lock-up. Hopefully I can reproduce without setting up an mqtt server.

dave-ct commented 1 year ago

@ZodiusInfuser if there is any extra logging/tests you want me to try let me know. Both Urbans have failed today just now with the same issue on after a upload failure, see another example below at 11:24 and then connected to USB at 12:04. .

2022-12-05 11:24:03 [debug    / 115kB] > performing startup
2022-12-05 11:24:03 [info     / 123kB]   - wake reason: rtc_alarm
2022-12-05 11:24:03 [debug    / 121kB]   - turn on activity led
2022-12-05 11:24:03 [debug    / 118kB] > 100 blocks free out of 212
2022-12-05 11:24:03 [debug    / 116kB] > taking new reading
2022-12-05 11:24:03 [info     / 112kB]   - seconds since last reading: 120
2022-12-05 11:24:03 [debug    / 109kB]   - starting sensor
2022-12-05 11:24:03 [debug    / 107kB]   - wait 5 seconds for airflow
2022-12-05 11:24:09 [debug    / 122kB]   - taking pms5003i reading
2022-12-05 11:24:09 [debug    / 120kB]   - taking microphone reading
2022-12-05 11:24:09 [debug    / 119kB] > caching reading for upload
2022-12-05 11:24:09 [info     / 114kB] > 3 cache file(s) need uploading
2022-12-05 11:24:09 [info     / 111kB] > connecting to wifi network '<removed>'
2022-12-05 11:24:15 [info     /  89kB]   - ip address:  192.168.0.236
2022-12-05 11:24:16 [info     / 117kB] > uploading cached readings to MQTT broker: <removed>
2022-12-05 11:24:27 [info     / 101kB]   - uploaded 2022-12-05T11_20_09Z.json
2022-12-05 11:24:39 [debug    / 100kB]   - an exception occurred when uploading
2022-12-05 11:24:39 [error    /  98kB]   ! failed to upload '2022-12-05T11_22_09Z.json' to mqtt
2022-12-05 11:24:39 [error    /  96kB] ! reading upload failed
2022-12-05 11:24:39 [info     /  94kB] > going to sleep
2022-12-05 11:24:39 [debug    /  92kB]   - clearing and disabling previous alarm
2022-12-05 12:04:27 [debug    / 115kB] > performing startup
ZodiusInfuser commented 1 year ago

@dave-ct to help with debugging, could you update the local mqtt.py file on your boards from this:

from enviro import logging
from enviro.constants import UPLOAD_SUCCESS, UPLOAD_FAILED
from enviro.mqttsimple import MQTTClient
import ujson
import config

def log_destination():
  logging.info(f"> uploading cached readings to MQTT broker: {config.mqtt_broker_address}")

def upload_reading(reading):
  server = config.mqtt_broker_address
  username = config.mqtt_broker_username
  password = config.mqtt_broker_password
  nickname = reading["nickname"]

  try:
    # attempt to publish reading
    mqtt_client = MQTTClient(reading["uid"], server, user=username, password=password, keepalive=60)
    mqtt_client.connect()
    mqtt_client.publish(f"enviro/{nickname}", ujson.dumps(reading), retain=True)
    mqtt_client.disconnect()
    return UPLOAD_SUCCESS
  except:
    logging.debug(f"  - an exception occurred when uploading")

  return UPLOAD_FAILED

to this:

from enviro import logging
from enviro.constants import UPLOAD_SUCCESS, UPLOAD_FAILED
from enviro.mqttsimple import MQTTClient
import ujson
import config

def log_destination():
  logging.info(f"> uploading cached readings to MQTT broker: {config.mqtt_broker_address}")

def upload_reading(reading):
  server = config.mqtt_broker_address
  username = config.mqtt_broker_username
  password = config.mqtt_broker_password
  nickname = reading["nickname"]

  try:
    # attempt to publish reading
    mqtt_client = MQTTClient(reading["uid"], server, user=username, password=password, keepalive=60)
    mqtt_client.connect()
    mqtt_client.publish(f"enviro/{nickname}", ujson.dumps(reading), retain=True)
    mqtt_client.disconnect()
    return UPLOAD_SUCCESS
  except Exception as exc:
    import sys, io
    buf = io.StringIO()
    sys.print_exception(exc, buf)
    logging.debug(f"  - an exception occurred when uploading.", buf.getvalue())

  return UPLOAD_FAILED

This will log the actual exception that occurs when the MQTT broker fails.

Meanwhile I have set up an Enviro Urban to log and send data to a MQTT broker that doesn't exist, to see if I can get some reproduction. This always causes an exception to occur (red flashing light), but is perhaps not the same exception you are experiencing, hence the need for extra logging.

dave-ct commented 1 year ago

@ZodiusInfuser Just wanted to confirm, its is nothing to do with the secure mqtt I have setup as my urban which is physically outside also failed today and this just uses the default mqtt destination with no changes. Seems like the same issue at 23:05.

2022-12-04 23:00:03 [debug    / 115kB] > performing startup
2022-12-04 23:00:03 [info     / 123kB]   - wake reason: rtc_alarm
2022-12-04 23:00:03 [debug    / 121kB]   - turn on activity led
2022-12-04 23:00:03 [debug    / 119kB] > 100 blocks free out of 212
2022-12-04 23:00:03 [debug    / 116kB] > taking new reading
2022-12-04 23:00:03 [info     / 112kB]   - seconds since last reading: 300
2022-12-04 23:00:03 [debug    / 109kB]   - starting sensor
2022-12-04 23:00:03 [debug    / 107kB]   - wait 5 seconds for airflow
2022-12-04 23:00:08 [debug    /  88kB]   - taking pms5003i reading
2022-12-04 23:00:08 [debug    / 121kB]   - taking microphone reading
2022-12-04 23:00:09 [debug    / 122kB] > caching reading for upload
2022-12-04 23:00:09 [info     / 117kB] > 2 cache file(s) not being uploaded. Waiting until there are 3 file(s)
2022-12-04 23:00:09 [info     / 115kB] > going to sleep
2022-12-04 23:00:09 [debug    / 112kB]   - clearing and disabling previous alarm
2022-12-04 23:00:09 [info     / 110kB]   - setting alarm to wake at 23:05pm
2022-12-04 23:00:09 [info     / 108kB]   - shutting down
2022-12-04 23:05:03 [debug    / 115kB] > performing startup
2022-12-04 23:05:03 [info     / 123kB]   - wake reason: rtc_alarm
2022-12-04 23:05:03 [debug    / 121kB]   - turn on activity led
2022-12-04 23:05:03 [debug    / 119kB] > 100 blocks free out of 212
2022-12-04 23:05:03 [debug    / 116kB] > taking new reading
2022-12-04 23:05:03 [info     / 112kB]   - seconds since last reading: 300
2022-12-04 23:05:03 [debug    / 109kB]   - starting sensor
2022-12-04 23:05:03 [debug    / 107kB]   - wait 5 seconds for airflow
2022-12-04 23:05:08 [debug    /  88kB]   - taking pms5003i reading
2022-12-04 23:05:08 [debug    / 121kB]   - taking microphone reading
2022-12-04 23:05:09 [debug    / 121kB] > caching reading for upload
2022-12-04 23:05:09 [info     / 116kB] > 3 cache file(s) need uploading
2022-12-04 23:05:09 [info     / 114kB] > connecting to wifi network '<removed>'
2022-12-04 23:05:15 [info     /  91kB]   - ip address:  192.168.0.73
2022-12-04 23:05:16 [info     / 118kB] > uploading cached readings to MQTT broker: <removed>
2022-12-04 23:05:35 [debug    /  85kB]   - an exception occurred when uploading
2022-12-04 23:05:35 [error    /  83kB]   ! failed to upload '2022-12-04T22_55_09Z.json' to mqtt
2022-12-04 23:05:35 [error    /  81kB] ! reading upload failed
2022-12-05 12:18:12 [debug    / 115kB] > performing startup
2022-12-05 12:18:12 [info     / 123kB]   - wake reason: usb_powered
2022-12-05 12:18:12 [debug    / 121kB]   - turn on activity led
2022-12-05 12:18:12 [debug    / 118kB] > 101 blocks free out of 212
2022-12-05 12:18:12 [debug    / 116kB] > taking new reading
2022-12-05 12:18:12 [info     / 112kB]   - seconds since last reading: 47589
2022-12-05 12:18:12 [debug    / 109kB]   - starting sensor
2022-12-05 12:18:12 [debug    / 107kB]   - wait 5 seconds for airflow

I will add the extra logging you mentioned above on the urbans and indoors to see if I can gte more details. Altough sometimes it seems lik eit gets a bit further than othe rother times based on the log messages.

ZodiusInfuser commented 1 year ago

Thanks for that. The fact it fails at slightly different points after the exception could be a red herring, in that it's either still the exception at fault, or it's not the exception at all but some other factor. If the latter, then I have no idea how to begin diagnosing that! 😢

dave-ct commented 1 year ago

@ZodiusInfuser as these four boards all need ot use secure mqtt have modfied it slightly but should captcher the results as per the logging you suggested.

This is the mqtt.py I am using with the extra logging added in.

from enviro import logging
from enviro.constants import UPLOAD_SUCCESS, UPLOAD_FAILED
from enviro.mqttsimple import MQTTClient
import ujson
import config

def log_destination():
  logging.info(f"> uploading cached readings to MQTT broker: {config.mqtt_broker_address}")

def upload_reading(reading):
  server = config.mqtt_broker_address
  username = config.mqtt_broker_username
  password = config.mqtt_broker_password
  nickname = reading["nickname"]

  try:
    # attempt to publish reading

    ##### SSL Change Start
    # two options, with or without SSL
    if config.mqtt_broker_ca_file:
      f = open("ca.crt")
      ssl_data = f.read()
      f.close()
      mqtt_client = MQTTClient(reading["uid"], server, user=username, password=password, keepalive=60,
                               ssl=True, ssl_params={'cert': ssl_data})
      print('Using SSL')
    else:
    ##### SSL Change finish so do what the non SSL version does instead
      mqtt_client = MQTTClient(reading["uid"], server, user=username, password=password, keepalive=60)
      print('not ssl')
    mqtt_client.connect()
    mqtt_client.publish('envirotest', ujson.dumps(reading), retain=True)
    mqtt_client.disconnect()
    return UPLOAD_SUCCESS

  except Exception as exc:
    import sys, io
    buf = io.StringIO()
    sys.print_exception(exc, buf)
    logging.debug(f"  - an exception occurred when uploading.", buf.getvalue())

  return UPLOAD_FAILED
ZodiusInfuser commented 1 year ago

Ah of course, thank you. By the way, for that SSL change, if you have tested it and are confident it works, could you raise a Pull Request for it please? There's a few things that may need tweaking, but that can be discussed in the PR

ZodiusInfuser commented 1 year ago

I've just had the solid light issue (with flashing red led) occur. Best I can tell from the logs it's the same startup issue I was already aware of as there were going to sleep messages, whereas yous logs clearly show that not happening.

dave-ct commented 1 year ago

@ZodiusInfuser Just got a failure on an urban, did work a few time before hand:

2022-12-05 13:06:03 [debug    / 115kB] > performing startup
2022-12-05 13:06:03 [info     / 123kB]   - wake reason: rtc_alarm
2022-12-05 13:06:03 [debug    / 121kB]   - turn on activity led
2022-12-05 13:06:03 [debug    / 118kB] > 99 blocks free out of 212
2022-12-05 13:06:03 [debug    / 116kB] > taking new reading
2022-12-05 13:06:03 [info     / 112kB]   - seconds since last reading: 120
2022-12-05 13:06:03 [debug    / 109kB]   - starting sensor
2022-12-05 13:06:03 [debug    / 107kB]   - wait 5 seconds for airflow
2022-12-05 13:06:08 [debug    / 122kB]   - taking pms5003i reading
2022-12-05 13:06:09 [debug    / 120kB]   - taking microphone reading
2022-12-05 13:06:09 [debug    / 119kB] > caching reading for upload
2022-12-05 13:06:09 [info     / 114kB] > 3 cache file(s) need uploading
2022-12-05 13:06:09 [info     / 112kB] > connecting to wifi network '<removed>'
2022-12-05 13:06:15 [info     /  90kB]   - ip address:  192.168.0.238
2022-12-05 13:06:16 [info     / 117kB] > uploading cached readings to MQTT broker: <removed>
2022-12-05 13:06:35 [debug    /  73kB]   - an exception occurred when uploading. Traceback (most recent call last):
  File "enviro/destinations/mqtt.py", line 32, in upload_reading
  File "enviro/mqttsimple.py", line 70, in connect
OSError: [Errno 103] ECONNABORTED

2022-12-05 13:06:35 [error    /  70kB]   ! failed to upload '2022-12-05T13_02_09Z.json' to mqtt
2022-12-05 13:06:35 [error    / 121kB] ! reading upload failed

Is it possible to do a build with the micropython version from 0.0.8 with the 0.0.9 firmware. Might proove if its the updated micropython version.

ZodiusInfuser commented 1 year ago

Thanks for that. I'll have a dig and see what that error means.

In the meantime, yes you can try different micropython versions. Easiest way is to install the v0.0.8 micropython using this https://github.com/pimoroni/enviro/releases/download/v0.0.8/enviro-v0.0.8.uf2

Then put the v0.0.9 FW on using this https://github.com/pimoroni/enviro/releases/download/v0.0.9/enviro-v0.0.9-filesystem-only.uf2.

Save a local copy of your config and mqtt files first though

dave-ct commented 1 year ago

Have updated 2 x Urbans and 2 x indoors to the 0.0.8 miropython with the 0.0.9 Firmware using the above method, restored the saved config/mqtt/ca.crt files fro each one and will monitor and see how they get on......

ZodiusInfuser commented 1 year ago

Thanks @dave-ct. It will be interesting to hear.

By the way, I noticed that in the v0.0.9 mqttsimple.py file there used to be a timeout set but it has since been commented out. It could be worth re-adding that as a second step.

    def connect(self, clean_session=True):
    #def connect(self, clean_session=True, timeout=30): # TODO this was added to 0.0.8
        self.sock = socket.socket()
        #self.sock.settimeout(timeout) # TODO this was added to 0.0.8
        addr = socket.getaddrinfo(self.server, self.port)[0][-1]

This may resolve the ECONNABORTED exception, which the internet suggests is caused by the server rejecting the connection, or at least change it into an ECONNTIMEDOUT exception.

In either case, I should update the logic for mqtt.py so it checks for that specific exception and performs the same "reattempt upload" that I added to adafruit.io.

ZodiusInfuser commented 1 year ago

@dave-ct I just wanted to let you know that I too tested with v0.0.8's Micropython and the v0.0.9 firmware, but still had the issue occur. Also this is the first time I have your exact issue of needing to remove power to the board in order to fix things.

I have left a board at the office in this error state so I can diagnose it further, but let me explain some of my understanding thus far. Apologies, it's rather long as I've kind of used this to air my own thoughts on what is going on.


When Enviro is on battery and not doing anything, it is in (lets call it) "super sleep", where the Pico W literally has it's 3.3V power supply disabled. For all intents and purposes it is offline in this mode, with the only things getting power on Enviro being the RTC and user button. When either the RTC triggers its alarm, the user button is poked, or USB power is provided, the Pico W receives 3.3V and starts booting, turning the white LED on, setting HOLD_VSYS_EN to high keep the board alive, and disabling the RTC_ALARM (and this red flashing light). When Enviro it wants to go to back to "super sleep" it sets HOLD_VSYS_EN to low, and as long as there's no external source preventing it (such as USB power), it's 3.3V power gets deactivated.

The situation you (and others) are experiencing is a lockup during boot (we'll get to possible reasons later), sometime after the white LED is enabled but before Micropython has fully loaded, where the white LED would start pulsing. This point is likely before the RTC_ALARM has been reset, as indicated by the red light still flashing on your boards. That means that 3.3V power is constantly provided to the Pico W. This is significant because normally on Enviro, when reset is pressed whilst the board is awake, it will cause the board to turn off, requiring a "Poke" to start it running again. What we are now both experiencing is the reset button just resetting the Pico W. Normally this would be fine, but the white LED just turns on and nothing happens. This should not be the case with a board reset, so there must be something persistently stalling the boot process of the RP2040.

Incidentally this explains why you need to remove power from the board, as that is the only way to fully reset the state of everything on the board.

For Enviro Urban the things this could be are: the RTC, the BME280, the analog microphone, or the Pico W's WiFi chip itself. I'm yet to rule any of these options, but the fact you are getting upload failed exceptions right before these boot lockup's could be an indication of the WiFi chip getting into a weird state that either causes an immediate lockup (as some of your logs show) or causes the lock-up during boot. Note, this is irrespective of if you use the WiFi in your program, as I've had enviro's that locally log also lock-up.

If it is indeed the WiFi causing these lock-ups, then that's a much deeper problem to solve that ventures outside of the Enviro hardware and software and over to Raspberry Pi themselves.


So that's where we're at... I wish I had an easy answer for you, but it's either one of those other internal components of Enviro that is causing the issue or the Pico W's WiFi itself. I will delve deeper over the next few days, and see if there's been any general Pico W reports that corroborate this theory.

dave-ct commented 1 year ago

@ZodiusInfuser , thanks for the detail and efforts to find the solution. Since using v0.0.8's Micropython and the v0.0.9 firmware I have had no issues so far on any of the devices, even when there have been upload failures it recoveres itself when it wakes up next. I will monitor over the next couple of days as previosuly with the v0.0.9 Micropython is was happening regularly.

ZodiusInfuser commented 1 year ago

Interesting. Perhaps my method of flashing my boards wasn't optimal. After all, I too have had an Enviro Weather and Indoor running for a few weeks now and they had v0.0.8's Micropython with the v0.0.9 (pre-release) Firmware on 🤔

Actually. Let me try it properly and get the system to generate a build with the right MP (v1.19.7) and v0.0.9 firmware 😆 : https://github.com/pimoroni/enviro/actions/runs/3624876714

dave-ct commented 1 year ago

No issues this morning, will also try the system build above later tonight to make sure.

ZodiusInfuser commented 1 year ago

Thanks for the update. I've loaded an Enviro grow up with the firmware, so will leave that running through the work day. If it is indeed the difference between 1.19.7 and 1.19.10 that's to blame then that at least gives me a direction to explore with my debugging.

ZodiusInfuser commented 1 year ago

Putting this here for my own reference. It looks like there is nothing of note different between v1.19.7 and v1.19.10 that would explain the lock-ups 😕 https://github.com/pimoroni/pimoroni-pico/compare/1.19.7...v1.19.10

dave-ct commented 1 year ago

@ZodiusInfuser could there be any changes in the upstream version of micropython that the Pimornoi build uses. Not sure how it works but does it incoporate the offical micropython release from the 18 June (1.19.1) or does it grab latest updates to core micropython or one of the nightly builds which might have some changes in it?

ZodiusInfuser commented 1 year ago

That was the first thing I looked for, but the commit code we're using hasn't changed between our release versions: https://github.com/pimoroni/pimoroni-pico/blob/b1e8ed086400cfe8f471fd9e35577cfc3b12d4e9/.github/workflows/micropython-picow.yml#L10

From what I recall, 1.19.1 predates the Pico W, so we are using a newer build (but not the absolute latest). We would prefer to use a stable numbered release, but there has yet to be one. The last time we updated was for our 1.19.6.


An update on my Enviro Grow. It has locked up several times today. Both times pressing the reset button recovered it from the state, so whatever happened with the Urban is seriously bad.

I tried looking at the Urban with a scope to see if there was any merit to the theory of WiFi comms locking things up, but did not see any activity on the lines. So instead I tried scoping one of the Pico W's flash chip lines and it consistently fails at the same point.

Normal Enviro during boot: image

Locked Enviro Urban during boot: image

Don't know how useful this information is yet, so I'll need to correlate this data with different points in the code. Or try and get in touch with someone at Pi to help as this is starting to get beyond me :sad:

dave-ct commented 1 year ago

Just an a bit more using the weather

Had a Weather running on the original release 0.0.9 build with the extra mqtt logging adding https://github.com/pimoroni/enviro/releases/download/v0.0.9/pimoroni-picow_enviro-v1.19.10-micropython-v0.0.9.uf2 and it got the red and white error. It was fixed with the rest button press but looks like it got a different mqtt error and then never fully completed the shiudown (no final -shutting down entry in the log):

2022-12-06 10:36:03 [info     / 119kB]   - wake reason: rtc_alarm
2022-12-06 10:36:03 [debug    / 117kB]   - turn on activity led
2022-12-06 10:36:03 [debug    / 115kB] > 99 blocks free out of 212
2022-12-06 10:36:03 [debug    / 113kB] > taking new reading
2022-12-06 10:36:03 [info     / 108kB]   - seconds since last reading: 120
2022-12-06 10:36:04 [debug    / 100kB] > caching reading for upload
2022-12-06 10:36:05 [info     /  95kB] > 2 cache file(s) not being uploaded. Waiting until there are 3 file(s)
2022-12-06 10:36:05 [info     / 120kB] > going to sleep
2022-12-06 10:36:05 [debug    / 118kB]   - clearing and disabling previous alarm
2022-12-06 10:36:05 [info     / 115kB]   - setting alarm to wake at 10:38am
2022-12-06 10:36:05 [info     / 113kB]   - shutting down
2022-12-06 10:38:03 [debug    / 115kB] > performing startup
2022-12-06 10:38:03 [info     / 119kB]   - wake reason: rtc_alarm
2022-12-06 10:38:03 [debug    / 117kB]   - turn on activity led
2022-12-06 10:38:03 [debug    / 115kB] > 99 blocks free out of 212
2022-12-06 10:38:03 [debug    / 113kB] > taking new reading
2022-12-06 10:38:03 [info     / 108kB]   - seconds since last reading: 120
2022-12-06 10:38:04 [debug    /  99kB] > caching reading for upload
2022-12-06 10:38:05 [info     /  94kB] > 3 cache file(s) need uploading
2022-12-06 10:38:05 [info     / 120kB] > connecting to wifi network '<removed>'
2022-12-06 10:38:11 [info     /  97kB]   - ip address:  192.168.0.12
2022-12-06 10:38:11 [info     / 118kB] > uploading cached readings to MQTT broker: <removed>'
2022-12-06 10:38:18 [info     /  99kB]   - uploaded 2022-12-06T10_34_05Z.json
2022-12-06 10:38:24 [info     / 102kB]   - uploaded 2022-12-06T10_36_05Z.json
2022-12-06 10:40:00 [debug    /  49kB]   - an exception occurred when uploading. Traceback (most recent call last):
  File "enviro/destinations/mqtt.py", line 32, in upload_reading
  File "enviro/mqttsimple.py", line 110, in connect
IndexError: bytes index out of range

2022-12-06 10:40:00 [error    /  47kB]   ! failed to upload '2022-12-06T10_38_04Z.json' to mqtt
2022-12-06 10:40:01 [error    /  45kB] ! reading upload failed
2022-12-06 10:40:01 [info     /  43kB] > going to sleep
2022-12-06 10:40:01 [debug    /  41kB]   - clearing and disabling previous alarm
2022-12-06 10:40:01 [info     /  38kB]   - setting alarm to wake at 10:42am

IMG-3218

dave-ct commented 1 year ago

@ZodiusInfuser I have now installed 2 x Urbans, 2 x INdoor and 1 x Weather with the build you did yesterday Will monitor them now and see if any further issues.

One thing I did check was the os version, so form the original 0.0.9 got: (sysname='rp2', nodename='rp2', release='1.19.1', version='9dfabcd-dirty on 2022-11-18 (GNU 9.2.1 MinSizeRel)', machine='Raspberry Pi Pico W with RP2040')

And on the buld from last night and 0.0.8 MP I had been using previosuly got: (sysname='rp2', nodename='rp2', release='1.19.1', version='9dfabcd-dirty on 2022-08-10 (GNU 9.2.1 MinSizeRel)', machine='Raspberry Pi Pico W with RP2040')

The only differnece I see is the dates on the version and not sure what dirty build means but though i would flag just incase that could indicate some potential area for differnce in core micropython

version='9dfabcd-dirty on 2022-11-18 vs version='9dfabcd-dirty on 2022-08-10

peter-mount commented 1 year ago

Since I got the urban running with #117 I've still been seeing the occasional hang on mine. It's at random times but it when I have looked at the logs it was WiFi/MQTT related. For me however, when it locks up it recovers after 10, 20 or 30 minutes - the duration can vary.

As it's mounted outside & it's pretty cold here right now I've not been out to take it off the wall to look at the logs so I've just ordered a second Urban for debugging purposes - it'll be easier to debug rather than getting the ladder out each time.

That said, one observation from this morning, after spending the night with sub-zero temperatures the urban did start locking up just before 6am and I've noticed that when it does lock up it's using power as it's causing the temperature in the Stephenson screen to rise as you can see in this plot.

Screenshot from 2022-12-08 08-48-49

To my eyes this would hint that the pico isn't going to sleep when it does hang and possibly the processor is busy in a loop causing the temperature to rise. It's only due to the current cold spell that this effect has been noticeable.

dave-ct commented 1 year ago

@peter-mount I am also still getting some mqtt errors on the urban (mainly bytes index out of range). The logs show it does not shutdown compeltey so hangs but when this happens it will not recover. Your one seem to be recovering.

peter-mount commented 1 year ago

I've had a second urban running for about 12 hours now and annoyingly it's been stable (although it's on USB power which might affect the results).

I've now moved it so it's next to the outdoor one (just the wall between them) and will leave it running for a day before I setup an identical power setup with the outdoor urban - which is still showing issues at times.

I've also setup an eviro indoor which is on battery to see how it handles the issue - and it does show it but not as often. Recovery is usually a few minutes later.

All 4 boards I have are running the same 0.0.9 firmware but with the #117 sleep fix applied

peter-mount commented 1 year ago

So far, the new Urban has been running flawlessly for a week now, but that's located indoors.

The one physically outside has been intermittent and then went offline again until I have to get the ladder out and reset it with pressing poke/reset together like #117 - I did that again this morning and it worked for about 30 mins and then started being intermittent again.

At some point today I'll swap the two to see how the new one handles the outside - in case it's just due to the unusually low temperatures we are having in the UK.

ZodiusInfuser commented 1 year ago

Curious. Did you say the two Urbans were running different firmware?

Since I've made no headway in addressing these intermittent lock-ups, I have been thinking about enabling the Pico's Watchdog timer during the boot process. This would cause the board to reset in the event of a lockup, which would unstick the majority of cases (exception being the case where the board lockup survives a reset).

The downside with this is that it would make development a real headache, as you would basically need code running constantly to reset the watchdog every 7 seconds, otherwise a reset will occur, unless I can somehow inject the reset into the micropython parser somehow

peter-mount commented 1 year ago

Both are running the same firmware v0.0.9 with the following changes

So the only real difference is the power supply. I found the duplicate supply I'm using for the outdoor one earlier this morning so I should setup the indoor one to use it so it's the same.

peter-mount commented 1 year ago

I've just switched the internal urban onto a 4.5v DC supply to rule out the supply, so other than the name they should be identical. Will monitor them over the next day or so to see how they behave.

The next option for me would be to swap the two so the new one is outside.

dave-ct commented 1 year ago

@ZodiusInfuser I have tried the wathdog timer on a diffenret PICO setup which was doing mqtt uploads of sensor readings which was to detect wifi hang issues. It worked fine but as you said can be pain to keep feeding the watchdog. I was thinking of also doing this on the enviro and feeding the watchdog timer eveytime there is log entry but the issue here is when you do uploads they can take more than 7-8 seconds (have seen it take up to 20 seconds to upload a single JSON). The issue I found was that once you start the watchdog timer you cannot turn it off, I was thinking that after boot then swicthing to multithreading (one for main and one just for the wathcdog) so that can keep the watch dog feed going but dont know how that might impact as have never tried it on micropython before.

dave-ct commented 1 year ago

With added loggin in the mqtt destination still get hangs when an upload fails, this time an OSError (after which try to disconnect incase it is still connected which also fails), it registers the upload failed in the logs but then you does not go back to sleep.

2022-12-19 01:52:04 [debug    / 115kB] > performing startup
2022-12-19 01:52:04 [info     / 122kB]   - wake reason: rtc_alarm
2022-12-19 01:52:04 [debug    / 120kB]   - turn on activity led
2022-12-19 01:52:04 [debug    / 118kB] > 99 blocks free out of 212
2022-12-19 01:52:04 [debug    / 116kB] > taking new reading
2022-12-19 01:52:04 [info     / 111kB]   - seconds since last reading: 120
2022-12-19 01:52:04 [debug    / 109kB]   - starting sensor
2022-12-19 01:52:04 [debug    / 107kB]   - wait 5 seconds for airflow
2022-12-19 01:52:09 [debug    /  87kB]   - taking pms5003i reading
2022-12-19 01:52:09 [debug    /  85kB]   - taking microphone reading
2022-12-19 01:52:10 [debug    / 116kB] > caching reading for upload
2022-12-19 01:52:10 [info     / 111kB] > 3 cache file(s) need uploading
2022-12-19 01:52:10 [info     / 109kB] > connecting to wifi network '<removed>'
2022-12-19 01:52:41 [error    /  97kB] ! failed to connect to wireless network <removed>
2022-12-19 01:52:41 [error    /  95kB]   - cannot upload readings, wifi connection failed
2022-12-19 01:52:41 [error    /  93kB] ! reading upload failed
2022-12-19 01:52:41 [info     /  91kB] > going to sleep
2022-12-19 01:52:41 [debug    /  89kB]   - clearing and disabling previous alarm
2022-12-19 01:52:42 [info     /  87kB]   - setting alarm to wake at 01:54am
2022-12-19 01:52:42 [info     /  85kB]   - shutting down
2022-12-19 01:54:04 [debug    / 115kB] > performing startup
2022-12-19 01:54:04 [info     / 122kB]   - wake reason: rtc_alarm
2022-12-19 01:54:04 [debug    / 120kB]   - turn on activity led
2022-12-19 01:54:04 [debug    / 118kB] > 99 blocks free out of 212
2022-12-19 01:54:04 [debug    / 116kB] > taking new reading
2022-12-19 01:54:04 [info     / 111kB]   - seconds since last reading: 120
2022-12-19 01:54:04 [debug    / 108kB]   - starting sensor
2022-12-19 01:54:04 [debug    / 106kB]   - wait 5 seconds for airflow
2022-12-19 01:54:10 [debug    /  87kB]   - taking pms5003i reading
2022-12-19 01:54:10 [debug    /  85kB]   - taking microphone reading
2022-12-19 01:54:10 [debug    / 116kB] > caching reading for upload
2022-12-19 01:54:10 [info     / 111kB] > 4 cache file(s) need uploading
2022-12-19 01:54:10 [info     / 108kB] > connecting to wifi network '<removed>'
2022-12-19 01:54:17 [info     /  86kB]   - ip address:  <removed>
2022-12-19 01:54:17 [info     / 116kB] > uploading cached readings to MQTT broker: <removed>
2022-12-19 01:54:23 [info     / 109kB]   - uploaded 2022-12-19T01_48_10Z.json
2022-12-19 01:54:37 [info     / 113kB]   - uploaded 2022-12-19T01_50_10Z.json
2022-12-19 01:54:48 [info     / 108kB]   - uploaded 2022-12-19T01_52_10Z.json
2022-12-19 01:55:07 [debug    /  94kB]   - an exception occurred when uploading. Traceback (most recent call last):
  File "enviro/destinations/mqtt.py", line 35, in upload_reading
  File "enviro/mqttsimple.py", line 70, in connect
OSError: [Errno 103] ECONNABORTED

2022-12-19 01:55:07 [debug    /  90kB]   - an exception occurred when disconnecting mqtt client. Traceback (most recent call last):
  File "enviro/destinations/mqtt.py", line 47, in upload_reading
  File "enviro/mqttsimple.py", line 116, in disconnect
OSError: [Errno 103] ECONNABORTED

2022-12-19 01:55:07 [error    /  73kB]   ! failed to upload '2022-12-19T01_54_10Z.json' to mqtt
2022-12-19 01:55:08 [error    /  71kB] ! reading upload failed
2000-01-01 00:05:33 [debug    / 115kB] > performing startup
ZodiusInfuser commented 1 year ago

@dave-ct I spent some time on Friday trying out the watchdog timer, as a colleague suggested it may be possible to change it's clock so that it could run for longer. So I set up a build which would start the WDT before Micropython booted and had a play. Indeed the time can be extended, however, due to the internals of the RP2040 the clock the WDT uses is shared by regular timers, so it completely threw off call to time.sleep(). I attempted to adjust other clocks to compensate but kept getting frequent hangs.

Still 8.3 seconds could be sufficient, if the WDT could be reset in the back end. Unfortunately Micropython does not work the way I thought it did. I assumed it had a loop that ran which interpreted each Python line then executed it, making it easy to inject the WDT reset as user code runs, but there's a lot more fancy things going on making that impossible.

The only remaining idea that we've had here is to make use of the second core to reset the WDT, and have the main code call a function on that. This would give us the option to extend the WDT time, but does seem like a hacky solution, which may not even catch all the lock-up cases. Giving it some thought.

dave-ct commented 1 year ago

@ZodiusInfuser its worth noting it only happens on high frequency of readings (2 minutes and uplaod every 3), on the ones i have outdoors I have not encountered the issue (these are doing 10 minutes and upload every 3). Seems to me it is more likley a bug with micropython or the PICO iteslf.

TheKentuckian commented 1 year ago

Hi folks, It appears I may have a similar problem with my enviro grow board. I've been using it outside the last few weeks here in London. This morning I noticed it was flashing red, and brought it in to look at the logs:

2023-01-03 05:50:13 [info / 119kB] - setting alarm to wake at 06:00am 2023-01-03 05:50:13 [info / 117kB] - shutting down 2023-01-03 06:00:02 [debug / 115kB] > performing startup 2023-01-03 06:00:02 [info / 112kB] - wake reason: external_trigger 2023-01-03 06:00:02 [debug / 110kB] - turn on activity led 2023-01-03 06:00:04 [info / 99kB] > 2 cache files need uploading 2023-01-03 06:00:04 [info / 97kB] > connecting to wifi network '' 2023-01-03 06:00:06 [debug / 127kB] - connecting 2023-01-03 06:00:09 [debug / 113kB] - connecting 2023-01-03 06:00:17 [info / 85kB] - ip address: 192.168.1.177 2023-01-03 06:00:18 [info / 111kB] - uploaded 2023-01-03T05:50:04Z.json to adafruit_io 2023-01-03 06:00:19 [error / 97kB] ! failed to upload '2023-01-03T06:00:04Z.json' to adafruit_io 2023-01-03 06:00:19 [error / 94kB] ! reading upload failed 2023-01-03 06:00:19 [info / 92kB] > going to sleep 2023-01-03 06:00:19 [debug / 90kB] - clearing and disabling timer and alarm 2023-01-03 06:00:19 [info / 88kB] - setting alarm to wake at 06:10am 2023-01-03 11:26:15 [debug / 115kB] > performing startup 2023-01-03 11:26:15 [info / 125kB] - wake reason: external_trigger 2023-01-03 11:26:15 [debug / 123kB] - turn on activity led 2023-01-03 11:26:16 [info / 116kB] > 2 cache files need uploading

I've configured it to take readings every 10-minutes, and upload every 2 to adafruit. Something is causing it to hang, rather than go to sleep to retry again.

TheKentuckian commented 1 year ago

I had been running the firmware released as of mid-November, and I've now updated it to the latest 0.0.9 firmware. Will monitor it for the next day or so and see if it's improved.

ZodiusInfuser commented 1 year ago

@TheKentuckian Thanks for the log. I've not seen a board lock up at that location before.

By the way, to help diagnose your adafruit_io issue, you can try updating to the code from this branch, that gives more useful logging https://github.com/pimoroni/enviro/pull/132

TheKentuckian commented 1 year ago

Thanks for the tip on trying that branch @ZodiusInfuser I'll give it a try later.

I had a look at the changes in branch and it's quite serendipitous. I had actually diagnosed my envirogrow's problem with adafruit_io the other day, with the culprit being the future date problem, and I had resolved it with the same solution found in the branch (manually performing a NTP sync on the RTC). Strangely my board locked up in the middle of the night, and there's nothing indicative found in the logs. I'll have another look after trying #132

Julia7676 commented 1 year ago

I have the same symptoms as dave-ct described in the original post but I am not using MQTT.

I have adapted micropython-ftplib to provide a FTP destination. Like Dave mine hung between making the '> going to sleep' and the next expected entry ' - clearing and disabling previous alarm'. This occurred after my FTP destination returned with an error. My code is based on v0.0.9 enviro firmware.

Why would the processor hang whilst making log entries?

dave-ct commented 1 year ago

@Julia7676 might be worth trying this build

https://github.com/pimoroni/enviro/actions/runs/3624876714

ZodiusInfuser commented 1 year ago

Thanks for reporting @Julia7676. Definitely worth giving the Micropython .uf2 build a try from Dave's link.

I wish I had an easy answer for why these hangs occurred (or better yet, a fix). They all seem to be happening in code outside of the Enviro firmware itself, making them very difficult to diagnose. I'm coming to the conclusion that Micropython for Pico W isn't 100% stable. For instance, I had an Enviro Urban hang on me a few days into logging in a remote location. I left it, then 5 days later (actually Christmas morning), it came back to life and logged until the battery died. Absolutely no indication of what the issue was when I checked the log.

dave-ct commented 1 year ago

@ZodiusInfuser I have a thought as a potential work around, it seems it’s when it’s using WiFi and you get some kind of an error (assume the grow and @Julia7676 FTP connection don’t have the same extra logging I added to the mqtt destination). If so is there a way to shutdown the wifi chip once the error is caught before the next steps of going to sleep. I had a look around and setting the wlan to false does not do anything so maybe there is another way to turn if off?. Perhaps this is clutching at straws :-)