pimoroni / enviro

MIT License

Enviro (urban/weather/indoor) - Low Disk Space after Wifi Outage - Edge case #126

Closed dave-ct closed 1 year ago

dave-ct commented 1 year ago

Version: 0.0.9
Image: https://github.com/pimoroni/enviro/actions/runs/3624876714
Summary: The uploads folder fills up and the device cannot recover without deleting files.
Suggested Fix: If the file system is 90% full, do not take further readings. Instead of going straight back to sleep, attempt uploads only on each wake so that the device can eventually recover.

While testing another issue, I had 2 x Urbans, 2 x Indoors and 1 x Weather running on battery, taking readings every 2 minutes and uploading every 3 readings. At about 08:20 the wifi router seems to have gone down, so when I inspected the devices 12 hours later they were all flashing red. After I rebooted the wifi router, the devices did not automatically recover. I tried poking them, but they quickly reverted to the red light. On inspection, the device logs show low disk space.

2022-12-07 20:09:52 [debug    / 115kB] > performing startup
2022-12-07 20:09:52 [info     / 123kB]   - wake reason: button
2022-12-07 20:09:52 [debug    / 121kB]   - turn on activity led
2022-12-07 20:09:53 [error    / 119kB] ! low disk space
2022-12-07 20:09:53 [info     / 116kB] > going to sleep
2022-12-07 20:09:53 [debug    / 114kB]   - clearing and disabling previous alarm
2022-12-07 20:09:53 [info     / 112kB]   - setting alarm to wake at 20:10pm
2022-12-07 20:09:53 [info     / 110kB]   - shutting down
2022-12-07 20:10:03 [debug    / 115kB] > performing startup
2022-12-07 20:10:03 [info     / 123kB]   - wake reason: rtc_alarm
2022-12-07 20:10:03 [debug    / 121kB]   - turn on activity led
2022-12-07 20:10:04 [error    / 119kB] ! low disk space
2022-12-07 20:10:04 [info     / 116kB] > going to sleep
2022-12-07 20:10:04 [debug    / 114kB]   - clearing and disabling previous alarm
2022-12-07 20:10:04 [info     / 112kB]   - setting alarm to wake at 20:12pm
2022-12-07 20:10:04 [info     / 110kB]   - shutting down
2022-12-07 20:10:17 [debug    / 115kB] > performing startup
2022-12-07 20:10:17 [info     / 123kB]   - wake reason: button
2022-12-07 20:10:17 [debug    / 121kB]   - turn on activity led
2022-12-07 20:10:18 [error    / 119kB] ! low disk space
2022-12-07 20:10:18 [info     / 116kB] > going to sleep
2022-12-07 20:10:18 [debug    / 114kB]   - clearing and disabling previous alarm
2022-12-07 20:10:18 [info     / 112kB]   - setting alarm to wake at 20:12pm
2022-12-07 20:10:18 [info     / 110kB]   - shutting down

The uploads folder was full, with about 180 JSON files waiting to upload.

The simple fix was to delete the uploads folder.

This is an edge case caused by the frequency of readings. A possible solution: instead of calling enviro.halt when less than 10% of disk space is left, the device could attempt to upload the cached readings without taking any new ones, and log an alternate message as well. This would be particularly useful for the Urban or Weather, because if they are attached to a roof or another hard-to-reach area you have to bring them inside to fix them, which is a pain.

I think I could make a pull request for this, as I just need to alter main.py, but I wanted to check whether an alternate approach should be taken?
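
For illustration, the proposed behaviour can be written as a small decision function. This is only a sketch of the idea described above: `plan_wake` and the action names are hypothetical and are not part of the enviro module. They simply make the three cases explicit.

```python
def plan_wake(low_disk_space, has_destination):
  """Return the (hypothetical) list of actions for one wake cycle."""
  if low_disk_space:
    if has_destination:
      # storage is nearly full: skip the new reading and only try
      # to drain the cached uploads so the device can recover
      return ["upload"]
    # nothing can be uploaded, so space cannot be freed automatically
    return ["halt"]

  if has_destination:
    # normal operation with an upload destination configured
    return ["read", "cache", "upload_if_needed"]

  # normal operation without a destination: keep readings locally
  return ["read", "save_locally"]


print(plan_wake(low_disk_space=True, has_destination=True))
```

The point of the first branch is that a wake with low disk space still does useful work, so a device on a roof can clear its own backlog once the wifi returns.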

ZodiusInfuser commented 1 year ago

Thanks for raising this edge case. Yes, the board should keep attempting to upload files in this instance.

Here's how I'd probably do it. I suspect there's some scenario I'm not thinking of with this though, not to mention the duplicated code.

import enviro
import os

try:
  # initialise enviro
  enviro.startup()

  # if the clock isn't set...
  if not enviro.is_clock_set():
    enviro.logging.info("> clock not set, synchronise from ntp server")
    if not enviro.sync_clock_from_ntp():
      # failed to talk to ntp server go back to sleep for another cycle
      enviro.halt("! failed to synchronise clock")  

  # check disk space...
  if enviro.low_disk_space():
    # less than 10% of diskspace left, this probably means cached results
    # are not getting uploaded, so warn the user and try to clear the backlog

    # is an upload destination set?
    if enviro.config.destination:
      enviro.logging.error("! low disk space. Attempting to upload file(s)")

      # upload whatever is cached, regardless of the usual upload frequency
      enviro.logging.info(f"> {enviro.cached_upload_count()} cache file(s) need uploading")
      if not enviro.upload_readings():
        enviro.halt("! reading upload failed")
    else:
      # no destination so go to sleep
      enviro.halt("! low disk space")

  # TODO this seems to be useful to keep around?
  filesystem_stats = os.statvfs(".")
  enviro.logging.debug(f"> {filesystem_stats[3]} blocks free out of {filesystem_stats[2]}")

  # TODO should the board auto take a reading when the timer has been set, or wait for the time?
  # take a reading from the onboard sensors
  enviro.logging.debug("> taking new reading")
  reading = enviro.get_sensor_readings()

  # here you can customise the sensor readings by adding extra information
  # or removing readings that you don't want, for example:
  #
  #   del reading["temperature"]        # remove the temperature reading
  #
  #   reading["custom"] = my_reading()  # add my custom reading value

  # is an upload destination set?
  if enviro.config.destination:
    # if so cache this reading for upload later
    enviro.logging.debug("> caching reading for upload")
    enviro.cache_upload(reading)

    # if we have enough cached uploads...
    if enviro.is_upload_needed():
      enviro.logging.info(f"> {enviro.cached_upload_count()} cache file(s) need uploading")
      if not enviro.upload_readings():
        enviro.halt("! reading upload failed")
    else:
      enviro.logging.info(f"> {enviro.cached_upload_count()} cache file(s) not being uploaded. Waiting until there are {enviro.config.upload_frequency} file(s)")
  else:
    # otherwise save reading to local csv file (look in "/readings")
    enviro.logging.debug("> saving reading locally")
    enviro.save_reading(reading)

  # go to sleep until our next scheduled reading
  enviro.sleep()

# handle any unexpected exception that has occurred
except Exception as exc:
  enviro.exception(exc)

If you could give it a test that would be great. And yes, if you find that it does work for you, or need to make modifications, please raise a PR.

dave-ct commented 1 year ago

@ZodiusInfuser

Started testing by disabling the wifi router and letting 2 x Urbans, 2 x Indoors and 1 x Weather build up files until the storage filled up (readings every 2 minutes with uploads every 3 readings).

Sample log from before turning the wifi router back on. See the entries at 17:12, which show the device not taking a reading and attempting to upload (162 files); the upload fails because the wifi is still off:

2022-12-11 17:10:04 [debug    / 115kB] > performing startup
2022-12-11 17:10:04 [info     / 122kB]   - wake reason: rtc_alarm
2022-12-11 17:10:04 [debug    / 120kB]   - turn on activity led
2022-12-11 17:10:04 [debug    / 118kB] > 23 blocks free out of 212
2022-12-11 17:10:05 [debug    / 115kB] > taking new reading
2022-12-11 17:10:05 [info     / 110kB]   - seconds since last reading: 120
2022-12-11 17:10:05 [debug    / 107kB]   - starting sensor
2022-12-11 17:10:05 [debug    / 105kB]   - wait 5 seconds for airflow
2022-12-11 17:10:10 [debug    /  86kB]   - taking pms5003i reading
2022-12-11 17:10:10 [debug    /  84kB]   - taking microphone reading
2022-12-11 17:10:11 [debug    / 116kB] > caching reading for upload
2022-12-11 17:10:11 [info     /  83kB] > 162 cache file(s) need uploading
2022-12-11 17:10:11 [info     /  80kB] > connecting to wifi network '<Removed>'
2022-12-11 17:10:13 [error    / 118kB] ! failed to connect to wireless network <Removed>
2022-12-11 17:10:14 [error    / 115kB]   - cannot upload readings, wifi connection failed
2022-12-11 17:10:14 [error    / 113kB] ! reading upload failed
2022-12-11 17:10:14 [info     / 111kB] > going to sleep
2022-12-11 17:10:14 [debug    / 109kB]   - clearing and disabling previous alarm
2022-12-11 17:10:14 [info     / 107kB]   - setting alarm to wake at 17:12pm
2022-12-11 17:10:14 [info     / 105kB]   - shutting down
2022-12-11 17:12:04 [debug    / 115kB] > performing startup
2022-12-11 17:12:04 [info     / 122kB]   - wake reason: rtc_alarm
2022-12-11 17:12:04 [debug    / 120kB]   - turn on activity led
2022-12-11 17:12:04 [error    / 118kB] ! low disk space. Attempting to upload file(s)
2022-12-11 17:12:05 [info     / 101kB] > 162 cache file(s) need uploading
2022-12-11 17:12:05 [info     /  98kB] > connecting to wifi network '<Removed>'
2022-12-11 17:12:07 [error    /  90kB] ! failed to connect to wireless network <Removed>
2022-12-11 17:12:07 [error    /  87kB]   - cannot upload readings, wifi connection failed
2022-12-11 17:12:07 [error    /  85kB] ! reading upload failed
2022-12-11 17:12:07 [info     / 124kB] > going to sleep
2022-12-11 17:12:07 [debug    / 122kB]   - clearing and disabling previous alarm
2022-12-11 17:12:07 [info     / 120kB]   - setting alarm to wake at 17:14pm
2022-12-11 17:12:07 [info     / 118kB]   - shutting down

All three types of device showed the same low disk space warning and then attempted to upload, which failed because the wifi was off.
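
As a rough check of the numbers in the log above, the free-space fraction can be worked out from the `blocks free out of` debug line. The sketch below assumes that "low disk space" means fewer than 10% of blocks free, as the code comment earlier in the thread suggests. `free_fraction`, `is_low` and `LOW_SPACE_THRESHOLD` are illustrative names, not functions from the enviro module.

```python
LOW_SPACE_THRESHOLD = 0.1  # assumption: "low" means under 10% of blocks free


def free_fraction(stats):
  """Fraction of free blocks from an os.statvfs()-style tuple."""
  # index 2 is the total number of blocks, index 3 the number of free blocks
  return stats[3] / stats[2]


def is_low(stats, threshold=LOW_SPACE_THRESHOLD):
  """True when the free fraction has dropped below the threshold."""
  return free_fraction(stats) < threshold


# values taken from the log line "23 blocks free out of 212";
# the first two tuple fields are unused here
at_17_10 = (0, 0, 212, 23)
print(round(free_fraction(at_17_10), 3), is_low(at_17_10))
```

With 23 of 212 blocks free the device is just above 10%, which fits the log: the 17:10 wake still took a reading, and the 17:12 wake reported low disk space. On a device the same functions could be given the result of `os.statvfs(".")`.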

I turned the wifi back on, and all 5 devices successfully uploaded their 160-odd files and returned to normal operation.

It did take about 20 minutes, but as I am using SSL for MQTT, uploads can take a long time.

Sample log on success:

2022-12-11 17:58:33 [info     / 102kB]   - uploaded 2022-12-11T14_10_05Z.json
2022-12-11 17:58:38 [info     / 110kB]   - uploaded 2022-12-11T14_12_05Z.json
2022-12-11 17:58:42 [info     / 111kB]   - uploaded 2022-12-11T14_14_05Z.json
2022-12-11 17:58:49 [info     / 110kB]   - uploaded 2022-12-11T14_16_05Z.json
2022-12-11 17:59:01 [info     / 104kB]   - uploaded 2022-12-11T14_18_05Z.json
2022-12-11 17:59:15 [info     /  90kB]   - uploaded 2022-12-11T14_20_05Z.json
2022-12-11 17:59:22 [info     / 110kB]   - uploaded 2022-12-11T14_22_05Z.json
2022-12-11 17:59:40 [info     / 104kB]   - uploaded 2022-12-11T14_24_05Z.json
2022-12-11 17:59:45 [info     / 110kB]   - uploaded 2022-12-11T14_26_05Z.json
2022-12-11 17:59:58 [info     / 104kB]   - uploaded 2022-12-11T14_28_05Z.json
2022-12-11 18:00:03 [info     / 109kB]   - uploaded 2022-12-11T14_30_05Z.json
2022-12-11 18:00:07 [info     / 109kB]   - uploaded 2022-12-11T14_32_05Z.json
2022-12-11 18:00:11 [info     / 112kB]   - uploaded 2022-12-11T14_34_05Z.json
2022-12-11 18:00:19 [info     / 110kB]   - uploaded 2022-12-11T14_36_05Z.json
2022-12-11 18:00:23 [info     / 109kB]   - uploaded 2022-12-11T14_38_05Z.json
2022-12-11 18:00:30 [info     / 110kB]   - uploaded 2022-12-11T14_40_05Z.json
2022-12-11 18:00:36 [info     / 102kB]   - uploaded 2022-12-11T14_42_05Z.json
2022-12-11 18:00:44 [info     /  98kB]   - uploaded 2022-12-11T14_44_05Z.json
2022-12-11 18:00:49 [info     / 109kB]   - uploaded 2022-12-11T14_46_05Z.json
2022-12-11 18:00:54 [info     / 108kB]   - uploaded 2022-12-11T14_48_05Z.json
2022-12-11 18:01:01 [info     / 109kB]   - uploaded 2022-12-11T14_50_05Z.json
2022-12-11 18:01:15 [info     /  77kB]   - uploaded 2022-12-11T14_52_05Z.json
2022-12-11 18:01:23 [info     / 101kB]   - uploaded 2022-12-11T14_54_05Z.json
2022-12-11 18:01:30 [info     / 101kB]   - uploaded 2022-12-11T14_56_05Z.json
2022-12-11 18:01:40 [info     / 101kB]   - uploaded 2022-12-11T14_58_05Z.json
2022-12-11 18:01:45 [info     / 109kB]   - uploaded 2022-12-11T15_00_05Z.json
2022-12-11 18:01:54 [info     / 101kB]   - uploaded 2022-12-11T15_02_05Z.json
2022-12-11 18:02:07 [info     /  95kB]   - uploaded 2022-12-11T15_04_05Z.json
2022-12-11 18:02:15 [info     / 108kB]   - uploaded 2022-12-11T15_06_05Z.json
2022-12-11 18:02:20 [info     / 109kB]   - uploaded 2022-12-11T15_08_05Z.json
2022-12-11 18:02:24 [info     / 113kB]   - uploaded 2022-12-11T15_10_05Z.json
2022-12-11 18:02:28 [info     / 110kB]   - uploaded 2022-12-11T15_12_05Z.json
2022-12-11 18:02:33 [info     / 113kB]   - uploaded 2022-12-11T15_14_05Z.json
2022-12-11 18:02:43 [info     / 108kB]   - uploaded 2022-12-11T15_16_05Z.json
2022-12-11 18:02:48 [info     / 111kB]   - uploaded 2022-12-11T15_18_05Z.json
2022-12-11 18:02:53 [info     / 109kB]   - uploaded 2022-12-11T15_20_05Z.json
2022-12-11 18:03:00 [info     / 111kB]   - uploaded 2022-12-11T15_22_05Z.json
2022-12-11 18:03:08 [info     / 109kB]   - uploaded 2022-12-11T15_24_05Z.json
2022-12-11 18:03:17 [info     / 102kB]   - uploaded 2022-12-11T15_26_05Z.json
2022-12-11 18:03:31 [info     /  99kB]   - uploaded 2022-12-11T15_28_06Z.json
2022-12-11 18:03:37 [info     / 103kB]   - uploaded 2022-12-11T15_30_05Z.json
2022-12-11 18:03:42 [info     / 109kB]   - uploaded 2022-12-11T15_32_05Z.json
2022-12-11 18:03:49 [info     / 110kB]   - uploaded 2022-12-11T15_34_05Z.json
2022-12-11 18:03:54 [info     / 110kB]   - uploaded 2022-12-11T15_36_05Z.json
2022-12-11 18:04:05 [info     /  99kB]   - uploaded 2022-12-11T15_38_05Z.json
2022-12-11 18:04:09 [info     / 110kB]   - uploaded 2022-12-11T15_40_05Z.json
2022-12-11 18:04:14 [info     / 108kB]   - uploaded 2022-12-11T15_42_05Z.json
2022-12-11 18:04:23 [info     / 103kB]   - uploaded 2022-12-11T15_44_05Z.json
2022-12-11 18:04:28 [info     / 109kB]   - uploaded 2022-12-11T15_46_05Z.json
2022-12-11 18:04:33 [info     / 109kB]   - uploaded 2022-12-11T15_48_05Z.json
2022-12-11 18:04:37 [info     / 110kB]   - uploaded 2022-12-11T15_50_05Z.json
2022-12-11 18:04:42 [info     / 109kB]   - uploaded 2022-12-11T15_52_05Z.json
2022-12-11 18:04:53 [info     / 107kB]   - uploaded 2022-12-11T15_54_05Z.json
2022-12-11 18:04:57 [info     / 109kB]   - uploaded 2022-12-11T15_56_05Z.json
2022-12-11 18:05:07 [info     / 104kB]   - uploaded 2022-12-11T15_58_05Z.json
2022-12-11 18:05:11 [info     / 110kB]   - uploaded 2022-12-11T16_00_05Z.json
2022-12-11 18:05:16 [info     / 109kB]   - uploaded 2022-12-11T16_02_05Z.json
2022-12-11 18:05:21 [info     / 110kB]   - uploaded 2022-12-11T16_04_06Z.json
2022-12-11 18:05:25 [info     / 110kB]   - uploaded 2022-12-11T16_06_06Z.json
2022-12-11 18:05:33 [info     / 108kB]   - uploaded 2022-12-11T16_08_05Z.json
2022-12-11 18:05:43 [info     / 101kB]   - uploaded 2022-12-11T16_10_05Z.json
2022-12-11 18:05:51 [info     / 104kB]   - uploaded 2022-12-11T16_12_06Z.json
2022-12-11 18:05:56 [info     / 110kB]   - uploaded 2022-12-11T16_14_05Z.json
2022-12-11 18:06:01 [info     / 108kB]   - uploaded 2022-12-11T16_16_05Z.json
2022-12-11 18:06:05 [info     / 110kB]   - uploaded 2022-12-11T16_18_05Z.json
2022-12-11 18:06:10 [info     / 109kB]   - uploaded 2022-12-11T16_20_05Z.json
2022-12-11 18:06:16 [info     / 102kB]   - uploaded 2022-12-11T16_22_05Z.json
2022-12-11 18:06:38 [info     /  94kB]   - uploaded 2022-12-11T16_24_05Z.json
2022-12-11 18:06:43 [info     / 109kB]   - uploaded 2022-12-11T16_26_06Z.json
2022-12-11 18:06:47 [info     / 109kB]   - uploaded 2022-12-11T16_28_05Z.json
2022-12-11 18:07:14 [info     /  71kB]   - uploaded 2022-12-11T16_30_05Z.json
2022-12-11 18:07:21 [info     / 102kB]   - uploaded 2022-12-11T16_32_05Z.json
2022-12-11 18:07:29 [info     / 119kB]   - uploaded 2022-12-11T16_34_06Z.json
2022-12-11 18:07:36 [info     / 110kB]   - uploaded 2022-12-11T16_36_05Z.json
2022-12-11 18:07:40 [info     / 109kB]   - uploaded 2022-12-11T16_38_05Z.json
2022-12-11 18:07:51 [info     /  98kB]   - uploaded 2022-12-11T16_40_05Z.json
2022-12-11 18:07:58 [info     / 101kB]   - uploaded 2022-12-11T16_42_06Z.json
2022-12-11 18:08:10 [info     /  92kB]   - uploaded 2022-12-11T16_44_05Z.json
2022-12-11 18:08:15 [info     / 110kB]   - uploaded 2022-12-11T16_46_06Z.json
2022-12-11 18:08:19 [info     / 110kB]   - uploaded 2022-12-11T16_48_06Z.json
2022-12-11 18:08:31 [info     /  93kB]   - uploaded 2022-12-11T16_50_05Z.json
2022-12-11 18:08:39 [info     / 110kB]   - uploaded 2022-12-11T16_52_06Z.json
2022-12-11 18:09:24 [info     /  99kB]   - uploaded 2022-12-11T16_54_06Z.json
2022-12-11 18:09:31 [info     / 102kB]   - uploaded 2022-12-11T16_56_06Z.json
2022-12-11 18:09:36 [info     / 109kB]   - uploaded 2022-12-11T16_58_05Z.json
2022-12-11 18:09:41 [info     / 110kB]   - uploaded 2022-12-11T17_00_05Z.json
2022-12-11 18:09:48 [info     / 110kB]   - uploaded 2022-12-11T17_02_06Z.json
2022-12-11 18:09:53 [info     / 109kB]   - uploaded 2022-12-11T17_04_06Z.json
2022-12-11 18:10:00 [info     / 104kB]   - uploaded 2022-12-11T17_06_05Z.json
2022-12-11 18:10:07 [info     / 110kB]   - uploaded 2022-12-11T17_10_06Z.json
2022-12-11 18:10:11 [info     / 109kB]   - uploaded 2022-12-11T17_20_06Z.json
2022-12-11 18:10:16 [info     / 110kB]   - uploaded 2022-12-11T17_28_06Z.json
2022-12-11 18:10:24 [info     / 111kB]   - uploaded 2022-12-11T17_36_06Z.json
2022-12-11 18:10:24 [debug    / 103kB] > 100 blocks free out of 212
2022-12-11 18:10:24 [debug    / 101kB] > taking new reading
2022-12-11 18:10:25 [info     /  96kB]   - seconds since last reading: 2060
2022-12-11 18:10:25 [debug    /  91kB] > caching reading for upload
2022-12-11 18:10:25 [info     /  87kB] > 1 cache file(s) not being uploaded. Waiting until there are 3 file(s)
2022-12-11 18:10:25 [info     /  84kB] > going to sleep
2022-12-11 18:10:25 [debug    /  82kB]   - clearing and disabling previous alarm
2022-12-11 18:10:25 [info     /  80kB]   - setting alarm to wake at 18:12pm
2022-12-11 18:10:25 [info     /  78kB]   - shutting down
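
To get a feel for how long each upload takes, the gaps between consecutive `- uploaded` entries can be computed from the timestamps. This is a minimal sketch meant to run on a desktop Python against a copy of the log, not on the board. `upload_intervals` is a hypothetical helper, and the four sample lines are copied from the start of the log above.

```python
from datetime import datetime


def upload_intervals(lines):
  """Return the seconds between consecutive '- uploaded' log entries."""
  stamps = [
    # the first 19 characters of each line hold the timestamp
    datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
    for line in lines
    if "- uploaded " in line
  ]
  return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


sample = [
  "2022-12-11 17:58:33 [info     / 102kB]   - uploaded 2022-12-11T14_10_05Z.json",
  "2022-12-11 17:58:38 [info     / 110kB]   - uploaded 2022-12-11T14_12_05Z.json",
  "2022-12-11 17:58:42 [info     / 111kB]   - uploaded 2022-12-11T14_14_05Z.json",
  "2022-12-11 17:58:49 [info     / 110kB]   - uploaded 2022-12-11T14_16_05Z.json",
]

gaps = upload_intervals(sample)
print(gaps)  # [5.0, 4.0, 7.0]
print(sum(gaps) / len(gaps))  # mean seconds per file for this sample
```

At a few seconds per file, a backlog of 160-odd files takes many minutes to clear, which is in line with the recovery time reported above.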

Data starting to upload: Screenshot 2022-12-11 at 17 56 10

Data uploaded for all 5 devices: Screenshot 2022-12-11 at 18 15 12

Pull request to cover change under https://github.com/pimoroni/enviro/pull/125/commits/c902806e2b1bd104173f4106706cae692dbdc794

ZodiusInfuser commented 1 year ago

Glad to see that the logging change worked and let the board recover! I'll look at getting the fix merged in.

dave-ct commented 1 year ago

@ZodiusInfuser I have done a separate pull request here: #128