rbaron / b-parasite

🌱💧 An open source DIY soil moisture sensor
1.85k stars 143 forks source link

Fix (minor?) issues with the ZigBee sample #146

Closed rbaron closed 11 months ago

rbaron commented 1 year ago
  1. zigbee_configure_sleepy_behavior must be called before zigbee_enable - source docs
  2. zb_zdo_pim_set_long_poll_interval must be called after the network is joined - source docs. On top of that, the argument for the call was incorrect - it expects ms directly. Lowered the default to 10 ms, which is already double the previously working value of 5 ms (SDK default).
  3. Removed the unnecessary CONFIG_PM, CONFIG_PM_DEVICE. It seems like there's no impact in power consumption. Could it cause some of the weirdness we've seen in #126, #130?
  4. Implemented CONFIG_PRST_ZB_FACTORY_RESET_DISABLED=y to fully disable factory resetting. May aid debugging.
  5. Added comment on prj.conf on how to use multi/single channel scanning. Helps with battery life when scanning networks - see power profiling below.
  6. 🚨 Missing break statements in zboss_signal_handler
  7. Makes zboss_signal_handler return fast. According to the docs: "Signal processing should not do long operations synchronously". I removed calls to prst_led_flash (essentially a huge synchronous looping delay -- yikes) and debug_counters_increment, which were blocking. I'll try migrating the debug_counters to non-blocking separately, but got rid altogether for now
  8. Main sensor reading task is started for the first time upon the ZB_ZDO_SIGNAL_SKIP_STARTUP signal. This is what the weather_station sample does. It should have no functional difference, but there's at least some special handling for alarms started from the ZigBee task, so let's try that.
  9. Hard zb_reset() if the recurring sensor reading task cannot reschedule itself with ZB_SCHEDULE_APP_ALARM(). "This should never happen" -- famous last words

Power Profiling

Long Poll Connection Interval fixed

Now we can see the 10 ms intervals between radio activity spikes: ppk-20230627T073821-fixed-polling

Multi Channel Network Scanning

With this (default) config, all 16 channels are scanned, causing a ~10 mA average current for a few minutes: ppk-20230626T075344-multichannel-default-handler-pairing

Single Channel Network Scanning

With a single configured channel, we can even see the SDK exponential backoff during scanning, which averages at a much lower current: ppk-20230626T060359-single-channel-pairing

oleo65 commented 1 year ago

I am impressed with how many ideas and possible solutions you came up. Thanks for all your effort. I just flashed one parasite with this branch version, setting a fixed channel number.

I have also been experimenting with the CONFIG_PRST_ZB_PARENT_POLL_INTERVAL_SEC which I set to 60. I don't really see any effect on the reportings but might again save some more power in my understanding. 🤷

rbaron commented 1 year ago

@oleo65 let's see if they help! I've flashed 4 devices with this PR 🤞

I have also been experimenting with the CONFIG_PRST_ZB_PARENT_POLL_INTERVAL_SEC

This was unfortunately broken before this PR -- the call was ignored and the default value of 5 seconds was selected. From what I've recently learned, some things to consider when selecting this value:

  1. Responsiveness. We only sample the sensors every 60s (PRST_ZB_SLEEP_DURATION_SEC), so we won't update the values faster than that anyway. But for stuff like the identify callback, it may be interesting to make that happen faster, so we poll the parent more frequently. At 10s, we're looking at average current consumption of 5.5 uA, which is already pretty surprisingly low to me, and we arguably start to get diminishing returns after that IMO:

ppk-20230701T061706-long-poll-10s-operation

  1. There's a subtle note in the ZBOSS docs:

It is recommended to set the Long Poll interval to a value smaller than ((End Device Timeout period) / 3) [timeout defaults to 256 min] to avoid unexpected device aging.

So that would put our ceiling at ~85 minutes, but I don't expect us to ever get anywhere near that 😬

rbaron commented 11 months ago

Update: I've had 1 out of 5 b-parasites running this code fail after 3 weeks. But this time I verified that the device still "worked" in the sense that it responded to ZigBee read requests and to the identify callback by flashing the LED.

My theory now is that for some reason the update_sensors_cb is not running, despite rescheduling itself (now with error checking). So the device is responsive, but it's just not reading the sensors and updating the cluster values.

I will follow up with a PR to address that. I'm still merging this PR as it fixes some bugs and makes it generally more compliant.

oleo65 commented 11 months ago

I flashed two devices with this firmware revision and so far they have both been running flawlessly for over two weeks now.

I can't really say anything about battery life yet. They both recieved a fresh battery during update, oddly one device started with less than 50% charge level and has been around 30-40 % since. I suspect the coin cells to be sometimes of, well let's say varying quality. 🤷