MJDSys closed this 1 year ago.
That's fantastic, thanks for digging this up and following up with the fix.
For added context, here's Nordic's known issues page with the suggested workaround for KRKNWK-12017. nRF Connect SDK 2.3.0 is still affected.
It would explain some of the behavior we're seeing here, so I'm hopeful. I will flash a couple of b-parasites with this branch. I propose we let it roast for a few days and discuss the results here.
Thanks @MJDSys for digging this up. I will flash this firmware to some of my parasites today and will observe. 🎉
I am struggling with batteries that drain very quickly at the moment, and maybe this will fix that as well. I suspect a mix of a software wakelock caused by a firmware bug and possibly a bad batch of coin cells. Quite hard to pin down...
Sounds good, I'll keep an eye on mine too.
I also experience battery life issues with my b-parasites. I ended up buying the Nordic power monitor, and I have a couple of ideas. I've ordered a new batch of the v2 b-parasites (the plants must grow!) that I'll use to gather some measurements, and I'll post about it in a new issue, but I don't think this PR will solve it.
Following up on https://github.com/rbaron/b-parasite/pull/126#issuecomment-1541434345, the firmware has been running for a couple of weeks, and it's still connected and working nominally as far as I can tell.
It's hard to say whether we exercised that fix as is, but there's an interesting blip on the collected data points:
The temporary drop of may be unrelated, but either way it picked back up again with no intervention.
@MJDSys, @oleo65, have you had a similarly positive experience with this firmware?
I am experiencing mixed results with the firmware. It is fairly stable, but one sensor dropped its connection multiple times, usually after 2 or 3 days, and needed to be power cycled to reconnect.
This might also be related to the power drain issue but I don't have a real idea how to approach this.
@oleo65 what setup do you have? I'm running HA + SkyConnect + ZHA. Are all these 3 boards on your chart running this PR's firmware?
My setup is HA + ConBee 2 + ZHA, so apart from the Zigbee stick it's the same setup.
The three sensors are all running the discussed firmware variant. In total I have around 10 sensors deployed with different firmware revisions.
Some are running a variant I am testing that manually re-initiates the join procedure if the connection drops and is not restored within a defined period of time. I wanted to gain more insight before discussing it here, but so far it seems promising. The background is that Zephyr only tries to reconnect to the network for a fixed, hardcoded amount of time (somewhere around 15 minutes). If no connection can be established in that window, you either need to restart the join procedure in software or power cycle the sensor.
My setup is HA + ConBee2 + Z2M with 4 currently deployed sensors, so a little different again.
So far I've found this to be relatively stable. I had one sensor drop off the network and refuse to reconnect without a power cycle, but the other sensors generally stay available (and that one was at least blinking its LED, so it was more noticeable). Before this change I had sensors constantly disconnecting and staying that way. I initially blamed it on the power draw emptying the batteries.
@oleo65 Your comment about Zephyr matches my experience with that one sensor. Is your firmware variant working on top of this change, or in parallel?
I deployed the "Zephyr auto reconnect" firmware about 7 weeks ago on different sensors. The next step would then be to combine both approaches into one firmware and try that. 😊
In addition, I disabled most of the LED blinking because I suspected it to be an additional source of power drain if the sensor is in a faulty state but goes unnoticed for, say, one or two days. This happened to me multiple times. Some sensors are deployed below foliage and not easily visible. I was also thinking about an HA automation that sends a push notification if a sensor seems to be offline, but I have not created it yet.
I will clean up the auto-reconnect code in the next few days and push it to a separate branch for discussion. I appreciate all the discussion here and hope we can improve the reliability. My future plan is to automate my irrigation system further by using the soil moisture values as an additional input, but I need them to be more reliable for that. 😊
Awesome, thanks a lot @oleo65 and @MJDSys.
While this PR may not fix all instabilities, I have been using it successfully for 3 weeks, and it is also what Nordic recommends. I'm going to merge it and kindly ask @oleo65 to rebase #130 so we can easily test both improvements together.
Thanks!
Nordic has published an errata for the nRF Connect SDK for versions
They have a suggested workaround to implement in the SDK, which has been adapted for the custom signal handler used here.
This is an effort to solve issues where my parasites would occasionally drop off my network and require a reboot. After 24 hours I've not yet had a device disappear, but previously it has taken weeks before a device would fail. Unfortunately, it's hard to debug the board because the chip is in a low-power state when this occurs.
CC: @oleo65 I saw you were having a similar problem in https://github.com/rbaron/b-parasite/issues/113#issuecomment-1484903062, could you also try this branch?
@rbaron I'm not sure this actually fixes my problem, so I understand if you'd prefer to wait a couple weeks before merging. If you have any feedback I'm happy to incorporate that now.