kevinvincent / ha-wyzesense

A Home Assistant Component to interface with the WYZE Sense hub and sensor system

**Possible Solution** Sensors completely stop responding after a random period of time -up to date firmware- #121

Closed: RoldyBuffalo closed this issue 4 years ago

RoldyBuffalo commented 4 years ago

As several others have stated, and as I have seen on my own setup, my sensors stop working/responding entirely after a random period of time. This happens even after the supposed fix of updating the bridge firmware. I am at a loss for solutions. I've tried reinstalling the component, rebooting, restarting, all of the above, with no luck. Right now the sensors and bridge do not seem to be responding at all. Until 2 months ago this component worked like a charm, so much so that I invested in 20+ sensors to distribute around. Now none of them seem to be responding.

Ideas? I'm not getting any errors. Is there somewhere else I should look?

I have gotten the sensors working again, in some cases with combinations of rebooting and reinstalling the component. The problem is that if I am not home to go through this random, and sometimes unsuccessful, series of troubleshooting steps, the automations I have set up for the sensors will not work. This causes some serious distress in my home, as we're all locked down with nothing but time on our hands. Speaking of time on my hands...

How can I help fix this? I'm running a Raspberry Pi 4B (4 GB) on a 64 GB Samsung endurance card, which again really isn't the issue, as I've seen folks with all kinds of installs have this problem. Is it possible to troubleshoot this further on my end if and when it fails again?

RoldyBuffalo commented 4 years ago

I now have some logs that have popped up. The sensors have also stopped working again, this time seemingly faster than usual.


Logger: custom_components.wyzesense.wyzesense_custom
Source: custom_components/wyzesense/wyzesense_custom.py:364
First occurred: 10:04:09 AM (12 occurrences)
Last logged: 2:25:57 PM

Mismatched checksum, remote=07C9, local=0766
Invalid packet: b'55aa531d190000017155aee248a2373739424141313602136000010b0b534907a9'
Mismatched checksum, remote=07A9, local=07C3
Invalid packet: b'55aa1d1d1900000171561b2ccba23737394230463142021a5f0001012f5f430724'
Mismatched checksum, remote=0724, local=06EE

and


Logger: custom_components.wyzesense.wyzesense_custom
Source: custom_components/wyzesense/wyzesense_custom.py:373
First occurred: 6:08:44 PM (1 occurrences)
Last logged: 6:08:44 PM

[Errno 110] Operation timed out

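
Judging only from the packets logged in this thread (an inference, not documented protocol), the last two bytes of each frame are a big-endian 16-bit sum of every byte before them. A minimal Python sketch under that assumption; `frame_checksum` and `check_frame` are hypothetical helper names, not functions from ha-wyzesense:

```python
from typing import Tuple


def frame_checksum(data: bytes) -> int:
    """Assumed rule: 16-bit sum of all bytes (inferred from logged packets)."""
    return sum(data) & 0xFFFF


def check_frame(frame: bytes) -> Tuple[int, int]:
    """Return (remote, local): checksum carried by the frame vs. the recomputed one."""
    if len(frame) < 3:
        raise ValueError("frame too short to carry a checksum")
    remote = int.from_bytes(frame[-2:], "big")  # last two bytes, big-endian
    local = frame_checksum(frame[:-2])          # sum of everything before them
    return remote, local


# First "Invalid packet" from the log above
bad = bytes.fromhex(
    "55aa531d190000017155aee248a2373739424141313602136000010b0b534907a9"
)
remote, local = check_frame(bad)
print(f"remote={remote:04X}, local={local:04X}")  # remote=07A9, local=07C3
```

Run on the first invalid packet above, it reproduces the logged `remote=07A9, local=07C3` pair. The same rule gives matching values for the well-formed door-sensor packets in tthk's log further down and reproduces the checksum bytes `026a` of the outgoing ACK `aa555319ff026a`, which is the basis for the inference. In the second invalid packet the header reads `55aa1d1d` where valid frames read `55aa531d`; since 0x53 - 0x1D = 0x36, exactly the gap between remote 0x0724 and local 0x06EE, a single corrupted type byte would account for that mismatch.
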
RoldyBuffalo commented 4 years ago

I think I may have figured it out, and perhaps this is valuable information. I had a sensor showing an RSSI of -110, which meant the sensor was dead, its battery was dead, or it was out of range. Once I removed that faulty sensor, the entire stack came back to life, as if that one entity had been holding up pretty much all of Home Assistant.

I found the faulty sensor by checking RSSI in the entity list and removed it (with wyze_sense.remove, or whatever the service is called). After a full host reboot for good measure, everything returned to normal.
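
Checking RSSI by hand across 20+ entities is tedious, so the check can be scripted. A minimal sketch, assuming each sensor is described by a dict shaped like the `binary_sensor` debug output in tthk's log below (`available`, `mac`, `rssi`, `battery_level`); the -90 dBm cutoff and the `find_weak_sensors` name are hypothetical illustrations, not values or functions from ha-wyzesense:

```python
from typing import Iterable, List

WEAK_RSSI_DBM = -90  # hypothetical cutoff; healthy sensors in this thread read -42 to -70


def find_weak_sensors(sensors: Iterable[dict], threshold: int = WEAK_RSSI_DBM) -> List[str]:
    """Return MACs of sensors that are unavailable, report no RSSI, or have RSSI <= threshold."""
    weak = []
    for sensor in sensors:
        rssi = sensor.get("rssi")
        if not sensor.get("available", True) or rssi is None or rssi <= threshold:
            weak.append(sensor["mac"])
    return weak


sensors = [
    # Values copied from the binary_sensor debug line in tthk's log
    {"available": True, "mac": "77828954", "rssi": -70, "battery_level": 90},
    # Made-up entry standing in for the -110 sensor described above
    {"available": True, "mac": "DEADBEEF", "rssi": -110},
]
print(find_weak_sensors(sensors))  # ['DEADBEEF']
```

Sensors that are unavailable or report no RSSI at all are flagged as well, since a sensor that has dropped off entirely is the extreme version of the -110 case. Removing or re-pairing whatever it flags remains a manual step.
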

davebeckster commented 4 years ago

Probably onto something! I also have one troublesome sensor (difficult location) that may be dropping out as RF conditions vary. Surely the designers anticipated this condition, but there are always gotchas lurking... I'll focus on it more during the next go-around.

robgazy commented 4 years ago

Did this approach seem to make any difference for you?

I still have the problem even with all 13 of my motion sensors and 2 door sensors right here at my desk waiting for deployment, all reporting RSSI between -42 and -70 and battery levels over 90. They all seem to work fine when they work, but otherwise have the random catastrophic communication failure problem.

Usually the way I first notice this is when the motion sensors continue reporting "detected" for too long and never seem to clear. I've been doing nightly restarts, but this thing is so random it can happen any time.
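
Because the first visible symptom described here is a motion sensor that never clears, a periodic check for sensors stuck in the "detected" state could flag a freeze before anyone notices it by hand. A minimal sketch; the `(detected, last_changed)` input shape, the 10-minute limit, the sensor names, and `stuck_sensors` itself are all hypothetical, not part of ha-wyzesense:

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

MAX_DETECTED = timedelta(minutes=10)  # hypothetical limit; pick one longer than a normal clear time


def stuck_sensors(
    states: Dict[str, Tuple[bool, datetime]],
    now: datetime,
    max_on: timedelta = MAX_DETECTED,
) -> List[str]:
    """Return names of sensors that have reported 'detected' (True) for longer than max_on."""
    return sorted(
        name
        for name, (detected, last_changed) in states.items()
        if detected and now - last_changed > max_on
    )


now = datetime(2020, 4, 21, 14, 56)
states = {  # made-up sensor names and timestamps
    "hallway_motion": (True, now - timedelta(minutes=45)),
    "kitchen_motion": (True, now - timedelta(minutes=2)),
    "front_door": (False, now - timedelta(hours=3)),
}
print(stuck_sensors(states, now))  # ['hallway_motion']
```

In practice the states would come from whatever the automation platform exposes; the only tuning needed is a limit longer than the time a sensor normally takes to clear.
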

davebeckster commented 4 years ago

I also continue to have seemingly random "freezes" of Wyze Sense. For many months it was impressively reliable with about 25 sensors, but recently it has deteriorated to about one freeze a day. Most of the time an HA restart restores operation, but sometimes a power-down restart is required. The software is evolving, as is my configuration, and my diagnostic skills are limited. This seems to be a problem experienced by many. I hope someone can shed some light on a fix.

RoldyBuffalo commented 4 years ago

I'm on day 7. I just rebooted because of a Supervisor and Core update, but I haven't had ANY issues with the sensors this week or last week. I really, truly think my issues were stemming from having the bridge on old firmware and then, once the bridge firmware was updated, from faulty sensors that I needed to find and either repair or replace. I only had one faulty sensor, and I wasn't able to find it by the usual means; it only showed up in my entity list. I have also seen people have success with removing the entire integration and all Wyze db files, then reinstalling. After reinstalling you will likely have to re-pair your sensors. The firmware update should resolve a lot of these issues, but a faulty sensor can still cause problems in this build.

davebeckster commented 4 years ago

Thanks for the update. I'll check for bridge updates and see if I can find a misbehaving sensor. One is weak and sometimes fails to work, but I haven't noticed any correlation with the "freezing" issue. Appreciate the help.

RoldyBuffalo commented 4 years ago

I'd bet 100% that if that "weak sensor" is dropping in and out of the network, it is your issue. At least right now, the connection to ALL sensors seems pretty flaky if ANY one of them fails to respond in a timely manner. Try removing the weak sensor and test.

davebeckster commented 4 years ago

I updated the Wyze bridge/hub to the latest version, and it stopped working except for one sensor. It seems I need to reinitialize each sensor. No answer yet as to whether this will affect my freeze issue. Tomorrow's project...

tthk commented 4 years ago

My Wyze sensors stop working every day now, even on the new firmware. Here is the last log entry:

2020-04-25 00:25:11 DEBUG (Thread-3) [custom_components.wyzesense.wyzesense_custom] Received: b'55aa531d1900000171b0399091a2373738323839353401185a00010003b04607c5'
2020-04-25 00:25:11 DEBUG (Thread-3) [custom_components.wyzesense.wyzesense_custom] <=== Received: Packet: Cmd=5319, Payload=b'00000171b0399091a2373738323839353401185a00010003b046'
2020-04-25 00:25:11 DEBUG (Thread-3) [custom_components.wyzesense.wyzesense_custom] ===> Sending: Packet: Cmd=53FF, Payload=ACK(5319)
2020-04-25 00:25:11 DEBUG (Thread-3) [custom_components.wyzesense.wyzesense_custom] Sending: b'aa555319ff026a'
2020-04-25 00:25:11 DEBUG (Thread-3) [custom_components.wyzesense.binary_sensor] {'available': True, 'mac': '77828954', 'state': 0, 'device_class': 'door', 'timestamp': '2020-04-25T00:24:54.801000', 'rssi': -70, 'battery_level': 90}
2020-04-25 00:25:11 DEBUG (Thread-3) [custom_components.wyzesense.wyzesense_custom] Trying to parse: b'55aa531d1900000171b0399091a2373738323839353401185a00010003b04607c5'
2020-04-25 00:25:11 DEBUG (Thread-3) [custom_components.wyzesense.wyzesense_custom] Received: b'55aa531d1900000171b0399091a2373738323839353401185a00010003b04607c5'
2020-04-25 00:25:11 DEBUG (Thread-3) [custom_components.wyzesense.wyzesense_custom] <=== Received: Packet: Cmd=5319, Payload=b'00000171b0399091a2373738323839353401185a00010003b046'
2020-04-25 00:25:11 DEBUG (Thread-3) [custom_components.wyzesense.wyzesense_custom] ===> Sending: Packet: Cmd=53FF, Payload=ACK(5319)
2020-04-25 00:25:11 DEBUG (Thread-3) [custom_components.wyzesense.wyzesense_custom] Sending: b'aa555319ff026a'
2020-04-25 00:25:11 DEBUG (Thread-3) [custom_components.wyzesense.binary_sensor] {'available': True, 'mac': '77828954', 'state': 0, 'device_class': 'door', 'timestamp': '2020-04-25T00:24:54.801000', 'rssi': -70, 'battery_level': 90}
2020-04-25 00:25:11 DEBUG (Thread-3) [custom_components.wyzesense.wyzesense_custom] Trying to parse: b'55aa53193500000171b039911b0ea23737383238393534010103b106bf55aa531d1900000171b039911fa2373738323835353401185b00010103430754'
2020-04-25 00:25:11 DEBUG (Thread-3) [custom_components.wyzesense.wyzesense_custom] Received: b'55aa53193500000171b039911b0ea23737383238393534010103b106bf'
2020-04-25 00:25:11 DEBUG (Thread-3) [custom_components.wyzesense.wyzesense_custom] <=== Received: Packet: Cmd=5335, Payload=b'00000171b039911b0ea23737383238393534010103b1'
2020-04-25 00:25:11 DEBUG (Thread-3) [custom_components.wyzesense.wyzesense_custom] ===> Sending: Packet: Cmd=53FF, Payload=ACK(5335)
2020-04-25 00:25:11 DEBUG (Thread-3) [custom_components.wyzesense.wyzesense_custom] Sending: b'aa555335ff0286'
2020-04-25 00:25:11 INFO (Thread-3) [custom_components.wyzesense.wyzesense_custom] LOG: time=2020-04-25T00:24:54.939000, data=b'a23737383238393534010103b1'
2020-04-25 00:25:11 DEBUG (Thread-3) [custom_components.wyzesense.wyzesense_custom] Trying to parse: b'55aa531d1900000171b039911fa2373738323835353401185b00010103430754'

Restarting Home Assistant worked to get things working again, this time.
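
The `Trying to parse` lines show that a single read can contain more than one frame, or only part of one. Judging from the logged packets (again an inference, not documented protocol), each incoming frame is `55 aa`, a type byte, and a length byte that counts everything after it (command byte, payload, and 2-byte checksum). A minimal Python sketch of splitting a read buffer under that assumption; `split_frames` is a hypothetical helper, not the component's actual parser:

```python
from typing import List, Tuple

HEADER = b"\x55\xaa"


def split_frames(buf: bytes) -> Tuple[List[bytes], bytes]:
    """Split buf into complete frames; return (frames, leftover to prepend to the next read)."""
    frames: List[bytes] = []
    while True:
        start = buf.find(HEADER)
        if start < 0:
            # No header: keep a trailing 0x55 in case it starts the next header.
            return frames, (buf[-1:] if buf.endswith(HEADER[:1]) else b"")
        buf = buf[start:]  # discard any junk before the header
        if len(buf) < 4:
            return frames, buf  # header seen, but the length byte has not arrived
        total = 4 + buf[3]  # 55 aa + type + length byte, then `length` more bytes
        if len(buf) < total:
            return frames, buf  # incomplete frame: wait for more data
        frames.append(buf[:total])
        buf = buf[total:]


# Buffer from the "Trying to parse" line above that holds two frames
buf = bytes.fromhex(
    "55aa53193500000171b039911b0ea23737383238393534010103b106bf"
    "55aa531d1900000171b039911fa2373738323835353401185b00010103430754"
)
frames, rest = split_frames(buf)
print(len(frames), hex(frames[0][4]), len(rest), 4 + rest[3])  # 1 0x35 32 33
```

On that buffer it yields one complete 29-byte frame (command 0x35, the one the component logged as `LOG`) and a 32-byte remainder whose length byte, 0x1d, calls for 33 bytes; the final `Trying to parse` in this log therefore holds an event frame one byte short of its declared length. Returning such a tail as leftover, to be completed by the next read, is the design choice here, rather than assuming the whole frame has already arrived.
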

davebeckster commented 4 years ago

Updating the hub firmware and eliminating a known "weak" (RF signal) sensor appears promising. It's too early to celebrate, but there were no issues yesterday. I don't know whether one change or both helped. Note that updating the firmware requires reinitializing each sensor.

RonSpawnson commented 4 years ago

Nice thought, thanks for sharing. Though I will say, if this is the root cause, I don't think the "solution" is to instruct users to remove the 'weak sensors' from their system, but to make sure the add-on does not crash if weak sensors exist.

RoldyBuffalo commented 4 years ago

No, the nice thought would be to have a repo owner who addresses issues. The solution, as I see it, is to remove faulty sensors and update your bridge. I am on day 15 without a full stack reboot, with no dropped sensors and no stuck sensors. 🤷

RonSpawnson commented 4 years ago

Respectfully, I'll agree to disagree with you here. Weak sensors are one thing that causes the add-on to crash prematurely. The solution surely isn't to tell everyone "make sure you don't have weak sensors", but rather to ensure that the add-on does not crash prematurely in this or any other exceptional situation.

I'm tracking a potential fix. Similar to what we are seeing in #114, we believe the worker thread is being killed prematurely. Perhaps that is also happening due to your "weak sensor" issue. You'll notice in debug-level logs there's a message that starts "Trying to parse:", but that's the last message. There should have been another message starting "Received", but there is none. tthk saw this above as well.

I believe that the worker thread / watch dog is being killed due to an unhandled exception, or improperly handled exception. I'm attempting a fix as documented in #114, and will report back with findings.
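
The failure mode described here, a background reader thread that dies silently after one bad read or parse, is commonly guarded against by catching exceptions inside the loop body so a failure is logged and the loop continues. A minimal, generic Python sketch of that pattern; `run_reader`, `read_packet`, and `handle_packet` are hypothetical stand-ins for the component's own device I/O and parsing, and this is not the change being tried in #114:

```python
import logging
import threading
from typing import Callable, Optional

logging.basicConfig(level=logging.INFO)
_LOGGER = logging.getLogger(__name__)


def run_reader(
    read_packet: Callable[[], Optional[bytes]],
    handle_packet: Callable[[bytes], None],
    stop: threading.Event,
    retry_delay: float = 1.0,
) -> None:
    """Read and handle packets until `stop` is set; one failure never ends the loop."""
    while not stop.is_set():
        try:
            pkt = read_packet()
            if pkt is not None:
                handle_packet(pkt)
        except OSError as err:
            # e.g. "[Errno 110] Operation timed out" from the device read
            _LOGGER.warning("Read failed, retrying: %s", err)
            stop.wait(retry_delay)
        except Exception:  # keep the thread alive on parser bugs, but log the traceback
            _LOGGER.exception("Unhandled error while processing packet")


# Demo with fake I/O: a good packet, a malformed one, a timeout, then another good one
handled = []
stop = threading.Event()
script = [b"ok-1", b"bad", OSError(110, "Operation timed out"), None, b"ok-2"]


def fake_read() -> Optional[bytes]:
    if not script:
        stop.set()  # end the demo once the script is exhausted
        return None
    item = script.pop(0)
    if isinstance(item, Exception):
        raise item
    return item


def fake_handle(pkt: bytes) -> None:
    if pkt == b"bad":
        raise ValueError("malformed packet")
    handled.append(pkt)


worker = threading.Thread(target=run_reader, args=(fake_read, fake_handle, stop, 0.01), daemon=True)
worker.start()
worker.join(timeout=5)
print(handled)  # [b'ok-1', b'ok-2']
```

Both the malformed packet and the simulated timeout are logged, and the packet after them is still handled. Catching broadly like this keeps the thread alive, but the logged tracebacks are still worth reading, since they point at the underlying bug.
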

robgazy commented 4 years ago

@RonSpawnson Agreed. The "weak sensor" workaround is not helping me at all.

davebeckster commented 4 years ago

I can now confirm that updating the firmware and eliminating weak sensors does NOT resolve this issue. I don't have the skills to chase it into the code, but I appreciate the efforts of those who do.

RoldyBuffalo commented 4 years ago

You say it doesn't solve the issue, when in my circumstances, it did, it does, and it still has... Create your own issue thread, please.

RonSpawnson commented 4 years ago

Sure - we'll use a different ticket to track since you are satisfied with your mitigation. I hope your mitigation turns out to be permanent and the issue does not re-appear 😃

Anyone whose issue is not solved - I'd recommend following along in issue 114.

davebeckster commented 4 years ago

Agreed, I wish it had worked for me. Our setups are likely different. I appreciate your efforts. Moving on.
