kevinvincent / ha-wyzesense

A Home Assistant Component to interface with the WYZE Sense hub and sensor system
368 stars 98 forks source link

Sensors Stop Responding - Requires Reboot #114

Open cheechie opened 4 years ago

cheechie commented 4 years ago

I was running hassio on a pie and the wyze sensors would just stop responding after a day or sometimes 3 days. A reboot would fix the problem. Recently I have switched to running home assistant in a VM (Proxmox), but i still get the same issue. A reboot of home assistant fixes it every time.

Let me know if I can provide any logs to help.

RyanWoirol commented 4 years ago

Mine survives anywhere from 30 minutes to a few hours before I need to restart but sometimes reinstall the service. I can always tell when it goes out because the LED goes from blue to yellow.

I think you may have a different problem. The issue we're discussing, the led stays blue. At least mine would.

hmm. Has this been happening to anyone then? or should I start a new thread?

eregev commented 4 years ago

Mine would go from blue to yellow on the Pi but stays blue on a Debian PC install.

On Fri, Apr 3, 2020, 3:40 PM RyanWoirol notifications@github.com wrote:

Mine survives anywhere from 30 minutes to a few hours before I need to restart but sometimes reinstall the service. I can always tell when it goes out because the LED goes from blue to yellow.

I think you may have a different problem. The issue we're discussing, the led stays blue. At least mine would.

hmm. Has this been happening to anyone then? or should I start a new thread?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/kevinvincent/ha-wyzesense/issues/114#issuecomment-608647874, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACFKFXSKTA2MTNUXOB5RNF3RKZCTTANCNFSM4LC7JNGQ .

RyanWoirol commented 4 years ago

Mine would go from blue to yellow on the Pi but stays blue on a Debian PC install. On Fri, Apr 3, 2020, 3:40 PM RyanWoirol @.***> wrote: Mine survives anywhere from 30 minutes to a few hours before I need to restart but sometimes reinstall the service. I can always tell when it goes out because the LED goes from blue to yellow. I think you may have a different problem. The issue we're discussing, the led stays blue. At least mine would. hmm. Has this been happening to anyone then? or should I start a new thread? — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#114 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACFKFXSKTA2MTNUXOB5RNF3RKZCTTANCNFSM4LC7JNGQ .

Yeah I am using a Pi 3B. Haven't had any problems with it until just this week. A reset of Hass.io will make the light turn blue and work for about 30 minutes, then it goes yellow and stops working until I reset again.

RyanWoirol commented 4 years ago

And now I can't get a blue light and the sensors aren't working at all.

robgazy commented 4 years ago

I'm having the same problem as op on my RPi4 running HassOS 3.12 where Wyze motion sensors via wyzesense 0.0.7 stop responding after unpredictable lengths of time. Updated to latest bridge firmware (0.0.0.33). Ran for about 3 days before failing this time, other failures have occurred after 9 hours, 18 hours, and 2 days. Posted my log for past 24 hrs here: https://pastebin.com/7mKRUiZ7 I'm willing to help troubleshoot further if that will help. Love the custom component but can't rely on it at the moment. Anyone got any leads yet?

robkiller26 commented 4 years ago

Same as above. Very unreliable running the same setup as above.

RoldyBuffalo commented 4 years ago

You should be able to update your bridge firmware with the latest RTSP firmware for your camera, I never reverted my firmware to stock and it updated just fine, again, with the latest version of RTSP. I updated my bridge, that fixed some issues (ones where automations don't trigger, sensors stop responding or simply won't connect) then once the firmware was updated on the bridge, I found a faulty sensor that was the causes of the issues I was seeing beyond the previously mentioned. Update bridge, remove bad sensors, turn on debug logging. Just jumping on the "It's broken" train without trying any of the aforementioned solutions is kind of not my problem. I told you how I fixed it (now on the 4th day of no issues) and I can verify is fixed by seeing how HA is running after removing the faulty sensor.

Cheers.

robkiller26 commented 4 years ago

So i did not jump on the "its broken" train. I posted months ago on many forums trying to help problem solve the issue and it never went anywhere. I just reposted again to help confirm his issue he is not alone. I have updated my firmware on my bridge but still have the same issue. i have not seen a bad sensor but i am not sure how to tell if i have a bad sensor to be honest.

Thanks for reaching out i am willing to try anything to get it reliable again. How do i tell if i have a bad sensor?

Thanks

Rob

On Fri, Apr 10, 2020, 11:11 AM VarenDerpsAround notifications@github.com wrote:

You should be able to update your bridge firmware with the latest RTSP firmware for your camera, I never reverted my firmware to stock and it updated just fine, again, with the latest version of RTSP. I updated my bridge, that fixed some issues (ones where automations don't trigger, sensors stop responding or simply won't connect) then once the firmware was updated on the bridge, I found a faulty sensor that was the causes of the issues I was seeing beyond the previously mentioned. Update bridge, remove bad sensors, turn on debug logging. Just jumping on the "It's broken" train without trying any of the aforementioned solutions is kind of not my problem. I told you how I fixed it (now on the 4th day of no issues) and I can verify is fixed by seeing how HA is running after removing the faulty sensor.

Cheers.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/kevinvincent/ha-wyzesense/issues/114#issuecomment-612126345, or unsubscribe https://github.com/notifications/unsubscribe-auth/AEPBFBYNRSSUNONTKEMGBWDRL5HNFANCNFSM4LC7JNGQ .

cheechie commented 4 years ago

So what is the solution to this? or a workaround perhaps? I reboot my host (not hass) and it fixes the issue. I also figured out reinstalling the component and rebooting hass (not host) fixes it as well. Would really like to know how I can prevent my usb dongle from disconnecting every time supervisor updates, or worse at random like it seem to be doing about once every 2-3 days without manually rebooting the host. I run a vpn tunnel from my host as well (I know, I know, I need at least one more pi) but it's lightweight in the essence the ssh just needs to be kept open to keep the tunnel open. Nifty little failsafe. Anyway, I am experiencing all the issues described in this thread. The dongle disconnecting, not reconnecting, and the sensors remaining in their off position. This is only resolved with the methods I mentioned. Is using usb 3.0 instead of 2.0 the trick? Curious if it is, as on my rpi4b it happens with both usb 3.0 and usb2.0 BUT I am using powered usb3.0 hub as instructed by several guides for rpi4b and usb's. Something about their power circuitry only having a set amount of voltage across the usb lanes. Google it if you're curious.

I have not seen a solution yet. I am running it in a VM in proxmox. I have to power of my HA VM and then uplug my USB dongle and then plug it back in and reboot. I might go back to hassio on the pie, because then I just had to reboot and the sensors would come back online.

I sure hope this issue can be resolved, otherwise I will have to look for other sensors to use instead of wyze sensors.

Home assistant now loses connection to the sensors. And requires a reboot every 2 to 3 days now. Back to the same way it was. I didn't change or update anything.

robkiller26 commented 4 years ago

Firmware is updated same issue same errors below.

Mismatched checksum, remote=06E9, local=06F1

Invalid packet: b'55aa531d19000001718a2269b8a2373739423139383702146100010008083e06e9'

RROR (MainThread) [hacs] Validation for custom-components/sensor.wifi-scanner failed with GitHub returned 404 for https://api.github.com/repos/custom-components/sensor.wifi-scanner/contents/custom_components/blueprint/manifest.json.

tripp396 commented 4 years ago

Same issues here. Removing and re-adding 7/10 of mine seemed to fix it (3 were actually dead) for about 12 hours, then they got stuck again. Reboot of host did not fix. Have some batteries on the way. Going to change out all 10 and remove and re-add and see where that gets me

daveenguyen commented 4 years ago

I think the Worker thread is getting killed.

When the sensors stop responding, ws._Dongle__thread.is_alive() was returning False. I don't know the reason the thread is dead. Don't see anything in the logs.

I am playing around with this at the moment https://github.com/daveenguyen/ha-wyzesense/tree/bugfix/dead-thread

I created a restart service and a binary sensor for the thread's alive state.

With those two, I created an automation to wyzesense.restart whenever the thread dies.

Hopefully this tips someone towards figuring this out

RonSpawnson commented 4 years ago

Thanks daveenguyen, I think you are onto something here in regards to worker thread being killed. We just have to figure out what is killing it or where it is being killed. Maybe this is a stab in the dark, but I just experienced this again and looked at my debug level logs and saw the following:

2020-04-25 18:07:55 DEBUG (Thread-2) [custom_components.wyzesense.wyzesense_custom] Trying to parse: b'55aa53193500000171b3986dd10ea23737394231373700001ae40805'

Normally, this is followed be a message which says "Received: ...". But in this instance, it was the last log message I ever saw from wyzesense. Looking at code, the "watchdog" loop has an 'except' clause of OSError, after logging this there is then a 'break' statement, which would mean the While loop would be exited.

I'm going to try to remove the "break" in the OSError except, as I really think that this would kill the watchdog thread. I'll add additional logging in the except clause as well so I can try to detect if this actually saves me from it going haywire ever. I'm also going to try to add a generic "Exception" except clause and add logging there as well, to see if perhaps a silent, non OSError exception is being thrown and not logged, and killing worker thread due to not having an except clause.

Will update with findings next time wyzesense goes down, or, if I detect that this change saved me from wyzesense going down.

RonSpawnson commented 4 years ago

@daveenguyen I have a code diff to share which attempts the fix I described above. Since this issue often takes days-to-weeks to repro for me, it will take some time for me to determine if it was successful or not. Since it may take a while for verification, I'm sharing the diff with others in case others want to attempt it or continue searching for other potential problems. Commit description contains additional information, as well as success and failure criteria. Keep in mind this is completely unverified, I have no idea if this fixes the issue or not. But since it will take me some time to know, I'm sharing my attempt now.

Here's the diff: https://github.com/RonSpawnson/ha-wyzesense/commit/d6e09744659814c8544ba86ae964b73fa675ab07 Update: This did not appear to resolve the issue. See my next comment for findings.

gterpstra75 commented 4 years ago

Thanks @daveenguyen and @RonSpawnson for your insight and comments. I have been fighting this issue for several months now. I have 19 motion and 10 door sensors. I have tried many of the things already suggested such as updated firmware, replaced batteries, checked/removed bad sensors, scheduled reboots of HA and monitoring battery levels and RSSI levels. Some of these changes seemed to help for a period of time but the problem always remained. @RonSpawnson, I have added the changes which you are trying to my wyzesense_custom.py and rebooted. (I assume that is all that I need to do in order to implement). I will let you know any information which I can gather.

I believe the following reported issues are all related:

84 Wyze Sense sensors stopped working

117 Sensor stop responding, partially, or completely

121 Sensors completely stop responding after a random period of time

119 Sensors stop working after random periods of time

109 Scan sensor w/o MAC

98 Dead battery on contact sensor now having issues reconnecting

robgazy commented 4 years ago

@RonSpawnson Thanks, I'm in. Got your modified wyzesense_custom.py installed and restarted HA. I've even got one suspicious sensor that is working at the moment. Let's see what happens.

RonSpawnson commented 4 years ago

Update: My findings are the fix d6e0974 does not resolve the issue. I get many many lines letting me know it detected an OS error (the log file grows very rapidly). This means the worker thread is no longer being killed, but the OSError is indeed persistent. I see a few potential places we can go from here.

(1) Could we detect issue and recover automatically? I like the direction @daveenguyen has taken this - is there a way we can detect the OSError even higher in the stack, and automatically re initialize? I think we would need/want this somewhere in "binary_sensor.py". (2) Can we prevent this issue from recurring? Requires deep dive as to why an OSError is occurring and how we can prevent it from occurring.

(1) seems like an easier fix, but it's a mitigation rather than solving the root cause (2). However, we'd still be in a much better spot even with an automatic mitigation.

The hunt continues!

gterpstra75 commented 4 years ago

Not long after I previously posted wyzesense stopped working (last few lines from log)

2020-04-26 17:52:44 DEBUG (Thread-10) [custom_components.wyzesense.wyzesense_custom] Trying to parse: b'55aa531d1900000171b87a1ef4a23737394230343531021a6000010019c44a083755aa531d1900000171b87a1ef43737394230343531021a6000010019c4' 2020-04-26 17:52:44 DEBUG (Thread-10) [custom_components.wyzesense.wyzesense_custom] Received: b'55aa531d1900000171b87a1ef4a23737394230343531021a6000010019c44a0837' 2020-04-26 17:52:44 DEBUG (Thread-10) [custom_components.wyzesense.wyzesense_custom] <=== Received: Packet: Cmd=5319, Payload=b'00000171b87a1ef4a23737394230343531021a6000010019c44a' 2020-04-26 17:52:44 DEBUG (Thread-10) [custom_components.wyzesense.wyzesense_custom] ===> Sending: Packet: Cmd=53FF, Payload=ACK(5319) 2020-04-26 17:52:44 DEBUG (Thread-10) [custom_components.wyzesense.wyzesense_custom] Sending: b'aa555319ff026a' 2020-04-26 17:52:44 DEBUG (Thread-10) [custom_components.wyzesense.binary_sensor] {'available': True, 'mac': '779B0451', 'state': 0, 'device_class': 'motion', 'timestamp': '2020-04-26T17:52:23.284000', 'rssi': -74, 'battery_level': 96} 2020-04-26 17:52:44 DEBUG (Thread-10) [custom_components.wyzesense.wyzesense_custom] Trying to parse: b'55aa531d1900000171b87a1ef43737394230343531021a6000010019c44a0837' 2020-04-26 17:52:44 ERROR (Thread-10) [custom_components.wyzesense.wyzesense_custom] Caught a non OSError exception.

daveenguyen commented 4 years ago

@RonSpawnson I think you are right about non-OSError Errrors causing the thread to die.

Been running similar changes for a few days now. I noticed the thread haven't been killed so far, but am seeing this AssertionError https://github.com/HclX/WyzeSensePy/issues/9

Traceback (most recent call last):
  File "/config/custom_components/wyzesense/wyzesense_custom.py", line 364, in _Worker
    pkt = Packet.Parse(s)
  File "/config/custom_components/wyzesense/wyzesense_custom.py", line 130, in Parse
    assert len(s) >= b2 + 4
AssertionError

That same AssertionError multiple times the past few days (edited for shorter length)

2020-04-23 13:51:42 ERROR (Thread-2) [custom_components.wyzesense.wyzesense_custom] Something went wrong in Worker Thread
2020-04-23 14:18:30 ERROR (Thread-2) [custom_components.wyzesense.wyzesense_custom] Something went wrong in Worker Thread
2020-04-23 17:44:45 ERROR (Thread-2) [custom_components.wyzesense.wyzesense_custom] Something went wrong in Worker Thread
2020-04-24 16:52:07 ERROR (Thread-2) [custom_components.wyzesense.wyzesense_custom] Something went wrong in Worker Thread
2020-04-24 17:27:39 ERROR (Thread-2) [custom_components.wyzesense.wyzesense_custom] Something went wrong in Worker Thread
2020-04-24 20:47:59 ERROR (Thread-2) [custom_components.wyzesense.wyzesense_custom] Something went wrong in Worker Thread
2020-04-25 18:51:44 ERROR (Thread-2) [custom_components.wyzesense.wyzesense_custom] Something went wrong in Worker Thread
2020-04-25 20:59:42 ERROR (Thread-2) [custom_components.wyzesense.wyzesense_custom] Something went wrong in Worker Thread
2020-04-26 00:46:24 ERROR (Thread-2) [custom_components.wyzesense.wyzesense_custom] Something went wrong in Worker Thread
2020-04-26 11:03:45 ERROR (Thread-2) [custom_components.wyzesense.wyzesense_custom] Something went wrong in Worker Thread
Traceback (most recent call last):
  File "/config/custom_components/wyzesense/wyzesense_custom.py", line 364, in _Worker
    pkt = Packet.Parse(s)
  File "/config/custom_components/wyzesense/wyzesense_custom.py", line 130, in Parse
    assert len(s) >= b2 + 4
AssertionError

Here are the changes I have

    def _Worker(self):
        while True: #Watchdog
            try:
                s = b""
                while True:
                    if self.__exit_event.isSet():
                        break

                    s += self._ReadRawHID()
                    #if s:
                    #    log.info("Incoming buffer: %s", bytes_to_hex(s))
                    start = s.find(b"\x55\xAA")
                    if start == -1:
                        time.sleep(0.1)
                        continue

                    s = s[start:]
                    log.debug("Trying to parse: %s", bytes_to_hex(s))
                    pkt = Packet.Parse(s)
                    if not pkt:
                        s = s[2:]
                        continue

                    log.debug("Received: %s", bytes_to_hex(s[:pkt.Length]))
                    s = s[pkt.Length:]
                    self._HandlePacket(pkt)
+               if self.__exit_event.isSet():
+                   log.info("Exiting because exit event was set")
+                   break
-           except OSError as e:
-               log.error(e)
-               break
+           except:
+               log.exception("Something went wrong in Worker Thread")

My logs does not have any "Exiting because exit event was set", so it's not needed (it was added for my restart service I mentioned earlier)

RonSpawnson commented 4 years ago

More info I'm seeing is that after my OSError(s), I noticed the hidraw device for my usb bridge is different! When it first started up my hidraw was hidraw0: Attempting to open connection to hub at /dev/hidraw0.

Following my OSError and using your 'restart' branch, I noticed that the hidraw is now hidraw3 by running ls -la /sys/class/hidraw and looking for the entry containing "IA86" and "E024". This explains why running the wyzesense restart service fails with message: No such device: '/dev/hidraw0'. In order to correct this, we'll need to find what the new dongle is, which can be done with findDongle().

My current thought is we don't want to restart the worker thread at all. Instead, I want to stop it, find the new dongle, and reinitialize everything, including a new worker thread. Here is my attempt thus far (still not verified working yet).

def on_restart(call):
        _LOGGER.debug("Restarting ws!")
        config[CONF_DEVICE] = findDongle()
        ws.Stop()
        setup_platform(hass, config, add_entites, None)

Also worth noting - I found a really easy and quick way to repro the issue. Just un-plug and re-plug the wyzesense bridge usb stick. Then we are in the OSError exception state with new hidraw. This allows rapid iteration/testing of fixes.

RonSpawnson commented 4 years ago

I'm so close, but the issue I'm experiencing is that I've now got it attempting to use the correct hidraw during init, but I'm getting a file not found error for the new hidraw. As per issue #66, this appears to be an issue with HA supervised that it cannot see the new hidraw devices in the container.

2020-04-26 21:58:30 DEBUG (SyncWorker_14) [custom_components.wyzesense.binary_sensor] Restarting ws!
2020-04-26 21:58:30 DEBUG (SyncWorker_14) [custom_components.wyzesense.binary_sensor] /dev/hidraw5
2020-04-26 21:58:30 DEBUG (SyncWorker_14) [custom_components.wyzesense.binary_sensor] WYZESENSE v0.0.7
2020-04-26 21:58:30 DEBUG (SyncWorker_14) [custom_components.wyzesense.binary_sensor] Attempting to open connection to hub at /dev/hidraw5
2020-04-26 21:58:30 DEBUG (SyncWorker_14) [custom_components.wyzesense.binary_sensor] Attempting beginConn with /dev/hidraw5
2020-04-26 21:58:30 DEBUG (SyncWorker_14) [custom_components.wyzesense.wyzesense_custom] Inside init
2020-04-26 21:58:30 DEBUG (SyncWorker_14) [custom_components.wyzesense.wyzesense_custom] total 0
drwxr-xr-x    2 root     root             0 Apr 26 21:58 .
drwxr-xr-x   63 root     root             0 Apr 26 21:58 ..
lrwxrwxrwx    1 root     root             0 Apr 26 20:30 hidraw1 -> ../../devices/platform/scb/fd500000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/usb1/1-1/1-1.4/1-1.4:1.0/0003:04D9:2013.0002/hidraw/hidraw1
lrwxrwxrwx    1 root     root             0 Apr 26 20:30 hidraw2 -> ../../devices/platform/scb/fd500000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/usb1/1-1/1-1.4/1-1.4:1.1/0003:04D9:2013.0003/hidraw/hidraw2
lrwxrwxrwx    1 root     root             0 Apr 26 21:58 hidraw5 -> ../../devices/platform/scb/fd500000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/usb1/1-1/1-1.1/1-1.1:1.0/0003:1A86:E024.0011/hidraw/hidraw5 [NOTE: THIS IS MY NEW HIDRAW FOLLOWING REPLUG]

2020-04-26 21:58:30 DEBUG (SyncWorker_14) [custom_components.wyzesense.wyzesense_custom] Device is: /dev/hidraw5
2020-04-26 21:58:30 ERROR (MainThread) [homeassistant.components.websocket_api.http.connection.2882217968] [Errno 2] No such file or directory: '/dev/hidraw5'
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/websocket_api/commands.py", line 134, in handle_call_service
    connection.context(msg),
  File "/usr/src/homeassistant/homeassistant/core.py", line 1232, in async_call
    await asyncio.shield(self._execute_service(handler, service_call))
  File "/usr/src/homeassistant/homeassistant/core.py", line 1259, in _execute_service
    await self._hass.async_add_executor_job(handler.func, service_call)
  File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/config/custom_components/wyzesense/binary_sensor.py", line 197, in on_restart
    setup_platform(hass, config, add_entites, None)
  File "/config/custom_components/wyzesense/binary_sensor.py", line 119, in setup_platform
    ws = beginConn()
  File "<decorator-gen-4>", line 2, in beginConn
  File "/usr/local/lib/python3.7/site-packages/retry/api.py", line 74, in retry_decorator
    logger)
  File "/usr/local/lib/python3.7/site-packages/retry/api.py", line 33, in __retry_internal
    return f()
  File "/config/custom_components/wyzesense/binary_sensor.py", line 117, in beginConn
    return Open(config[CONF_DEVICE], on_event)
  File "/config/custom_components/wyzesense/wyzesense_custom.py", line 580, in Open
    return Dongle(device, event_handler)
  File "/config/custom_components/wyzesense/wyzesense_custom.py", line 290, in __init__
    self.__fd = os.open(device, os.O_RDWR | os.O_NONBLOCK)
FileNotFoundError: [Errno 2] No such file or directory: '/dev/hidraw5'
gterpstra75 commented 4 years ago

So I implemented @daveenguyen lastest code. I like @robgazy have two suspected motion sensors that are enabled for this testing. The logs are indicating a couple of ERROR messages in Worker Thread:

_2020-04-26 19:49:14 ERROR (Thread-10) [custom_components.wyzesense.wyzesense_custom] Something went wrong in Worker Thread 2020-04-27 06:13:28 ERROR (Thread-10) [custom_components.wyzesense.wyzesensecustom] Something went wrong in Worker Thread

But the WyzeSence continues to work.

RonSpawnson commented 4 years ago

Thanks gterpstra75. I want to point out that due to the code you are running, this may be a red herring. Allow me to explain.

Previous behavior: Kill worker thread on OSError. Silently ignore any other exceptions which are not OSError, worker thread continues.

Current behavior with code you are running: Do not kill worker thread on OSError. Print "Something went wrong in Worker Thread" message regardless of whether this was OSError or non OSError.

Because of this, it's possible that the fact that you are seeing "Something went wrong" message, but wyzesense continuing to work is a bit of a red herring. My proposal is to collect more detailed data. We should not be confusing problematic OSErrors with non-problematic non OSError exceptions. Additionally, instead of just printing "Something went wrong", we should be printing the actual exceptions themselves for more detail.

My proposal is to modify dave's code as follows to collect more detail:

In imports near top, add this line:

    import binascii
    import errno
    import subprocess
+   import traceback

And use the following two catch statements in _Worker function. Note that I removed the "break" from the OSError except clause, just like in Dave's code.

def _Worker(self):
        while True: #Watchdog
            try:
                s = b""
                while True:
                    if self.__exit_event.isSet():
                        break

                    s += self._ReadRawHID()
                    #if s:
                    #    log.info("Incoming buffer: %s", bytes_to_hex(s))
                    start = s.find(b"\x55\xAA")
                    if start == -1:
                        time.sleep(0.1)
                        continue

                    s = s[start:]
                    log.debug("Trying to parse: %s", bytes_to_hex(s))
                    pkt = Packet.Parse(s)
                    if not pkt:
                        s = s[2:]
                        continue

                    log.debug("Received: %s", bytes_to_hex(s[:pkt.Length]))
                    s = s[pkt.Length:]
                    self._HandlePacket(pkt)
                     log.debug("Received: %s", bytes_to_hex(s[:pkt.Length]))
                     s = s[pkt.Length:]
                     self._HandlePacket(pkt)
+               if self.__exit_event.isSet():
+                   log.info("Exiting because exit event was set")
+                   break
            except OSError as e:
+                log.error("Caught an OSError in Worker thread")
                 log.error(e)
-                break
+           except:
+                 log.error("Caught a non OSError exception in Worker thread, but wyzesense may still continue working. If it stops working following this message, the below error is suspect. If it continues working, it was a red herring that can be ignored.")
+                 traceback.print_exc()

Finally, to summarize my investigation after running the above - for me it was OSErrors that I am seeing which were indeed killing the worker thread and causing wyzesense to stop. I also discovered it's super easy to reproduce them too - just un-plug and replug the usb stick. I've made additional code modifications to Dave's attempt at restart (I can share later), and am super close to it working, however the issue is that the new hidraw device is not exposed to the homeassistant docker container. Issue #66 contains more details. I'm currently working on seeing if we can modify supervisor to bind all hidraw/wyzesense devices, but it's not clear to me how to modify supervisor locally.

gterpstra75 commented 4 years ago

@RonSpawnson, no problem. I have implemented your suggestions and willing to help in any way that I can. I am just glad that you and @daveenguyen are trying to fix this issue. I would love to help but I have no python programming skills.

RonSpawnson commented 4 years ago

Thanks! We appreciate the help! I'm confident with enough time and effort we will figure out a fix, or at least a mitigation. In the meantime, keep in mind that in the code changes you just did, if you see the "Caught a non OSError exception in Worker thread" message, you will still need to determine whether or not it was a red herring. If this message happened but wyzesense is still working, sure it was an exception, but it's ok to ignore as this was red herring which wasn't causing wyzesense outage. If, though, you see the non OSError exception and then wyzesense stops working, it is then important.

Put more simply: Run this until you see wyzesense stop working. Then you can determine what was the last exception you saw - OSError, or non OSError, and what were additional details about the exception. In my case it's an OSError, and it's reproducible by unplugging and replugging the bridge from my pi. Let us know what you find in your case.

daveenguyen commented 4 years ago

logging.exception does log some trace info. https://docs.python.org/3/library/logging.html#logging.exception

@gterpstra75 do you see more lines after the error message without date and time?

The only errors I've encountered in the past 4 days are 10 instances of these AssertionError

Traceback (most recent call last):
  File "/config/custom_components/wyzesense/wyzesense_custom.py", line 364, in _Worker
    pkt = Packet.Parse(s)
  File "/config/custom_components/wyzesense/wyzesense_custom.py", line 130, in Parse
    assert len(s) >= b2 + 4
AssertionError
daveenguyen commented 4 years ago

@RonSpawnson I think it's not a red herring because the bare except is catching all Errors that would've been raised before. Uncaught errors are now "handled" in this loop.

To make the loop behave like the original code, we'd need to add the break back in the OSError handling block, and raise inside the bare except block.

I think the bare except (without raising) is keeping our thread alive.

            except OSError as e:
                log.error("Caught an OSError in Worker thread")
                log.error(e)
+               break
           except:
                log.error("Caught a non OSError exception in Worker thread, but wyzesense may still continue working. If it stops working following this message, the below error is suspect. If it continues working, it was a red herring that can be ignored.")
                traceback.print_exc()
+               raise
gterpstra75 commented 4 years ago

@daveenguyen, I haven't seen any error message yet with the latest code that I have implemented. I will report back if/when I do.

gterpstra75 commented 4 years ago

I got my first non OSError message but Wyzesense has continued to run. Here is the messages around the error message:

_2020-04-28 08:27:48 DEBUG (Thread-10) [custom_components.wyzesense.wyzesense_custom] Trying to parse: b'55aa53193500000171c0c1af660ea2373738303233434402000bfa0821' 2020-04-28 08:27:48 DEBUG (Thread-10) [custom_components.wyzesense.wyzesense_custom] Received: b'55aa53193500000171c0c1af660ea2373738303233434402000bfa0821' 2020-04-28 08:27:48 DEBUG (Thread-10) [custom_components.wyzesense.wyzesense_custom] <=== Received: Packet: Cmd=5335, Payload=b'00000171c0c1af660ea2373738303233434402000bfa' 2020-04-28 08:27:48 DEBUG (Thread-10) [custom_components.wyzesense.wyzesense_custom] ===> Sending: Packet: Cmd=53FF, Payload=ACK(5335) 2020-04-28 08:27:48 DEBUG (Thread-10) [custom_components.wyzesense.wyzesense_custom] Sending: b'aa555335ff0286' 2020-04-28 08:27:48 INFO (Thread-10) [custom_components.wyzesense.wyzesense_custom] LOG: time=2020-04-28T08:27:31.046000, data=b'a2373738303233434402000bfa' 2020-04-28 08:27:48 DEBUG (Thread-10) [custom_components.wyzesense.wyzesense_custom] Trying to parse: b'55aa531d1900000171c0c1af94a2373738303233434402115d0001000b6008f8' 2020-04-28 08:27:48 ERROR (Thread-10) [custom_components.wyzesense.wyzesense_custom] Caught a non OSError exception in Worker thread 2020-04-28 08:27:48 DEBUG (Thread-10) [custom_components.wyzesense.wyzesense_custom] Trying to parse: b'55aa531d1900000171c0c1af94a2373738303233434402115d0001000bfa6008f8' 2020-04-28 08:27:48 DEBUG (Thread-10) [custom_components.wyzesense.wyzesense_custom] Received: b'55aa531d1900000171c0c1af94a2373738303233434402115d0001000bfa6008f8' 2020-04-28 08:27:48 DEBUG (Thread-10) [custom_components.wyzesense.wyzesense_custom] <=== Received: Packet: Cmd=5319, Payload=b'00000171c0c1af94a2373738303233434402115d0001000bfa60' 2020-04-28 08:27:48 DEBUG (Thread-10) [custom_components.wyzesense.wyzesense_custom] ===> Sending: Packet: Cmd=53FF, Payload=ACK(5319) 2020-04-28 08:27:48 DEBUG (Thread-10) [custom_components.wyzesense.wyzesense_custom] Sending: b'aa555319ff026a' 2020-04-28 08:27:48 DEBUG (Thread-10) [custom_components.wyzesense.binary_sensor] {'available': True, 'mac': '778023CD', 'state': 0, 'device_class': 'motion', 'timestamp': '2020-04-28T08:27:31.092000', 'rssi': -96, 'batterylevel': 93}

I don't know how to get a traceback.

daveenguyen commented 4 years ago

https://github.com/daveenguyen/ha-wyzesense/blob/bugfix/dead-thread/custom_components/wyzesense/wyzesense_custom.py#L381

Updated my branch to raise in the bare except. Thread dies. My sensor and restart automation kicks in. After the non-OSError/restart service, it is (Thread-15) and not (Thread-2)

The gap between the AssertionError and 2020-04-30 14:40:12 DEBUG (SyncWorker_5) [custom_components.wyzesense.binary_sensor] Updating Watchdog to False I am assuming is the next polling update. In the original ha-wyzesense, this is where we see our sensors stop responding (because the thread is dead).

2020-04-30 14:40:00 INFO (Thread-2) [custom_components.wyzesense.wyzesense_custom] LOG: time=2020-04-30T14:39:53.725000, data=b'ad373738394131423102011cc9'
2020-04-30 14:40:00 DEBUG (Thread-2) [custom_components.wyzesense.wyzesense_custom] Trying to parse: b'55aa53193500000171cd081f000ead37373839413142310266666606b955aa53193500000171cd081f000ead373738394131310266666606b9'
2020-04-30 14:40:00 DEBUG (Thread-2) [custom_components.wyzesense.wyzesense_custom] Received: b'55aa53193500000171cd081f000ead37373839413142310266666606b9'
2020-04-30 14:40:00 DEBUG (Thread-2) [custom_components.wyzesense.wyzesense_custom] <=== Received: Packet: Cmd=5335, Payload=b'00000171cd081f000ead373738394131423102666666'
2020-04-30 14:40:00 DEBUG (Thread-2) [custom_components.wyzesense.wyzesense_custom] ===> Sending: Packet: Cmd=53FF, Payload=ACK(5335)
2020-04-30 14:40:00 DEBUG (Thread-2) [custom_components.wyzesense.wyzesense_custom] Sending: b'aa555335ff0286'
2020-04-30 14:40:00 INFO (Thread-2) [custom_components.wyzesense.wyzesense_custom] LOG: time=2020-04-30T14:39:53.728000, data=b'ad373738394131423102666666'
2020-04-30 14:40:00 DEBUG (Thread-2) [custom_components.wyzesense.wyzesense_custom] Trying to parse: b'55aa53193500000171cd081f000ead373738394131310266666606b9'
2020-04-30 14:40:00 ERROR (Thread-2) [custom_components.wyzesense.wyzesense_custom] non-OSError in worker thread. Reraising
Traceback (most recent call last):
  File "/config/custom_components/wyzesense/wyzesense_custom.py", line 364, in _Worker
    pkt = Packet.Parse(s)
  File "/config/custom_components/wyzesense/wyzesense_custom.py", line 130, in Parse
    assert len(s) >= b2 + 4
AssertionError
2020-04-30 14:40:12 DEBUG (SyncWorker_5) [custom_components.wyzesense.binary_sensor] Updating Watchdog to False
2020-04-30 14:40:12 DEBUG (SyncWorker_9) [custom_components.wyzesense.wyzesense_custom] Restarting Dongle
2020-04-30 14:40:12 DEBUG (SyncWorker_9) [custom_components.wyzesense.wyzesense_custom] Start Inquiry...
2020-04-30 14:40:12 DEBUG (SyncWorker_9) [custom_components.wyzesense.wyzesense_custom] ===> Sending: Packet: Cmd=4327, Payload=<None>
2020-04-30 14:40:12 DEBUG (SyncWorker_9) [custom_components.wyzesense.wyzesense_custom] Sending: b'aa55430327016c'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Trying to parse: b'55aa43042801016f'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Received: b'55aa43042801016f'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] <=== Received: Packet: Cmd=4328, Payload=b'01'
2020-04-30 14:40:12 DEBUG (SyncWorker_9) [custom_components.wyzesense.wyzesense_custom] Inquiry returns 1
2020-04-30 14:40:12 DEBUG (SyncWorker_9) [custom_components.wyzesense.wyzesense_custom] ===> Sending: Packet: Cmd=5314, Payload=b'ff'
2020-04-30 14:40:12 DEBUG (SyncWorker_9) [custom_components.wyzesense.wyzesense_custom] Sending: b'aa55530414ff0269'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Trying to parse: b'55aa531939ad37373837424242460200000171ca66fc87a2000b740882'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Received: b'55aa531939ad37373837424242460200000171ca66fc87a2000b740882'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] <=== Received: Packet: Cmd=5339, Payload=b'ad37373837424242460200000171ca66fc87a2000b74'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] ===> Sending: Packet: Cmd=53FF, Payload=ACK(5339)
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Sending: b'aa555339ff028a'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Trying to parse: b'55aa53193500000171cd084b4a0ead373738374242424602010b7506a3'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Received: b'55aa53193500000171cd084b4a0ead373738374242424602010b7506a3'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] <=== Received: Packet: Cmd=5335, Payload=b'00000171cd084b4a0ead373738374242424602010b75'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] ===> Sending: Packet: Cmd=53FF, Payload=ACK(5335)
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Sending: b'aa555335ff0286'
2020-04-30 14:40:12 INFO (Thread-15) [custom_components.wyzesense.wyzesense_custom] LOG: time=2020-04-30T14:40:05.066000, data=b'ad373738374242424602010b75'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Trying to parse: b'55aa531939ad37373837424242460200000171cb18e767a2010b75080255aa531939ad37373837424242460200000171cb18e767a2010b75080255aa5314'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Received: b'55aa531939ad37373837424242460200000171cb18e767a2010b750802'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] <=== Received: Packet: Cmd=5339, Payload=b'ad37373837424242460200000171cb18e767a2010b75'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] ===> Sending: Packet: Cmd=53FF, Payload=ACK(5339)
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Sending: b'aa555339ff028a'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Trying to parse: b'55aa531939ad37373837424242460200000171cb18e767a2010b75080255aa5314'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Received: b'55aa531939ad37373837424242460200000171cb18e767a2010b750802'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] <=== Received: Packet: Cmd=5339, Payload=b'ad37373837424242460200000171cb18e767a2010b75'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] ===> Sending: Packet: Cmd=53FF, Payload=ACK(5339)
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Sending: b'aa555339ff028a'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Trying to parse: b'55aa5314'
2020-04-30 14:40:12 ERROR (Thread-15) [custom_components.wyzesense.wyzesense_custom] Invalid packet: b'55aa5314'
2020-04-30 14:40:12 ERROR (Thread-15) [custom_components.wyzesense.wyzesense_custom] Invalid packet length: 4
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Trying to parse: b'55aa531939ad37373837424242460200000171cb18e767a2010b75080255aa530e3500000171cd084bae0314ff04eb'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Received: b'55aa531939ad37373837424242460200000171cb18e767a2010b750802'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] <=== Received: Packet: Cmd=5339, Payload=b'ad37373837424242460200000171cb18e767a2010b75'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] ===> Sending: Packet: Cmd=53FF, Payload=ACK(5339)
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Sending: b'aa555339ff028a'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Trying to parse: b'55aa530e3500000171cd084bae0314ff04eb55aa53193500000171cd084c070ead373738374242424602000b76066155aa530315016a'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Received: b'55aa530e3500000171cd084bae0314ff04eb'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] <=== Received: Packet: Cmd=5335, Payload=b'00000171cd084bae0314ff'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] ===> Sending: Packet: Cmd=53FF, Payload=ACK(5335)
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Sending: b'aa555335ff0286'
2020-04-30 14:40:12 INFO (Thread-15) [custom_components.wyzesense.wyzesense_custom] LOG: time=2020-04-30T14:40:05.166000, data=b'14ff'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Trying to parse: b'55aa53193500000171cd084c070ead373738374242424602000b76066155aa530315016a55aa531939ad37373837424242460200000171cb1983a7a2000b7607df'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Received: b'55aa53193500000171cd084c070ead373738374242424602000b760661'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] <=== Received: Packet: Cmd=5335, Payload=b'00000171cd084c070ead373738374242424602000b76'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] ===> Sending: Packet: Cmd=53FF, Payload=ACK(5335)
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Sending: b'aa555335ff0286'
2020-04-30 14:40:12 INFO (Thread-15) [custom_components.wyzesense.wyzesense_custom] LOG: time=2020-04-30T14:40:05.255000, data=b'ad373738374242424602000b76'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Trying to parse: b'55aa530315016a55aa531939ad37373837424242460200000171cb1983a7a2000b7607df'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Received: b'55aa530315016a'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] <=== Received: Packet: Cmd=5315, Payload=<None>
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] ===> Sending: Packet: Cmd=53FF, Payload=ACK(5315)
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Sending: b'aa555315ff0266'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Trying to parse: b'55aa531939ad37373837424242460200000171cb1983a7a2000b7607df'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Received: b'55aa531939ad37373837424242460200000171cb1983a7a2000b7607df'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] <=== Received: Packet: Cmd=5339, Payload=b'ad37373837424242460200000171cb1983a7a2000b76'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] ===> Sending: Packet: Cmd=53FF, Payload=ACK(5339)
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Sending: b'aa555339ff028a'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Trying to parse: b'55aa530332018755aa531939ad37373837424242460200000171cb1983a7a2000b7607df'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Received: b'55aa5303320187'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] <=== Received: Packet: Cmd=5332, Payload=<None>
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] ===> Sending: Packet: Cmd=53FF, Payload=ACK(5332)
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Sending: b'aa555332ff0283'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] ===> Sending: Packet: Cmd=5333, Payload=b'00000171cd08685b'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Sending: b'aa55530b3300000171cd08685b039a'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Trying to parse: b'55aa531939ad37373837424242460200000171cb1983a7a2000b7607df'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Received: b'55aa531939ad37373837424242460200000171cb1983a7a2000b7607df'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] <=== Received: Packet: Cmd=5339, Payload=b'ad37373837424242460200000171cb1983a7a2000b76'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] ===> Sending: Packet: Cmd=53FF, Payload=ACK(5339)
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Sending: b'aa555339ff028a'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Trying to parse: b'55aa53193500000171cd084cd00ead373738374242424602010b77072c'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Received: b'55aa53193500000171cd084cd00ead373738374242424602010b77072c'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] <=== Received: Packet: Cmd=5335, Payload=b'00000171cd084cd00ead373738374242424602010b77'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] ===> Sending: Packet: Cmd=53FF, Payload=ACK(5335)
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Sending: b'aa555335ff0286'
2020-04-30 14:40:12 INFO (Thread-15) [custom_components.wyzesense.wyzesense_custom] LOG: time=2020-04-30T14:40:05.456000, data=b'ad373738374242424602010b77'
2020-04-30 14:40:43 DEBUG (SyncWorker_3) [custom_components.wyzesense.binary_sensor] Updating Watchdog to True
daveenguyen commented 4 years ago

I think people can use #129 as a simple workaround for now and share non-OSError they encounter

daveenguyen commented 4 years ago

@gterpstra75 If you're using a change with log.exception, you should be able to see it in <home-assistant-url>/developer-tools/logs when you click on the wyzesense error to view more details.

image

gterpstra75 commented 4 years ago

Thanks @daveenguyen. So far, I have only had one non-OSE error message when I was running code (from https://github.com/kevinvincent/ha-wyzesense/issues/114#issuecomment-620089443) which resulted in failure of wyzesence. I didn't get a traceback with the error message, just the message of the error. I am now running code similar to what you suggested above except that I have added the traceback. I do see the traceback in the log messages now. I also get several non-OSE error messages a day which doesn't result in failure of the wyzesence hub. I assume that previously (looking at the code, hey still trying to learn here) these would have resulted in failure due to the break statement. Thanks again, my wyzesense I believe has been much more stable this last week.

AhmedAdelHosni commented 4 years ago

I dont known if what I say will help or not but have to mention that I used to put my laptop in a closet, 2 meters above the ground and maybe I have to restart every two weeks for example. But lately I had to change the place of the closet and I put the laptop on the floor aganist the wall. I may restart now 2 times a day or every 3 days ... but it is being annoying now. So I am not sure maybe this place misses some packets and is not easy to communicate with the nodes and this weak connection leads to some type of error. I hope this help.

raetha commented 4 years ago

@daveenguyen I've been struggling with this same issue with my project Wyzesense2MQTT. I believe it is an unknown packet type, likely triggered when a sensor's battery is low, as I seem to observe in my setup. The fix I'm testing is in the Parse function:

        if cmd == cls.ASYNC_ACK:
            assert len(s) >= 7
            s = s[:7]
            payload = MAKE_CMD(cmd_type, b2)
        elif len(s) >= b2 + 4:
            s = s[: b2 + 4]
            payload = s[5:-2]
        else:
            log.error("Invalid packet: %s", bytes_to_hex(s))
            return None

I changed the else with an assert, into an elif instead, and added an else catchall to log the failures.

Not sure if this will truly resolve the worker crashes, but wanted to share since I found you are having the same issues, and my logged issue with WyzeSensePy was linked above in this thread.

If anyone wants to help determine what the err'ing packet actually is (low battery warning?), I'd love to see us actually be able to make use of it somehow with HA.

robkiller26 commented 4 years ago

I agree that it has to do with something being done with low batteries. I have about 30 sensors combined between motion and magnetic. It was at first i could go a week with no issues all bat show above 80 percent. Now i have stopped using HA because my wyze stops working min/sec after i reboot. All sensors still show good battery but nothing else has changed. It started to need a reboot closer and closer together until now it wont work more then a min or so. I have stopped putting time or using HA since all my automatons involve wyze sensors until this issue is solved. I am up for options on how to find low or bad sensors but updated firmware and everything has not lead to any bad sensors. I see the same errors as everyone is describing above.

raetha commented 4 years ago

@robkiller26 You could certainly try testing modifying the Parse function in your copy of HA-WyseSense as I specified above and see if that keeps things more stable. It would be interesting to note if some (but not all) of your sensors stop sending data, at which point that might be a trigger to know there is a low battery.

I love these sensors for cost, but their battery states are not very accurate. I've had ones that report 70%+, and then suddenly start having the bad MAC address issue. A battery replacement has almost always brought them back, but detecting when they actually need new batteries seems to only be reliable from the sensor itself when it starts flashing the led red periodically even without a sensor trigger. For the contact sensors, that is easy to see visually, but the motion sensors is harder as when you can see them, they can usually see you. :)

robkiller26 commented 4 years ago

I will look into that. I was wondering if the battery or issues showing up with the sensors would be easier to see with the wayze app. I was thinking about setting all my stuff up on the wayze camera and loading everything in the app to see if it would tell me what sensor is having issues.

Anyone try that?

On Mon, Jun 15, 2020, 8:35 AM Elias Hunt notifications@github.com wrote:

@robkiller26 https://github.com/robkiller26 You could certainly try testing modifying the Parse function in your copy of HA-WyseSense as I specified above and see if that keeps things more stable. It would be interesting to note if some (but not all) of your sensors stop sending data, at which point that might be a trigger to know there is a low battery.

I love these sensors for cost, but their battery states are not very accurate. I've had ones that report 70%+, and then suddenly start having the bad MAC address issue. A battery replacement has almost always brought them back, but detecting when they actually need new batteries seems to only be reliable from the sensor itself when it starts flashing the led red periodically even without a sensor trigger. For the contact sensors, that is easy to see visually, but the motion sensors is harder as when you can see them, they can usually see you. :)

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/kevinvincent/ha-wyzesense/issues/114#issuecomment-644173318, or unsubscribe https://github.com/notifications/unsubscribe-auth/AEPBFB62DEI5M6HU5QIVTD3RWYWRHANCNFSM4LC7JNGQ .

elyorkhakimov commented 4 years ago

I'm having similar issues on gosense. Looks like the parser is unable to make sense of one of the three message types. (I've only seen three unique messages, not counting the ones during the Pairing process) First message type asserts that "state:1", e.g. motion is detected.

2020-06-19T19:33:30.561536251Z DEBU[0578] readRawHid: 62 bytes: [ 55aa53193500000172ce123b360ea2373741313742414202010053064655aa531d1900000172ce123b39a2373741313742414202176400010100531606b5 ]

Second message asserts that "state:0", e.g. no more motion. And it always happens 40 seconds after the motion is detected:

2020-06-19T19:34:10.558263651Z DEBU[0618] readRawHid: 62 bytes: [ 55aa53193500000172ce12d0230ea237374131374241420200005406c855aa531d1900000172ce12d026a237374131374241420217640001000054170738 ]

Third message almost always (pretty much always), comes in exactly 2 minutes after the first motion detected message.

2020-06-19T19:35:30.557162901Z DEBU[0698] readRawHid: 39 bytes: [ 55aa53231900000172ce13f9f7ab37374131374241420200000100000100000100000117000776 ]

Note that the last message is 39 bytes vs 62 bytes for the first two. It is misinterpreted by parser, but gosense library does not seem to care and chugs along nicely. All I get is a weird MQTT parsed message - battery says 0%, which is not true. Anyways, this issue is benign in gosense (not sure in wyzesense.py) but forces me to put some error checking in MQTT flows.

elyorkhakimov commented 4 years ago

this is what the misinterpreted 39-byte long message looks like in Node-Red, where I've subscribed to mqtt topic from gosense

image

kevinvincent commented 4 years ago

if cmd == cls.ASYNC_ACK: assert len(s) >= 7 s = s[:7] payload = MAKE_CMD(cmd_type, b2) elif len(s) >= b2 + 4: s = s[: b2 + 4] payload = s[5:-2] else: log.error("Invalid packet: %s", bytes_to_hex(s)) return None

@raetha has yours been more stable with this fix? I'm thinking of pulling it into the package.

RonSpawnson commented 4 years ago

Hey Kevin - this fix definitely has, I believe this to be a good change. I'm still experiencing occasional outages that I think are related to issues with the udev and HA docker - those issues are 100% reproducible by unplugging and replugging your usb dongle in and out, and the HA container no longer has visibility into the hidraw device until HA restarts. And I think something with power management in my pi is causing this to occasionally blip out and require restart. For context: https://github.com/kevinvincent/ha-wyzesense/issues/66

That being said, anecdotally, since adding the diff you mention, I have noticed a perceivable decrease in frequency of restarts being required. So I personally feel this fix is a good one.

cheechie commented 4 years ago

My sensors have been running nonstop for over a month without any issue. Running VM in proxmox. It never requires a reboot anymore.

raetha commented 4 years ago

@kevinvincent to confirm, it has helped significantly with the issues being seeing. We're still chasing down one lockup issue, but it seems to be related to the bridge and USB connection flaking out. I've just modified the retry settings on the connection function to ignore IOError to see if it covers that. But with just the assert change we aren't seeing the errant packet killing things anymore. So I think you are pretty safe. Still love to see someone with some hardware chops help us figure out what that packet really means though. If as I think, it's a low battery warning, that would be great to make use of, instead of getting stuck with the invalid MAC issue that shows up when a sensor runs too low.

kevinvincent commented 4 years ago

@raetha Great to hear. Yeah, unfortunately, I don't have too much experience with the reverse engineering side to figure out what exactly that packet means. It would be cool if it was a low battery warning we could catch. I've just had to check for the 3 fast blinks on my sensors to know when to replace them.

I've gone ahead and released v0.0.9 with that fix. Thank you :)

raetha commented 4 years ago

You are very welcome, happy to share. Feel free to pour through the rest of my project's code sometime and see if there might be anything else we've caught that could be useful to you. I don't remember all the things off the top of my head, just that I spent about a month right off the bat working through a variety of things that behaved weirdly.

Now that the bad packets are at least getting something logged on my side, I'm going to watch for those log messages periodically and see if they seem to occur the next time I have a low battery. I don't know how to convert the packet to useful data, but it would be nice to confirm my suspicion that they are related. Then if we can at least parse the MAC from it somehow, we'd have a low battery warning. :)

daleye commented 3 years ago

Running 0.9 on rpi and finding I need to reboot every now and then. Help :)

raetha commented 3 years ago

@kevinvincent one of my contributors recently figured out how to modify the WyzeSensePy library and capture the ~4hr updates the sensors send with their current state. I haven't been able to test yet, but merged the code into my devel branch. Assuming it works, there isn't really a "low" batter alert, but rather more regular check-ins than just waiting until a sensor is tripped.

A couple threads also mentioned that the battery level may be off by a factor of 5. Not sure we can count on that, but might look into applying that logic just to show a more realistic battery level.

photinus commented 3 years ago

@raetha if you submit a pull request I can merge it in :-)

atjshop commented 3 years ago

This repo is marked as "No Longer Maintained". Is there any fork that still actively maintained?