raetha / wyzesense2mqtt

Configurable WyzeSense to MQTT Gateway intended for use with Home Assistant or other platforms that use MQTT discovery mechanisms.
MIT License
80 stars 23 forks source link

Crashed and did not recover #12

Closed shauder closed 4 years ago

shauder commented 4 years ago

Describe the bug I am not sure what exactly caused it to crash. I just noticed events were no longer being processed. Here is the last bit of my log.

sensor_mac: 777E0848
State event data: [2020-03-26 00:06:02][777E0848]StateEvent: sensor_type=motion, state=inactive, battery=96, signal=60
{'available': True, 'mac': '777E0848', 'state': 0, 'device_class': 'motion', 'device_class_timestamp': '2020-03-26T00:06:02.480000', 'signal_strength': -60, 'battery': 96}
sensor_mac: 777E0848
State event data: [2020-03-26 00:06:02][777E0848]StateEvent: sensor_type=motion, state=inactive, battery=96, signal=60
{'available': True, 'mac': '777E0848', 'state': 0, 'device_class': 'motion', 'device_class_timestamp': '2020-03-26T00:06:02.480000', 'signal_strength': -60, 'battery': 96}
sensor_mac: 777E0848
State event data: [2020-03-26 00:06:02][777E0848]StateEvent: sensor_type=motion, state=inactive, battery=96, signal=60
{'available': True, 'mac': '777E0848', 'state': 0, 'device_class': 'motion', 'device_class_timestamp': '2020-03-26T00:06:02.480000', 'signal_strength': -60, 'battery': 96}
sensor_mac: 777E0848
State event data: [2020-03-26 00:06:02][777E0848]StateEvent: sensor_type=motion, state=inactive, battery=96, signal=60
{'available': True, 'mac': '777E0848', 'state': 0, 'device_class': 'motion', 'device_class_timestamp': '2020-03-26T00:06:02.480000', 'signal_strength': -60, 'battery': 96}
sensor_mac: 777E2715
Non-state event data: [2020-03-26 00:06:07][777E2715]RawEvent: type=raw_AB, data=b'020000030000000200000002003b00'
sensor_mac: 777E2715
Non-state event data: [2020-03-26 00:06:07][777E2715]RawEvent: type=raw_AB, data=b'020000030000000200000002003b00'
sensor_mac: 777E2715
Non-state event data: [2020-03-26 00:06:07][777E2715]RawEvent: type=raw_AB, data=b'020000030000000200000002003b00'
sensor_mac: 777E2715
Non-state event data: [2020-03-26 00:06:07][777E2715]RawEvent: type=raw_AB, data=b'020000030000000200000002003b00'
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.8/site-packages/wyzesense/gateway.py", line 357, in _Worker
    pkt = Packet.Parse(s)
  File "/usr/lib/python3.8/site-packages/wyzesense/gateway.py", line 129, in Parse
    assert len(s) >= b2 + 4
AssertionError

Seems like it may be happening when I try to add a sensor and scan through mqtt topic to scan.

shauder commented 4 years ago

I unpaired then I paired another sensor and it seems to be fine now. I will leave open for now in case it helps find what may have gone wrong? I am fully migrated over now. As long as it does not crash now things are looking good!

raetha commented 4 years ago

That sounds good. I'm not sure what that error is at this point, but it's coming from the WyzeSensePy library, so might be slightly out of my control other than to work around. But let me know if it comes back and we'll see what we can do for additional debugging, and maybe pull that dev in for some guidance as well.

Thanks for giving this a shot too. It was kind of just a short passion project because I was looking for something like this and didn't find quite what I was looking for, though I hadn't seen the GoSense one until I'd already completed most of this. But it's nice having someone else help work kinks out. I might hit a few more too, just got another 12 contact sensors to rig up this weekend.

raetha commented 4 years ago

I did decide to log something with the library dev in case they have ideas. https://github.com/HclX/WyzeSensePy/issues/9

You mentioned you paired another sensor after unpairing the one that triggered that, but were you able to repair that sensor successfully, or is it sorta broken at this point?

shauder commented 4 years ago

Well so far I like it more than the go version. I liked that version because of the mqtt over the direct integration. Yours also seems to create the devices and sensors automatically so that is also a nice benefit. So thank you for making this!

If I remember right last night I tried to pair the bad sensor 2 times or so and got the same issue. So then I used the bridge tool to unpair it. Then I used the fix broken sensors command. After that I paired a brand new one. I have not done anything further with the old one and was going to throw it away but I can keep it around and maybe try some more with it on another pi.

raetha commented 4 years ago

Hurm, I'll hope that it is a bad sensor, but still be nice to work around if at all possible. Assuming I can find some time soon, I'll see if there might be some additional debugging I can add and then see if you are willing to help test.

Alternately, you could ship me the erring sensor and I'd take a look that way as well.

shauder commented 4 years ago

Looks like I got the error again this morning. Woke up and it looks like it happened when I went into my bathroom. The motion sensor triggered then it crashed at some point after.

State event data: [2020-03-29 09:06:53][777E06EE]StateEvent: sensor_type=motion, state=active, battery=96, signal=56
{'available': True, 'mac': '777E06EE', 'state': 1, 'device_class': 'motion', 'device_class_timestamp': '2020-03-29T09:06:53.032000', 'signal_strength': -56, 'battery': 96}
sensor_mac: 777E06EE
Non-state event data: [2020-03-29 09:06:54][777E06EE]RawEvent: type=raw_AB, data=b'020300000908050500000000003800'
sensor_mac: 777E06EE
Non-state event data: [2020-03-29 09:06:54][777E06EE]RawEvent: type=raw_AB, data=b'020300000908050500000000003800'
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.8/site-packages/wyzesense/gateway.py", line 357, in _Worker
    pkt = Packet.Parse(s)
  File "/usr/lib/python3.8/site-packages/wyzesense/gateway.py", line 129, in Parse
    assert len(s) >= b2 + 4
AssertionError

I didn't add or remove any sensors since I made this and it was working fine so that is kinda strange. Restarting the container and they all seem to work again. I wouldn't be opposed to sending you the sensor. I may try and duplicate it with the sensor before I do though.

raetha commented 4 years ago

So that sensor does work at least after a restart, and you're getting good signals through with the open/close, but at some point it's failing?

From what I can tell, this may be the lack of error handling in the upstream library. Unfortunately I haven't heard back on the issue I logged there, so the only solution may be to fork that library and change the error handling to ignore the bad messages.

I honestly don't think the sensors themselves are bad, as long as they are working the rest of the time, I think they are just sending some message that it doesn't know how to deal with. Unfortunately for us both, I really don't have a handle on the underlying hardware, and thus how to properly handle what it is sending, but I think I can swing changing the error handling in the library if need be.

For now, I'd like to give the original author a little more time and see if they reply, but if not I'll assume the library has been abandoned and fork it.

Don't ditch the sensors causing trouble yet, if they are sending good signals most of the time, it may not be them.

raetha commented 4 years ago

@shauder, to let you know, just got another report of this, and still no response from the original library author. I'm going to fork WyseSensePy, hopefully tonight, and work around this issue as there are now three of us seeing it occasionally, and doesn't look like an upstream fix is going to happen anytime soon. I'll let you know when there is a build with it so you can help test as well.

shauder commented 4 years ago

So I actually have not seen this again since swapping out some sensors when I was having this issue. It seems like maybe some bad signal is being sent when they are bad or close to dying?

raetha commented 4 years ago

@shauder, no worries. I've just set a fix on the devel branch and targeted version 1.1. Gone a slightly different route from what Kevin Vincent has been trying on his project, so will be sharing there as well.

Anyway, my theory, like yours, is that something is happening as a sensor starts to have a low battery that results in a bad message being sent to the bridge. The existing library can't handle it, and crashes. I've tweaked it to theoretically silently ignore those messages instead.

If you'd like to test the new build, grab either the devel branch directly if you are using systemd, or Docker hub should be updated by specifying the devel branch instead of latest.

I'd love to know if all stays stable before closing this out.

shauder commented 4 years ago

Hey, sorry I must have missed your reply. I have updated the docker container and everything is still working as I expect!

raetha commented 4 years ago

Great, glad to hear. At least one person has still seen some random crashing, so I think there is still something to dig into, but this seems to have helped greatly. Still hoping someone with better hardware knowledge will figure out what the random packet is really doing so that we can make use of it. I don't want to assume it is a low battery state or something similar, as then we might automate something that isn't correct. :)