raetha / wyzesense2mqtt

Configurable WyzeSense to MQTT Gateway intended for use with Home Assistant or other platforms that use MQTT discovery mechanisms.
MIT License
79 stars 22 forks source link

Config appears to be published before connected, and connection takes ~1 minute #22

Closed dale3h closed 4 years ago

dale3h commented 4 years ago

Describe the Bug

When the wyzesense2mqtt service is started, a connection to the MQTT broker is attempted. Immediately after attempting a connection all of the MQTT discovery payloads/config are published. About 1 minute later MQTT shows to be connected successfully, and at the same time all of the payloads appear in the output of mosquitto_sub -t '#'.

Steps To Reproduce

Steps to reproduce the behavior:

  1. Start/restart wyzesense2mqtt service (i.e. sudo systemctl start wyzesense2mqtt)
  2. View logs (i.e. sudo journalctl -fu wyzesense2mqtt)
  3. Look for Connecting to MQTT host ... and note the timestamp
  4. Look for Connected to MQTT: return code 0 and note the difference in time

Expected Behavior

Connection to MQTT broker should only take a second or two at most, and payloads should not be published until after connected to MQTT broker.

Additional Context

Installation Method

I am running wyzesense2mqtt on latest Raspberry Pi OS (Raspbian GNU/Linux 10 (buster)) using a systemd service. Host is a Raspberry Pi 3 Model B+.

Related Logs

https://hastebin.com/ihuloxoxam

raetha commented 4 years ago

This one is interesting. I've never seen on my side the MQTT connection take that long. Since the connection is async, the topics may publish in memory before the connection is established, but the MQTT library should hold those and send them once the connection is up, so there shouldn't be an impact to that timing. I would like to see if we can speed up the MQTT connection though.

I assume you're using Mosquitto, is it running on the same host, or different host?

Also, are you actually seeing it fail to send the queued MQTT topics once connected? It's possible I'm calling Paho incorrectly and it isn't really queuing things up like I believe.

dale3h commented 4 years ago

Yes, I am using mosquitto. It is running on the same hypervisor/machine, but under a different VM.

The queue topics seem to be successfully sent once the connection is finally established.

Could this be the result of me having as many sensors as I do? Is there any way for me to test it without publishing everything before the connection is established? This would narrow it down to whether or not it's being delayed by the queue.

raetha commented 4 years ago

I'm using docker instead of VMs, but similar type of set up. So shouldn't be a massive difference there.

Looking at the logs you posted originally, I see a big time gap between the last publish and the mqtt client connection, so I'm thinking it shouldn't be the extra publishes early, but I'm not hugely knowledgeable on the Paho MQTT library and how it works. It may be that having a large queue of topics to send before the connection is established does cause some delay.

If you want to test, just comment out these lines in the init_sensors function. This will skip the publishing of the discovery topics on start. Note that could have adverse affects long term though as I think it's the only time existing sensors are republished, so if those topics ever expire or disappear, they wouldn't be recreated.

    for sensor_mac in SENSORS:
        if (valid_sensor_mac(sensor_mac)):
            send_discovery_topics(sensor_mac)

Let me know what you find. We could also go the route of forming new connections to mqtt only when needed and publish in realtime, but that will compound any lingering delays for every single message instead of just on initial connection.

raetha commented 4 years ago

The other possible thing I found that might help here is to add true threading support for the bridge and mqtt sides of things, rather than just using the built in functions. My worry is that gets well out of my wheelhouse, but I think could be worth testing. Then those two threads could be started as soon as the config is loaded, but before sensors and other things. We could then easily track when the mqtt connection tries to start, and when it succeeds. I'm just not sure it's worth the complexity at this point. At least not without someone contributing who really understands threading and how best to implement it.

dale3h commented 4 years ago

The time gaps have shifted, but are still present between connecting and connected:

Jul 17 23:32:35 wyzesense2mqtt python3[3115]: Connecting to MQTT host XXXXXXXX
...(1s)...
Jul 17 23:32:36 wyzesense2mqtt python3[3115]: Reading sensors configuration...
...(52s)...
Jul 17 23:33:28 wyzesense2mqtt python3[3115]: Publishing discovery topics for 77800DFB
...(0s)...
Jul 17 23:33:28 wyzesense2mqtt python3[3115]: Connected to MQTT: return code 0

More log output available here: https://hastebin.com/agupimarif

dale3h commented 4 years ago

I commented out the sending of discovery topics in init_sensors and the delay is still present. I am adding in some extra debug output that might help me figure out where the delay is happening.

Jul 17 23:46:07 wyzesense2mqtt systemd[1]: Starting wyzesense2mqtt...
Jul 17 23:46:17 wyzesense2mqtt systemd[1]: Started wyzesense2mqtt.
Jul 17 23:46:18 wyzesense2mqtt python3[3184]: Logging initialized...
Jul 17 23:46:18 wyzesense2mqtt python3[3184]: Reading configuration...
Jul 17 23:46:18 wyzesense2mqtt python3[3184]: Connecting to MQTT host XXXXXXXX
Jul 17 23:46:18 wyzesense2mqtt python3[3184]: Connecting to dongle /dev/hidraw0
Jul 17 23:46:18 wyzesense2mqtt python3[3184]:   MAC: 77865404,  VER: 0.0.0.30 V1.4 Dongle UD3U,  ENR: b'J\xfb\xc5\xa1\r\xcf\x86D$6\xd3\xae\x17`&\xd5'
Jul 17 23:46:18 wyzesense2mqtt python3[3184]: Reading sensors configuration...
Jul 17 23:47:11 wyzesense2mqtt python3[3184]: Connected to MQTT: return code 0

UPDATE 1: I am making progress! It looks like the delay is happening immediately after result = WYZESENSE_DONGLE.List() in init_sensors:

Jul 17 23:54:36 wyzesense2mqtt systemd[1]: Starting wyzesense2mqtt...
Jul 17 23:54:46 wyzesense2mqtt systemd[1]: Started wyzesense2mqtt.
Jul 17 23:54:47 wyzesense2mqtt python3[3233]: Logging initialized...
Jul 17 23:54:47 wyzesense2mqtt python3[3233]: Reading configuration...
Jul 17 23:54:47 wyzesense2mqtt python3[3233]: Connecting to MQTT host XXXXXXXX
Jul 17 23:54:47 wyzesense2mqtt python3[3233]: Connecting to dongle /dev/hidraw0
Jul 17 23:54:48 wyzesense2mqtt python3[3233]:   MAC: 77865404,  VER: 0.0.0.30 V1.4 Dongle UD3U,  ENR: b'J\xfb\xc5\xa1\r\xcf\x86D$6\xd3\xae\x17`&\xd5'
Jul 17 23:54:48 wyzesense2mqtt python3[3233]: Reading sensors configuration...
Jul 17 23:54:48 wyzesense2mqtt python3[3233]: [TRACE] File exists, start reading...
Jul 17 23:54:48 wyzesense2mqtt python3[3233]: [TRACE] Finished reading sensors config.
Jul 17 23:54:48 wyzesense2mqtt python3[3233]: [TRACE] Loop over SENSORS to add invert_state...
Jul 17 23:54:48 wyzesense2mqtt python3[3233]: [TRACE] Current sensor: 77800DFB
Jul 17 23:54:48 wyzesense2mqtt python3[3233]: [TRACE] Current sensor: 77A5E27A
Jul 17 23:54:48 wyzesense2mqtt python3[3233]: [TRACE] Current sensor: 77A5ECA3
Jul 17 23:54:48 wyzesense2mqtt python3[3233]: [TRACE] Current sensor: 77A5FBD4
Jul 17 23:54:48 wyzesense2mqtt python3[3233]: [TRACE] Current sensor: 77877877
Jul 17 23:54:48 wyzesense2mqtt python3[3233]: [TRACE] Current sensor: 778780B7
Jul 17 23:54:48 wyzesense2mqtt python3[3233]: [TRACE] Current sensor: 77A5FCF6
Jul 17 23:54:48 wyzesense2mqtt python3[3233]: [TRACE] Current sensor: 77835FAF
Jul 17 23:54:48 wyzesense2mqtt python3[3233]: [TRACE] Current sensor: 77825BF1
Jul 17 23:54:48 wyzesense2mqtt python3[3233]: [TRACE] Current sensor: 77835E02
Jul 17 23:54:48 wyzesense2mqtt python3[3233]: [TRACE] Current sensor: 7782986A
Jul 17 23:54:48 wyzesense2mqtt python3[3233]: [TRACE] Current sensor: 7784FF23
Jul 17 23:54:48 wyzesense2mqtt python3[3233]: [TRACE] Current sensor: 778362A9
Jul 17 23:54:48 wyzesense2mqtt python3[3233]: [TRACE] Current sensor: 777EDD2F
Jul 17 23:54:48 wyzesense2mqtt python3[3233]: [TRACE] Current sensor: 7784FDA3
Jul 17 23:54:48 wyzesense2mqtt python3[3233]: [TRACE] Current sensor: 777F01FB
Jul 17 23:54:48 wyzesense2mqtt python3[3233]: [TRACE] Current sensor: 7783716A
Jul 17 23:54:48 wyzesense2mqtt python3[3233]: [TRACE] Current sensor: 7782787F
Jul 17 23:54:48 wyzesense2mqtt python3[3233]: [TRACE] Current sensor: 777EFF2A
Jul 17 23:54:48 wyzesense2mqtt python3[3233]: [TRACE] Current sensor: 7782792E
Jul 17 23:54:48 wyzesense2mqtt python3[3233]: [TRACE] Current sensor: 7782784A
Jul 17 23:54:48 wyzesense2mqtt python3[3233]: [TRACE] Current sensor: 777EF86E
Jul 17 23:54:48 wyzesense2mqtt python3[3233]: [TRACE] Current sensor: 77836170
Jul 17 23:54:48 wyzesense2mqtt python3[3233]: [TRACE] Current sensor: 77825B98
Jul 17 23:54:48 wyzesense2mqtt python3[3233]: [TRACE] Get list from dongle...
Jul 17 23:55:28 wyzesense2mqtt python3[3233]: sensor_mac: 77A5E27A
Jul 17 23:55:28 wyzesense2mqtt python3[3233]: Non-state event data: [2020-07-17 23:55:26][77A5E27A]RawEvent: type=raw_AB, data=b'02050603080a060302000500073500'
Jul 17 23:55:28 wyzesense2mqtt python3[3233]: sensor_mac: 77A5E27A
Jul 17 23:55:28 wyzesense2mqtt python3[3233]: Non-state event data: [2020-07-17 23:55:26][77A5E27A]RawEvent: type=raw_AB, data=b'02050603080a060302000500073500'
Jul 17 23:55:40 wyzesense2mqtt python3[3233]: [TRACE] Pass TimeoutError.
Jul 17 23:55:40 wyzesense2mqtt python3[3233]: Connected to MQTT: return code 0

I'll try to dig into WYZESENSE_DONGLE.List() and see if I can find anything else.

UPDATE 2: After doing some more digging and figuring out a way to output to the log from within wyzesense.py, I discovered that my bridge has 26 sensors pair to it. However, the 26th sensor always hangs, and sometimes the 25th sensor hangs. If I reduce the count in _GetSensors by one (count = resp.Payload[0] - 1), it still sometimes hangs. If I reduce it by two (count = resp.Payload[0] - 2), it never hangs and connects to the MQTT broker quickly.

Jul 18 00:47:58 wyzesense2mqtt systemd[1]: Starting wyzesense2mqtt...
Jul 18 00:48:08 wyzesense2mqtt systemd[1]: Started wyzesense2mqtt.
Jul 18 00:48:08 wyzesense2mqtt python3[3695]: Logging initialized...
Jul 18 00:48:08 wyzesense2mqtt python3[3695]: Reading configuration...
Jul 18 00:48:08 wyzesense2mqtt python3[3695]: Connecting to MQTT host XXXXXXXX
Jul 18 00:48:08 wyzesense2mqtt python3[3695]: Connecting to dongle /dev/hidraw0
Jul 18 00:48:09 wyzesense2mqtt python3[3695]: LOG: time=2020-07-18T00:47:55.995000, data=b'14ff'
Jul 18 00:48:09 wyzesense2mqtt python3[3695]:   MAC: 77865404,  VER: 0.0.0.30 V1.4 Dongle UD3U,  ENR: b'J\xfb\xc5\xa1\r\xcf\x86D$6\xd3\xae\x17`&\xd5'
Jul 18 00:48:09 wyzesense2mqtt python3[3695]: Reading sensors configuration...
Jul 18 00:48:09 wyzesense2mqtt python3[3695]: [TRACE] File exists, start reading...
Jul 18 00:48:09 wyzesense2mqtt python3[3695]: [TRACE] Finished reading sensors config.
Jul 18 00:48:09 wyzesense2mqtt python3[3695]: [TRACE] Loop over SENSORS to add invert_state...
Jul 18 00:48:09 wyzesense2mqtt python3[3695]: [TRACE] Current sensor: 777EDD2F
...
Jul 18 00:48:09 wyzesense2mqtt python3[3695]: [TRACE] Current sensor: 77A5FCF6
Jul 18 00:48:09 wyzesense2mqtt python3[3695]: [TRACE] Get list from dongle...
Jul 18 00:48:09 wyzesense2mqtt python3[3695]: [TRACE] Start: Dongle.List()
Jul 18 00:48:09 wyzesense2mqtt python3[3695]: [TRACE] Calling: Dongle._GetSensors()
Jul 18 00:48:09 wyzesense2mqtt python3[3695]: Start GetSensors...
Jul 18 00:48:09 wyzesense2mqtt python3[3695]: 24 sensors reported, waiting for each one to report...
Jul 18 00:48:09 wyzesense2mqtt python3[3695]: [TRACE] Sensor 1/24, MAC:77800DFB
...
Jul 18 00:48:09 wyzesense2mqtt python3[3695]: [TRACE] Sensor 24/24, MAC:77A5E27A
Jul 18 00:48:09 wyzesense2mqtt python3[3695]: [TRACE] return ctx.sensors
Jul 18 00:48:09 wyzesense2mqtt python3[3695]: [TRACE] Called: Dongle._GetSensors(), looping sensors...
Jul 18 00:48:09 wyzesense2mqtt python3[3695]: [TRACE] Sensor 25/24, MAC:77A5ECA3
Jul 18 00:48:09 wyzesense2mqtt python3[3695]: Sensor found: 77800DFB
...
Jul 18 00:48:09 wyzesense2mqtt python3[3695]: Sensor found: 77A5ECA3
Jul 18 00:48:09 wyzesense2mqtt python3[3695]: [TRACE] Returning sensors
Jul 18 00:48:09 wyzesense2mqtt python3[3695]: Linked sensors: ['77800DFB', ..., '77A5ECA3']
Jul 18 00:48:09 wyzesense2mqtt python3[3695]: sensor_mac: 77800DFB
...
Jul 18 00:48:09 wyzesense2mqtt python3[3695]: sensor_mac: 77A5ECA3
Jul 18 00:48:09 wyzesense2mqtt python3[3695]: Connected to MQTT: return code 0

It appears that the last one or two sensors that are paired to my bridge are causing the hangup, but I still don't know why. I suppose I can plug it into a camera, upgrade it, and then pair everything back and see if that eliminates the issue.

dale3h commented 4 years ago

I am no longer having this issue after discovering that the 26th sensor was one with an invalid MAC address. I was NOT able to remove the sensor by publishing 00000000 to wyzesense2mqtt/remove, but trying to re-pair the sensor using wyzesense2mqtt/scan "kind of" worked. I say "kind of" because it crashed wyzesense2mqtt altogether, and required a restart of the service (which threw the NameError: name 'SENSORS' is not defined [see #17] the first couple of times it attempted to start). Here is the log output from attempting to pair the invalid sensor:

Jul 18 02:51:59 wyzesense2mqtt python3[5183]: Scan result: ('\x00\x00\x00\x00\x00\x00\x00\x00', 1, 16)
Jul 18 02:51:59 wyzesense2mqtt python3[5183]: sensor_mac:
Jul 18 02:51:59 wyzesense2mqtt python3[5183]: Unpairing bad MAC:
Jul 18 02:51:59 wyzesense2mqtt python3[5183]: LOG: time=2020-07-18T02:51:31.981000, data=b'250000000000000000'
Jul 18 02:51:59 wyzesense2mqtt python3[5183]: LOG: time=2020-07-18T02:51:31.981000, data=b'250000000000000000'
Jul 18 02:51:59 wyzesense2mqtt python3[5183]: Clearing sensor topics
Jul 18 02:51:59 wyzesense2mqtt python3[5183]: Invalid sensor found:
Jul 18 02:51:59 wyzesense2mqtt python3[5183]: Disconnected from MQTT: return code 1
Jul 18 02:52:00 wyzesense2mqtt python3[5183]: Connected to MQTT: return code 0
Jul 18 02:52:00 wyzesense2mqtt python3[5183]: Disconnected from MQTT: return code 1
Jul 18 02:52:01 wyzesense2mqtt python3[5183]: Connected to MQTT: return code 0
Jul 18 02:52:01 wyzesense2mqtt python3[5183]: Disconnected from MQTT: return code 1
Jul 18 02:52:02 wyzesense2mqtt python3[5183]: Connected to MQTT: return code 0
Jul 18 02:52:02 wyzesense2mqtt python3[5183]: Disconnected from MQTT: return code 1
Jul 18 02:52:03 wyzesense2mqtt python3[5183]: Connected to MQTT: return code 0
Jul 18 02:52:03 wyzesense2mqtt python3[5183]: Disconnected from MQTT: return code 1
...this continued every second until it was restarted...
Jul 18 02:52:29 wyzesense2mqtt python3[5183]: Connected to MQTT: return code 0
Jul 18 02:52:29 wyzesense2mqtt python3[5183]: Disconnected from MQTT: return code 1
Jul 18 02:52:30 wyzesense2mqtt python3[5183]: Connected to MQTT: return code 0
Jul 18 02:52:30 wyzesense2mqtt python3[5183]: Disconnected from MQTT: return code 1
Jul 18 02:52:30 wyzesense2mqtt systemd[1]: Stopping wyzesense2mqtt...
Jul 18 02:52:30 wyzesense2mqtt systemd[1]: wyzesense2mqtt.service: Main process exited, code=killed, status=15/TERM
Jul 18 02:52:30 wyzesense2mqtt systemd[1]: wyzesense2mqtt.service: Succeeded.
Jul 18 02:52:30 wyzesense2mqtt systemd[1]: Stopped wyzesense2mqtt.

tl;dr: Delay was caused by a sensor that was previously paired with an invalid MAC. Remove the invalid sensor by pairing it again. Restart wyzesense2mqtt.

raetha commented 4 years ago

@dale3h Nice find. I've been trying to add more error handling for the invalid MAC issue, including auto deletion if it tries to pair one in the first place, however I don't think there is a good way to detect if one already exists. The CLI tool does have a fix function to unpair that way.

What would be interesting to see is if you know which sensor that is, and it still has a low battery and tries to pair with the bad MAC, whether or not the auto-unpair works. If you do have that scenario still, can you try doing a scan and pairing it and see what the logs give you? Or maybe that is what you did, in which case just confirm. Then we just need to solve the fact that it nuked your service rather than gracefully deleting and moving on.