robertmuth / PyZwaver

Z-Wave library written in Python3
GNU General Public License v3.0
38 stars 8 forks source link

PyZwaver becomes unresponsive / messages timing out #16

Open gerein opened 4 years ago

gerein commented 4 years ago

Since commit 61be189adb8dba00e398147bd7eddf29b1f22b15 (inflight handling), PyZwaver ends up in an unstable state after a while of running for me. Messages just time-out and the only solution is to kill the process with "-9" and restart. See log-excerpt below (my code only sends SwitchBinary_Get/Set and ThermostatMode_Get/Set, rest comes from PyZwaver):


2020-01-05 09:01:41,137 | ERROR    | message timeout: SOF len:0a REQU API_ZW_SEND_DATA:13 node:10 03 Association_Get:85 X:02 ff xmit:25 cb:fa chk:52
2020-01-05 09:01:41,138 | ERROR    | Timeout: SOF len:0a REQU API_ZW_SEND_DATA:13 node:10 03 Association_Get:85 X:02 ff xmit:25 cb:fa chk:52
2020-01-05 09:01:41,221 | INFO     | sent: SOF len:0a REQU API_ZW_SEND_DATA:13 node:10 03 AssociationGroupInformation_NameGet:59 X:01 ff xmit:25 cb:fb chk:8c
2020-01-05 09:01:42,140 | ERROR    | message timeout: SOF len:0a REQU API_ZW_SEND_DATA:13 node:10 03 AssociationGroupInformation_NameGet:59 X:01 ff xmit:25 cb:fb chk:8c
2020-01-05 09:01:42,147 | ERROR    | Timeout: SOF len:0a REQU API_ZW_SEND_DATA:13 node:10 03 AssociationGroupInformation_NameGet:59 X:01 ff xmit:25 cb:fb chk:8c
2020-01-05 09:01:42,229 | INFO     | sent: SOF len:0b REQU API_ZW_SEND_DATA:13 node:10 04 AssociationGroupInformation_ListGet:59 X:05 00 ff xmit:25 cb:fc chk:89
2020-01-05 09:01:43,149 | ERROR    | message timeout: SOF len:0b REQU API_ZW_SEND_DATA:13 node:10 04 AssociationGroupInformation_ListGet:59 X:05 00 ff xmit:25 cb:fc chk:89
2020-01-05 09:01:43,151 | ERROR    | Timeout: SOF len:0b REQU API_ZW_SEND_DATA:13 node:10 04 AssociationGroupInformation_ListGet:59 X:05 00 ff xmit:25 cb:fc chk:89
2020-01-05 09:01:43,233 | INFO     | sent: SOF len:0b REQU API_ZW_SEND_DATA:13 node:10 04 AssociationGroupInformation_InfoGet:59 X:03 00 ff xmit:25 cb:fd chk:8e
2020-01-05 09:01:44,153 | ERROR    | message timeout: SOF len:0b REQU API_ZW_SEND_DATA:13 node:10 04 AssociationGroupInformation_InfoGet:59 X:03 00 ff xmit:25 cb:fd chk:8e
2020-01-05 09:01:44,154 | ERROR    | Timeout: SOF len:0b REQU API_ZW_SEND_DATA:13 node:10 04 AssociationGroupInformation_InfoGet:59 X:03 00 ff xmit:25 cb:fd chk:8e
2020-01-05 09:02:59,988 | INFO     | sent: SOF len:0e REQU API_ZW_SEND_DATA:13 node:0e 07 MultiChannel_CmdEncap:60 X:0d 00 01 25 01 ff xmit:25 cb:07 chk:7e
2020-01-05 09:03:00,898 | ERROR    | message timeout: SOF len:0e REQU API_ZW_SEND_DATA:13 node:0e 07 MultiChannel_CmdEncap:60 X:0d 00 01 25 01 ff xmit:25 cb:07 chk:7e
2020-01-05 09:03:00,899 | ERROR    | Timeout: SOF len:0e REQU API_ZW_SEND_DATA:13 node:0e 07 MultiChannel_CmdEncap:60 X:0d 00 01 25 01 ff xmit:25 cb:07 chk:7e
2020-01-05 09:03:02,002 | INFO     | sent: SOF len:0d REQU API_ZW_SEND_DATA:13 node:0e 06 MultiChannel_CmdEncap:60 X:0d 00 01 25 02 xmit:25 cb:08 chk:8f
2020-01-05 09:03:02,912 | ERROR    | message timeout: SOF len:0d REQU API_ZW_SEND_DATA:13 node:0e 06 MultiChannel_CmdEncap:60 X:0d 00 01 25 02 xmit:25 cb:08 chk:8f
2020-01-05 09:03:02,913 | ERROR    | Timeout: SOF len:0d REQU API_ZW_SEND_DATA:13 node:0e 06 MultiChannel_CmdEncap:60 X:0d 00 01 25 02 xmit:25 cb:08 chk:8f
2020-01-05 09:03:12,051 | INFO     | sent: SOF len:0a REQU API_ZW_SEND_DATA:13 node:0f 03 SwitchBinary_Set:25 X:01 00 xmit:25 cb:09 chk:e2
2020-01-05 09:03:12,970 | ERROR    | message timeout: SOF len:0a REQU API_ZW_SEND_DATA:13 node:0f 03 SwitchBinary_Set:25 X:01 00 xmit:25 cb:09 chk:e2
2020-01-05 09:03:12,972 | ERROR    | Timeout: SOF len:0a REQU API_ZW_SEND_DATA:13 node:0f 03 SwitchBinary_Set:25 X:01 00 xmit:25 cb:09 chk:e2
2020-01-05 09:03:14,064 | INFO     | sent: SOF len:09 REQU API_ZW_SEND_DATA:13 node:0f 02 SwitchBinary_Get:25 X:02 xmit:25 cb:0a chk:e0
2020-01-05 09:03:14,984 | ERROR    | message timeout: SOF len:09 REQU API_ZW_SEND_DATA:13 node:0f 02 SwitchBinary_Get:25 X:02 xmit:25 cb:0a chk:e0
2020-01-05 09:03:14,985 | ERROR    | Timeout: SOF len:09 REQU API_ZW_SEND_DATA:13 node:0f 02 SwitchBinary_Get:25 X:02 xmit:25 cb:0a chk:e0
2020-01-05 09:03:16,068 | INFO     | sent: SOF len:0a REQU API_ZW_SEND_DATA:13 node:0d 03 SwitchBinary_Set:25 X:01 00 xmit:25 cb:0b chk:e2
2020-01-05 09:03:16,988 | ERROR    | message timeout: SOF len:0a REQU API_ZW_SEND_DATA:13 node:0d 03 SwitchBinary_Set:25 X:01 00 xmit:25 cb:0b chk:e2
2020-01-05 09:03:16,989 | ERROR    | Timeout: SOF len:0a REQU API_ZW_SEND_DATA:13 node:0d 03 SwitchBinary_Set:25 X:01 00 xmit:25 cb:0b chk:e2
2020-01-05 09:03:18,071 | INFO     | sent: SOF len:09 REQU API_ZW_SEND_DATA:13 node:0d 02 SwitchBinary_Get:25 X:02 xmit:25 cb:0c chk:e4
2020-01-05 09:03:18,991 | ERROR    | message timeout: SOF len:09 REQU API_ZW_SEND_DATA:13 node:0d 02 SwitchBinary_Get:25 X:02 xmit:25 cb:0c chk:e4
2020-01-05 09:03:18,992 | ERROR    | Timeout: SOF len:09 REQU API_ZW_SEND_DATA:13 node:0d 02 SwitchBinary_Get:25 X:02 xmit:25 cb:0c chk:e4
2020-01-05 09:03:19,085 | INFO     | sent: SOF len:0e REQU API_ZW_SEND_DATA:13 node:0e 07 MultiChannel_CmdEncap:60 X:0d 00 02 25 01 ff xmit:25 cb:0d chk:77
2020-01-05 09:03:20,003 | ERROR    | message timeout: SOF len:0e REQU API_ZW_SEND_DATA:13 node:0e 07 MultiChannel_CmdEncap:60 X:0d 00 02 25 01 ff xmit:25 cb:0d chk:77
2020-01-05 09:03:20,004 | ERROR    | Timeout: SOF len:0e REQU API_ZW_SEND_DATA:13 node:0e 07 MultiChannel_CmdEncap:60 X:0d 00 02 25 01 ff xmit:25 cb:0d chk:77
2020-01-05 09:03:20,097 | INFO     | sent: SOF len:0d REQU API_ZW_SEND_DATA:13 node:0e 06 MultiChannel_CmdEncap:60 X:0d 00 02 25 02 xmit:25 cb:0e chk:8a
2020-01-05 09:03:21,007 | ERROR    | message timeout: SOF len:0d REQU API_ZW_SEND_DATA:13 node:0e 06 MultiChannel_CmdEncap:60 X:0d 00 02 25 02 xmit:25 cb:0e chk:8a
2020-01-05 09:03:21,008 | ERROR    | Timeout: SOF len:0d REQU API_ZW_SEND_DATA:13 node:0e 06 MultiChannel_CmdEncap:60 X:0d 00 02 25 02 xmit:25 cb:0e chk:8a
2020-01-05 09:03:24,124 | INFO     | sent: SOF len:0a REQU API_ZW_SEND_DATA:13 node:0a 03 ThermostatMode_Set:40 X:01 00 xmit:25 cb:0f chk:84
2020-01-05 09:03:25,033 | ERROR    | message timeout: SOF len:0a REQU API_ZW_SEND_DATA:13 node:0a 03 ThermostatMode_Set:40 X:01 00 xmit:25 cb:0f chk:84
2020-01-05 09:03:25,035 | ERROR    | Timeout: SOF len:0a REQU API_ZW_SEND_DATA:13 node:0a 03 ThermostatMode_Set:40 X:01 00 xmit:25 cb:0f chk:84
2020-01-05 09:03:26,126 | INFO     | sent: SOF len:09 REQU API_ZW_SEND_DATA:13 node:0a 02 ThermostatMode_Get:40 X:02 xmit:25 cb:10 chk:9a
2020-01-05 09:03:27,036 | ERROR    | message timeout: SOF len:09 REQU API_ZW_SEND_DATA:13 node:0a 02 ThermostatMode_Get:40 X:02 xmit:25 cb:10 chk:9a
2020-01-05 09:03:27,037 | ERROR    | Timeout: SOF len:09 REQU API_ZW_SEND_DATA:13 node:0a 02 ThermostatMode_Get:40 X:02 xmit:25 cb:10 chk:9a
robertmuth commented 4 years ago

How long does it take approximately before the system gets wonky? Do you have the logs from just before when the problem started?

Once the problems occurs are all devices affected?

The system has been running rock solid for me for over a week. Maybe I need to add one of those gratuitous "sleeps" between the sending commands to not overload the controller.

robertmuth commented 4 years ago

ping

gerein commented 4 years ago

Hi - FYI, since I couldn't solve this, I have largely rewritten the driver/zmessage/command/command_translator layer and changed the locking logic. Still some work to do and clean-ups pending but this is now running rock-solid stable for me.

robertmuth commented 4 years ago

Hi gerein,

I had a brief glance at your change and my first re-action was that it is a little bit too large to be reviewed in one go. Could you break this up in multiple focused PRs. For example there are a lot of cosmetic changes, like reformatting and variable renames, that should be separated. (I can review those very quickly.)

I like the table driven approach you chose, but comments are sparse. So it is really hard for me to match the code with the intention behind it. I would encourage you to also add few more comments, especially for files, abstractions (classes) and tricky code.

I also noticed some very deep class hierarchies for frames (4 levels) - can those be flattened?

gerein commented 4 years ago

Hi - sorry, this was not really meant as a patch, rather a rewrite of the lower layer. And yes, I've changed the coding style to help me understand the flow more easily.

I have since added documentation to the driver-class and refactored the frame class hierarchy. I have also moved the logic from command_translator into the node-class and changed the MultiChannel-support to support proper endpoints. Still some work to do to further simplify but getting there.

I did not adapt any of your examples or tests, so those are all broken in my branch at the moment.

Feel free to use or ignore. Any questions, let me know.