nugget / python-insteonplm

Python 3 asyncio module for interfacing with Insteon Powerline modems
MIT License
32 stars 19 forks source link

Memory leak when receiving duplicate device responses #201

Closed mstovenour closed 4 years ago

mstovenour commented 4 years ago

I've been working on a thorough solution for #114. In the process I discovered that duplicate device responses all end up queued in Device._directACK_received_queue but only one message is removed from the queue by Device._wait_for_direct_ACK(). The way the code operates the Device._directACK_received_queue should never have more that one message in it. Yet that queue grows over time with every duplicate message received. It also means that each time Device._wait_for_direct_ACK() runs, it pulls old, duplicate messages out of the queue. This was my issue. However this is a long term problem for anyone because the message queue for each device will grow unbounded if the system runs for a long time. My home is large and I have 20 devices so I get a few duplicate messages for the devices that are further away from the PLM.

I think this is relatively easy to fix by beefing up Device._is_duplicate() however when I look at that function I can't really understand the motivation behind its implementation. A beefier solution is to directly compare each component of the saved messages (target, flags, cmd1, cmd2, extended data) while ignoring the hop counts (lower nibble of flags) when deciding if the current message is a duplicate. I'm concerned about changing that method though because I don't really understand why it only compares cmd1 and group and why it only works for ALL-LinkCleanup and ALL-LinkBroadcast messages. @teharris1 could you help me understand the motivation?

Below is more proof of the issue: I added a bit of extra logging in the process of understanding why my messages were not being processed correctly. You can checkout the fix_issue_201 branch in my repository to run this locally. I use the interactive tool and run the on_off_test or newly added test_device function. I run it multiple times until I see the log Got Direct ACK message. Already in queue: 1 or any number > 0. I also added logging for the message object IDs so it is easier to track which message instance is being processed out of the queue. Below is an edited logging output. I removed a lot of the irrelevant lines so it would be easier to see the actual message queuing.

`INFO:insteonplm.tools:Send on request INFO:insteonplm.tools:---------------------- DEBUG:insteonplm.devices:Starting 35.3E.ED Device._send_msg: Queuing message DEBUG:insteonplm.plm:TX: 3059110124:{'code': 0x62, 'address': 35.3E.ED, 'flags': 0x00, 'cmd1': 0x11, 'cmd2': 0xff, 'acknak': 0xNone} DEBUG:insteonplm.plm:RX: 3059071596:{'code': 0x62, 'address': 35.3E.ED, 'flags': 0x00, 'cmd1': 0x11, 'cmd2': 0xff, 'acknak': 0x06} DEBUG:insteonplm.devices:Starting Device.receive_message for 35.3E.ED DEBUG:insteonplm.devices:Got Message ACK 3059071596 DEBUG:insteonplm.plm:RX: 3059071372:{'code': 0x50, 'address': 35.3E.ED, 'target': 1E.DC.30, 'flags': 0x20, 'cmd1': 0x11, 'cmd2': 0xff} DEBUG:insteonplm.devices:Starting Device.receive_message for 35.3E.ED DEBUG:insteonplm.devices:Got Direct ACK message. Already in queue: 0, Queueing 3059071372:{'code': 0x50, 'address': 35.3E.ED, 'target': 1E.DC.30, 'flags': 0x20, 'cmd1': 0x11, 'cmd2': 0xff} DEBUG:insteonplm.plm:Processing _acknak_queue acknak: 3059071596:{'code': 0x62, 'address': 35.3E.ED, 'flags': 0x00, 'cmd1': 0x11, 'cmd2': 0xff, 'acknak': 0x06} msg: 3059110124:{'code': 0x62, 'address': 35.3E.ED, 'flags': 0x00, 'cmd1': 0x11, 'cmd2': 0xff, 'acknak': 0xNone} DEBUG:insteonplm.plm:ACK or NAK received DEBUG:insteonplm.devices:Starting Device._wait_for_direct_ACK DEBUG:insteonplm.devices:Remaining queue 0, Direct ACK: 3059071372:{'code': 0x50, 'address': 35.3E.ED, 'target': 1E.DC.30, 'flags': 0x20, 'cmd1': 0x11, 'cmd2': 0xff} INFO:insteonplm.tools:Device 35.3E.ED state 0x1 value is changed to 255

INFO:insteonplm.tools:Send off request INFO:insteonplm.tools:---------------------- DEBUG:insteonplm.devices:Starting 35.3E.ED Device._send_msg: Queuing message DEBUG:insteonplm.plm:TX: 3059119819:{'code': 0x62, 'address': 35.3E.ED, 'flags': 0x00, 'cmd1': 0x13, 'cmd2': 0x00, 'acknak': 0xNone} DEBUG:insteonplm.plm:RX: 3059107180:{'code': 0x62, 'address': 35.3E.ED, 'flags': 0x00, 'cmd1': 0x13, 'cmd2': 0x00, 'acknak': 0x06} DEBUG:insteonplm.devices:Starting Device.receive_message for 35.3E.ED DEBUG:insteonplm.devices:Got Message ACK 3059107180 DEBUG:insteonplm.devices:Look for direct ACK DEBUG:insteonplm.plm:RX: 3059168559:{'code': 0x50, 'address': 35.3E.ED, 'target': 1E.DC.30, 'flags': 0x20, 'cmd1': 0x13, 'cmd2': 0x00} DEBUG:insteonplm.devices:Starting Device.receive_message for 35.3E.ED DEBUG:insteonplm.devices:Got Direct ACK message. Already in queue: 0, Queueing 3059168559:{'code': 0x50, 'address': 35.3E.ED, 'target': 1E.DC.30, 'flags': 0x20, 'cmd1': 0x13, 'cmd2': 0x00} DEBUG:insteonplm.plm:RX: 3059097356:{'code': 0x50, 'address': 35.3E.ED, 'target': 1E.DC.30, 'flags': 0x20, 'cmd1': 0x13, 'cmd2': 0x00} DEBUG:insteonplm.devices:Starting Device.receive_message for 35.3E.ED DEBUG:insteonplm.devices:Got Direct ACK message. Already in queue: 1, Queueing 3059097356:{'code': 0x50, 'address': 35.3E.ED, 'target': 1E.DC.30, 'flags': 0x20, 'cmd1': 0x13, 'cmd2': 0x00} DEBUG:insteonplm.plm:Processing _acknak_queue acknak: 3059107180:{'code': 0x62, 'address': 35.3E.ED, 'flags': 0x00, 'cmd1': 0x13, 'cmd2': 0x00, 'acknak': 0x06} msg: 3059119819:{'code': 0x62, 'address': 35.3E.ED, 'flags': 0x00, 'cmd1': 0x13, 'cmd2': 0x00, 'acknak': 0xNone} DEBUG:insteonplm.plm:ACK or NAK received DEBUG:insteonplm.devices:Starting Device._wait_for_direct_ACK DEBUG:insteonplm.devices:Remaining queue 1, Direct ACK: 3059168559:{'code': 0x50, 'address': 35.3E.ED, 'target': 1E.DC.30, 'flags': 0x20, 'cmd1': 0x13, 'cmd2': 0x00} INFO:insteonplm.tools:Device 35.3E.ED state 0x1 value is changed to 0

INFO:insteonplm.tools:Send on request INFO:insteonplm.tools:---------------------- DEBUG:insteonplm.devices:Starting 35.3E.ED Device._send_msg: Queuing message DEBUG:insteonplm.plm:TX: 3059085868:{'code': 0x62, 'address': 35.3E.ED, 'flags': 0x00, 'cmd1': 0x11, 'cmd2': 0xff, 'acknak': 0xNone} DEBUG:insteonplm.plm:RX: 3059069420:{'code': 0x62, 'address': 35.3E.ED, 'flags': 0x00, 'cmd1': 0x11, 'cmd2': 0xff, 'acknak': 0x06} DEBUG:insteonplm.devices:Starting Device.receive_message for 35.3E.ED DEBUG:insteonplm.devices:Got Message ACK 3059069420 DEBUG:insteonplm.devices:Look for direct ACK DEBUG:insteonplm.plm:RX: 3059069100:{'code': 0x50, 'address': 35.3E.ED, 'target': 1E.DC.30, 'flags': 0x20, 'cmd1': 0x11, 'cmd2': 0xff} DEBUG:insteonplm.devices:Starting Device.receive_message for 35.3E.ED DEBUG:insteonplm.devices:Got Direct ACK message. Already in queue: 1, Queueing 3059069100:{'code': 0x50, 'address': 35.3E.ED, 'target': 1E.DC.30, 'flags': 0x20, 'cmd1': 0x11, 'cmd2': 0xff} DEBUG:insteonplm.plm:Processing _acknak_queue acknak: 3059069420:{'code': 0x62, 'address': 35.3E.ED, 'flags': 0x00, 'cmd1': 0x11, 'cmd2': 0xff, 'acknak': 0x06} msg: 3059085868:{'code': 0x62, 'address': 35.3E.ED, 'flags': 0x00, 'cmd1': 0x11, 'cmd2': 0xff, 'acknak': 0xNone} DEBUG:insteonplm.plm:ACK or NAK received DEBUG:insteonplm.devices:Starting Device._wait_for_direct_ACK DEBUG:insteonplm.devices:Remaining queue 1, Direct ACK: 3059097356:{'code': 0x50, 'address': 35.3E.ED, 'target': 1E.DC.30, 'flags': 0x20, 'cmd1': 0x13, 'cmd2': 0x00} INFO:insteonplm.tools:Device 35.3E.ED state 0x1 value is changed to 255

INFO:insteonplm.tools:Send off request INFO:insteonplm.tools:---------------------- DEBUG:insteonplm.devices:Starting 35.3E.ED Device._send_msg: Queuing message DEBUG:insteonplm.plm:TX: 3059118561:{'code': 0x62, 'address': 35.3E.ED, 'flags': 0x00, 'cmd1': 0x13, 'cmd2': 0x00, 'acknak': 0xNone} DEBUG:insteonplm.plm:RX: 3059071500:{'code': 0x62, 'address': 35.3E.ED, 'flags': 0x00, 'cmd1': 0x13, 'cmd2': 0x00, 'acknak': 0x06} DEBUG:insteonplm.devices:Starting Device.receive_message for 35.3E.ED DEBUG:insteonplm.devices:Got Message ACK 3059071500 DEBUG:insteonplm.devices:Look for direct ACK DEBUG:insteonplm.plm:RX: 3059106757:{'code': 0x50, 'address': 35.3E.ED, 'target': 1E.DC.30, 'flags': 0x20, 'cmd1': 0x13, 'cmd2': 0x00} DEBUG:insteonplm.devices:Starting Device.receive_message for 35.3E.ED DEBUG:insteonplm.devices:Got Direct ACK message. Already in queue: 1, Queueing 3059106757:{'code': 0x50, 'address': 35.3E.ED, 'target': 1E.DC.30, 'flags': 0x20, 'cmd1': 0x13, 'cmd2': 0x00} DEBUG:insteonplm.plm:Processing _acknak_queue acknak: 3059071500:{'code': 0x62, 'address': 35.3E.ED, 'flags': 0x00, 'cmd1': 0x13, 'cmd2': 0x00, 'acknak': 0x06} msg: 3059118561:{'code': 0x62, 'address': 35.3E.ED, 'flags': 0x00, 'cmd1': 0x13, 'cmd2': 0x00, 'acknak': 0xNone} DEBUG:insteonplm.plm:ACK or NAK received DEBUG:insteonplm.devices:Starting Device._wait_for_direct_ACK DEBUG:insteonplm.devices:Remaining queue 1, Direct ACK: 3059069100:{'code': 0x50, 'address': 35.3E.ED, 'target': 1E.DC.30, 'flags': 0x20, 'cmd1': 0x11, 'cmd2': 0xff} INFO:insteonplm.tools:Device 35.3E.ED state 0x1 value is changed to 0 `

teharris1 commented 4 years ago

The _is_duplicate method checks broadcast and cleanup messages to deal with the specific behavior of devices acting as a controller such as when a switch is manually pressed. A controller sends a broadcast message such as on or off which is intended to be heard by any responder which has the controller and group in the responder's ALDB, including the modem as a responder. The controller then sends cleanup messages to each device that it has as responders to the group in the controller's ALDB. The responder device is supposed to use the cleanup message as the trigger if it does not receive the broadcast. That is a failsafe method of Insteon. However, if the broadcast message was received and the cleanup is then received the responder can act on the command twice or more if multiple clean up messages are received. The device actually sends multiple cleanup messages if it does not receive a cleanup ack. This library does not send back the cleanup ack from the modem to the controller. Even if it did the deduplicate function would be needed to only respond to the original broadcast or the first cleanup it sees.

Because this is a "normal" part of Insteon and happens ever time a switch is turned on or off it was required as part of the core library. The issue you are seeing is probably an issue with the message hops and rebroadcasting direct acks. This is a great catch and appreciate the help.

You are right that a more robust method would be to check all messages and use the hops field to do the comparison. Additionally, it would be good if the library send a cleanup ack back to the controller form the modem.

After re-reading this I now see the point of your PR and will bring it into the master branch.

teharris1 commented 4 years ago

The _is_duplicate method checks broadcast and cleanup messages to deal with the specific behavior of devices acting as a controller such as when a switch is manually pressed. A controller sends a broadcast message such as on or off which is intended to be heard by any responder which has the controller and group in the responder's ALDB, including the modem as a responder. The controller then sends cleanup messages to each device that it has as responders to the group in the controller's ALDB. The responder device is supposed to use the cleanup message as the trigger if it does not receive the broadcast. That is a failsafe method of Insteon. However, if the broadcast message was received and the cleanup is then received the responder can act on the command twice or more if multiple clean up messages are received. The device actually sends multiple cleanup messages if it does not receive a cleanup ack. This library does not send back the cleanup ack from the modem to the controller. Even if it did the deduplicate function would be needed to only respond to the original broadcast or the first cleanup it sees.

Because this is a "normal" part of Insteon and happens ever time a switch is turned on or off it was required as part of the core library. The issue you are seeing is probably an issue with the message hops and rebroadcasting direct acks. This is a great catch and appreciates the help.

You are right that a more robust method would be to check all messages and use the hops field to do the comparison. Additionally, it would be good if the library sends a cleanup ack back to the controller form the modem.

After re-reading this I now see the point of your PR and will bring it into the master branch.