nugget / python-insteonplm

Python 3 asyncio module for interfacing with Insteon Powerline modems
MIT License

potentially serious device state errors from off-by-one message processing #215

Closed rpdrewes closed 4 years ago

rpdrewes commented 4 years ago

I have found what seems to be a serious problem with message processing in python-insteonplm that can lead to incorrect device state information. I see the problem with my 2450 I/O Linc, but I think it could occur in many contexts with python-insteonplm and other Insteon devices. I'll describe my specific situation here. To make sure that I have the current state of the sensor, I issue an async_refresh_state() periodically (say, every minute). This causes two queries to be sent out, one for each of the device's two "states": the relay and the sensor. Normally this looks like this:

2020-01-11 01:42:09 DEBUG    TX: {'code': 0x62, 'address': 39.A0.96, 'flags': 0x00, 'cmd1': 0x19, 'cmd2': 0x00, 'acknak': 0xNone}
2020-01-11 01:42:10 DEBUG    RX: {'code': 0x62, 'address': 39.A0.96, 'flags': 0x00, 'cmd1': 0x19, 'cmd2': 0x00, 'acknak': 0x06}
2020-01-11 01:42:10 DEBUG    RX: {'code': 0x50, 'address': 39.A0.96, 'target': 53.BF.8F, 'flags': 0x26, 'cmd1': 0x03, 'cmd2': 0x00}
2020-01-11 01:42:10 INFO     Device 39a096 state 1 value is changed to 0
2020-01-11 01:42:12 DEBUG    TX: {'code': 0x62, 'address': 39.A0.96, 'flags': 0x00, 'cmd1': 0x19, 'cmd2': 0x01, 'acknak': 0xNone}
2020-01-11 01:42:13 DEBUG    RX: {'code': 0x62, 'address': 39.A0.96, 'flags': 0x00, 'cmd1': 0x19, 'cmd2': 0x01, 'acknak': 0x06}
2020-01-11 01:42:13 DEBUG    RX: {'code': 0x50, 'address': 39.A0.96, 'target': 53.BF.8F, 'flags': 0x26, 'cmd1': 0x03, 'cmd2': 0x01}
2020-01-11 01:42:13 INFO     Device 39a096 state 2 value is changed to 1

What happens in the above is that the PLM sends out a request for state 1 (the relay). The request is acknowledged by the I/O Linc (address 39.A0.96), and then the I/O Linc sends the requested data back (0x00). This sequence of TX, RX, RX is then repeated for state 2 (the sensor). Here the data returned is 0x01.

The problem occurs when, for an unknown reason, the I/O Linc inserts another unrelated response in the middle of this exchange. The python-insteonplm library apparently does not recognize that the message coming back is not the requested information but interprets it as the state information anyway. From that point on we have an off-by-one situation, where the state 2 data coming from the I/O Linc is being interpreted as the state 1 data, and then the state 1 data from the I/O Linc is being interpreted as the state 2 data, for a while. This situation typically persists for some number of cycles (about 5 minutes, typically) then resolves and does not reappear for hours (often 1-12 hours). Here is what the exchange looks like as the problem occurs. I have marked the mysterious data from the I/O Linc with *** at the end of the line:

2020-01-11 03:05:52 DEBUG    TX: {'code': 0x62, 'address': 39.A0.96, 'flags': 0x00, 'cmd1': 0x19, 'cmd2': 0x01, 'acknak': 0xNone}
2020-01-11 03:05:53 DEBUG    RX: {'code': 0x62, 'address': 39.A0.96, 'flags': 0x00, 'cmd1': 0x19, 'cmd2': 0x01, 'acknak': 0x06}
2020-01-11 03:05:53 DEBUG    RX: {'code': 0x50, 'address': 39.A0.96, 'target': 53.BF.8F, 'flags': 0x26, 'cmd1': 0x03, 'cmd2': 0x01}
2020-01-11 03:05:53 DEBUG    RX: {'code': 0x50, 'address': 39.A0.96, 'target': 53.BF.8F, 'flags': 0x21, 'cmd1': 0xfc, 'cmd2': 0xc1}  ***
2020-01-11 03:05:53 INFO     Device 39a096 state 2 value is changed to 1
2020-01-11 03:06:04 DEBUG    TX: {'code': 0x62, 'address': 39.A0.96, 'flags': 0x00, 'cmd1': 0x19, 'cmd2': 0x00, 'acknak': 0xNone}
2020-01-11 03:06:05 DEBUG    RX: {'code': 0x62, 'address': 39.A0.96, 'flags': 0x00, 'cmd1': 0x19, 'cmd2': 0x00, 'acknak': 0x06}
2020-01-11 03:06:05 DEBUG    RX: {'code': 0x50, 'address': 39.A0.96, 'target': 53.BF.8F, 'flags': 0x26, 'cmd1': 0x03, 'cmd2': 0x00}
2020-01-11 03:06:05 INFO     Device 39a096 state 1 value is changed to 255
2020-01-11 03:06:07 DEBUG    TX: {'code': 0x62, 'address': 39.A0.96, 'flags': 0x00, 'cmd1': 0x19, 'cmd2': 0x01, 'acknak': 0xNone}
2020-01-11 03:06:08 DEBUG    RX: {'code': 0x62, 'address': 39.A0.96, 'flags': 0x00, 'cmd1': 0x19, 'cmd2': 0x01, 'acknak': 0x06}
2020-01-11 03:06:08 DEBUG    RX: {'code': 0x50, 'address': 39.A0.96, 'target': 53.BF.8F, 'flags': 0x26, 'cmd1': 0x03, 'cmd2': 0x01}
2020-01-11 03:06:08 INFO     Device 39a096 state 2 value is changed to 0

It seems that the extra data from the I/O Linc (cmd1 0xfc, cmd2 0xc1) is interpreted as a 255 for state 1. This is the relay, so I think any nonzero value is simply reported as 255. Then the next data in the queue from the I/O Linc is interpreted as the response to the state 2 query. Both values are wrong: off-by-one.

What is this mystery data coming from the I/O Linc? I don't know. The Insteon documentation I have doesn't define that command type except to say it is reserved. This situation above happens several times per day on average. The timing and frequency seem to be random.

I am concerned that this problem could occur in many contexts within the python-insteonplm library, wherever unexpected messages from devices can appear in the queue and responses are not positively linked to the query that preceded them.

How could this be fixed in a general way?
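One possibility (sketched below with hypothetical names; not the library's actual internals) is to track exactly one outstanding status query at a time, accept only the first plausible direct ACK for it, and discard unsolicited ACKs and the reserved-cmd1 messages outright:

```python
# Sketch only: pair at most one direct ACK with the query awaiting it,
# and drop anything unsolicited or with a reserved cmd1 value.

class DirectAckMatcher:
    def __init__(self):
        self._pending_group = None  # group number of the in-flight query

    def send_query(self, group):
        """Record which state/group the outgoing 0x19 query asked about."""
        self._pending_group = group

    def on_direct_ack(self, cmd1, cmd2):
        if self._pending_group is None:
            # No query outstanding: this direct ACK is unsolicited; ignore it.
            return None
        if 0xF0 <= cmd1 <= 0xFF:
            # Reserved cmd1 values like the mystery 0xfc/0xf7 messages:
            # drop them and keep waiting for the real response.
            return None
        group, self._pending_group = self._pending_group, None
        return (group, cmd2)  # (which state was queried, its reported value)
```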

rpdrewes commented 4 years ago

Another example. My comments are in caps. This time the mysterious interposed message from the I/O Linc is cmd1 0xf7 instead of 0xfc:

2020-01-13 14:43:07 DEBUG    TX: {'code': 0x62, 'address': 39.A0.96, 'flags': 0x00, 'cmd1': 0x19, 'cmd2': 0x00, 'acknak': 0xNone}
2020-01-13 14:43:08 DEBUG    Messages in queue: 2
2020-01-13 14:43:08 DEBUG    RX: {'code': 0x62, 'address': 39.A0.96, 'flags': 0x00, 'cmd1': 0x19, 'cmd2': 0x00, 'acknak': 0x06}
2020-01-13 14:43:08 DEBUG    RX: {'code': 0x50, 'address': 39.A0.96, 'target': 53.BF.8F, 'flags': 0x26, 'cmd1': 0x00, 'cmd2': 0x00}
2020-01-13 14:43:08 INFO     Device 39a096 state 1 value is changed to 0
2020-01-13 14:43:10 DEBUG    TX: {'code': 0x62, 'address': 39.A0.96, 'flags': 0x00, 'cmd1': 0x19, 'cmd2': 0x01, 'acknak': 0xNone}
2020-01-13 14:43:11 DEBUG    Messages in queue: 3
2020-01-13 14:43:11 DEBUG    RX: {'code': 0x62, 'address': 39.A0.96, 'flags': 0x00, 'cmd1': 0x19, 'cmd2': 0x01, 'acknak': 0x06}
2020-01-13 14:43:11 DEBUG    RX: {'code': 0x50, 'address': 39.A0.96, 'target': 53.BF.8F, 'flags': 0x26, 'cmd1': 0x00, 'cmd2': 0x01}
2020-01-13 14:43:11 DEBUG    RX: {'code': 0x50, 'address': 39.A0.96, 'target': 53.BF.8F, 'flags': 0x21, 'cmd1': 0xf7, 'cmd2': 0xc1}
GARBAGE RECORD ABOVE. cmd 0xf7, 0xc1. different than the cmd1 0xfc, 0xc1 reported on github
2020-01-13 14:43:11 INFO     Device 39a096 state 2 value is changed to 1
2020-01-13 14:43:42 DEBUG    TX: {'code': 0x62, 'address': 39.A0.96, 'flags': 0x00, 'cmd1': 0x19, 'cmd2': 0x00, 'acknak': 0xNone}
2020-01-13 14:43:43 DEBUG    Messages in queue: 2
2020-01-13 14:43:43 DEBUG    RX: {'code': 0x62, 'address': 39.A0.96, 'flags': 0x00, 'cmd1': 0x19, 'cmd2': 0x00, 'acknak': 0x06}
2020-01-13 14:43:43 DEBUG    RX: {'code': 0x50, 'address': 39.A0.96, 'target': 53.BF.8F, 'flags': 0x20, 'cmd1': 0x00, 'cmd2': 0x00}
NOTE: IN FULL LOG WE ACK BACK? THE BAD 0xf7 MESSAGE AROUND HERE
2020-01-13 14:43:43 INFO     Device 39a096 state 1 value is changed to 255
2020-01-13 14:43:45 DEBUG    TX: {'code': 0x62, 'address': 39.A0.96, 'flags': 0x00, 'cmd1': 0x19, 'cmd2': 0x01, 'acknak': 0xNone}
2020-01-13 14:43:46 DEBUG    Messages in queue: 2
2020-01-13 14:43:46 DEBUG    RX: {'code': 0x62, 'address': 39.A0.96, 'flags': 0x00, 'cmd1': 0x19, 'cmd2': 0x01, 'acknak': 0x06}
2020-01-13 14:43:46 DEBUG    RX: {'code': 0x50, 'address': 39.A0.96, 'target': 53.BF.8F, 'flags': 0x25, 'cmd1': 0x00, 'cmd2': 0x01}
2020-01-13 14:43:46 INFO     Device 39a096 state 2 value is changed to 0            PROBLEM HERE
2020-01-13 14:44:17 DEBUG    TX: {'code': 0x62, 'address': 39.A0.96, 'flags': 0x00, 'cmd1': 0x19, 'cmd2': 0x00, 'acknak': 0xNone}
2020-01-13 14:44:18 DEBUG    Messages in queue: 2
2020-01-13 14:44:18 DEBUG    RX: {'code': 0x62, 'address': 39.A0.96, 'flags': 0x00, 'cmd1': 0x19, 'cmd2': 0x00, 'acknak': 0x06}
2020-01-13 14:44:18 DEBUG    RX: {'code': 0x50, 'address': 39.A0.96, 'target': 53.BF.8F, 'flags': 0x20, 'cmd1': 0x00, 'cmd2': 0x00}
2020-01-13 14:44:18 INFO     Device 39a096 state 1 value is changed to 255
2020-01-13 14:44:20 DEBUG    TX: {'code': 0x62, 'address': 39.A0.96, 'flags': 0x00, 'cmd1': 0x19, 'cmd2': 0x01, 'acknak': 0xNone}
2020-01-13 14:44:21 DEBUG    Messages in queue: 1
2020-01-13 14:44:21 DEBUG    RX: {'code': 0x62, 'address': 39.A0.96, 'flags': 0x00, 'cmd1': 0x19, 'cmd2': 0x01, 'acknak': 0x06}
2020-01-13 14:44:21 INFO     Device 39a096 state 2 value is changed to 0            PROBLEM HERE
2020-01-13 14:44:22 DEBUG    Messages in queue: 1
2020-01-13 14:44:22 DEBUG    RX: {'code': 0x50, 'address': 39.A0.96, 'target': 53.BF.8F, 'flags': 0x2f, 'cmd1': 0x00, 'cmd2': 0x01}
2020-01-13 14:44:52 DEBUG    TX: {'code': 0x62, 'address': 39.A0.96, 'flags': 0x00, 'cmd1': 0x19, 'cmd2': 0x00, 'acknak': 0xNone}
2020-01-13 14:44:53 DEBUG    Messages in queue: 2
2020-01-13 14:44:53 DEBUG    RX: {'code': 0x62, 'address': 39.A0.96, 'flags': 0x00, 'cmd1': 0x19, 'cmd2': 0x00, 'acknak': 0x06}
2020-01-13 14:44:53 DEBUG    RX: {'code': 0x50, 'address': 39.A0.96, 'target': 53.BF.8F, 'flags': 0x25, 'cmd1': 0x00, 'cmd2': 0x00}
2020-01-13 14:44:53 INFO     Device 39a096 state 1 value is changed to 0
2020-01-13 14:44:55 DEBUG    TX: {'code': 0x62, 'address': 39.A0.96, 'flags': 0x00, 'cmd1': 0x19, 'cmd2': 0x01, 'acknak': 0xNone}
2020-01-13 14:44:56 DEBUG    Messages in queue: 1
2020-01-13 14:44:56 DEBUG    RX: {'code': 0x62, 'address': 39.A0.96, 'flags': 0x00, 'cmd1': 0x19, 'cmd2': 0x01, 'acknak': 0x06}
2020-01-13 14:44:57 DEBUG    Messages in queue: 1
2020-01-13 14:44:57 DEBUG    RX: {'code': 0x50, 'address': 39.A0.96, 'target': 53.BF.8F, 'flags': 0x2f, 'cmd1': 0x00, 'cmd2': 0x01}
2020-01-13 14:44:57 INFO     Device 39a096 state 2 value is changed to 1
teharris1 commented 4 years ago

A couple of clarifications to be sure we both understand the technical detail:

  1. The 0x19 command sent to a device results in a DIRECT ACK response, but the CMD1 and CMD2 values are not related to the command table. The CMD1 value is the version of the ALDB and the CMD2 value is the state of the group. So a CMD1 value of 0xfc is not a command but a value (albeit one that does not make sense yet).
  2. The flag 0x2z (where z is any value from 0 to F) is a DIRECT ACK and is only supposed to happen when a device is responding to a direct command from another device (usually the modem).
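As an aside, the message type in item 2 can be read straight off the top bits of the flags byte. A small sketch following the standard Insteon flag layout (not library code):

```python
# The top three bits of the Insteon flags byte give the message type;
# bit 4 marks an extended message. Any 0x2z flag decodes as a direct ACK.

MESSAGE_TYPES = {
    0b000: "direct",
    0b001: "direct ACK",
    0b010: "group cleanup direct",
    0b011: "group cleanup ACK",
    0b100: "broadcast",
    0b101: "direct NAK",
    0b110: "group broadcast",
    0b111: "group cleanup NAK",
}

def decode_flags(flags):
    """Return (message type, is_extended) for an Insteon flags byte."""
    return MESSAGE_TYPES[flags >> 5], bool(flags & 0x10)
```

For example, the 0x26 and 0x21 flags in the logs above both decode as direct ACKs.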

So the dual DIRECT ACK is a function of the device (albeit not a proper function). I don't know why the device would be sending two responses to the same direct message. I am wondering if this is, in fact, a wiring issue, with the second message being a repeated copy that got garbled. The fact that the first flag is 0x26 and the second is 0x21 may be an indication of that.

This is all just speculation about the behavior of your IOLinc, but the issue here is the possibility of two DIRECT ACK messages being sent incorrectly. In theory, this should never happen, but it is happening.

There may be a way to address this in code but I am not sure. I will have to look at it.

teharris1 commented 4 years ago

By the way, you may want to reset your IOLinc to the factory defaults and set up the links again. Also, if you have set up the links with the modem correctly, you should not need to poll. The library keeps the values of the device up to date as they change.

rpdrewes commented 4 years ago

Thanks so much for your response! It will take me some time to think about what you have said. It's good to know you are still involved in this project. I have some other trace examples that may be illuminating.

For what it's worth, the I/O Linc was factory reset shortly before these tests (less than a month before). I suspect the problems may be related to I/O Linc firmware bugs that were fixed in later revisions, because in my 15 years of frustration with Insteon devices I have found them pretty flaky and unreliable, though gradually improving over time.

I added a check to skip a device-level refresh if there was already anything in the directACK receive queue, and another check in the directACK receive path to drop any message with cmd1 in the range 0xf0-0xff, and that seems to have improved things significantly. But I have captured even weirder problems. For example, out of the blue, the I/O Linc will sometimes report a bad state on the sensor even when I am only querying one state (the sensor) and no extraneous intervening messages have arrived from the I/O Linc to confuse things. This is with the sensor wires shorted, so there is no chance of electrical issues; the I/O Linc just sometimes, very rarely (once every day or two on average), seems to send the wrong state information.
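Those two mitigations might look roughly like this (a sketch with made-up names, not the actual patch):

```python
# Sketch of the two mitigations described above; the names are hypothetical.

def should_send_refresh(direct_ack_queue):
    """Skip a poll while stale direct ACKs are still waiting to be read."""
    return len(direct_ack_queue) == 0

def accept_direct_ack(msg, direct_ack_queue):
    """Queue a direct ACK unless cmd1 is in the reserved 0xF0-0xFF range."""
    if 0xF0 <= msg["cmd1"] <= 0xFF:
        return False  # drop the mystery 0xfc/0xf7/0xff messages
    direct_ack_queue.append(msg)
    return True
```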

I also tried to study Misterhouse Perl code to see if they have a different way of doing things. But Perl is such an abomination that I can't really be sure of anything.

teharris1 commented 4 years ago

Is it always the second message that is "wrong" or is it sometimes the first message? If it is always the second message I can flush the buffer before sending the next command (probably a good idea anyway). There is logic in the code that is supposed to catch a DIRECT ACK that is not expected but this use case was not expected.
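The flush could be a simple drain of the receive queue before each send. A minimal asyncio sketch (hypothetical names, not the library's real API):

```python
import asyncio

# Sketch: drain leftover messages from the receive queue before issuing a
# new status request, so a stale second direct ACK cannot be matched to
# the new query.

async def flush_then_send(recv_queue: asyncio.Queue, send, msg):
    flushed = []
    while not recv_queue.empty():
        flushed.append(recv_queue.get_nowait())  # discard stale messages
    send(msg)
    return flushed  # returned only so callers can log what was dropped
```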

rpdrewes commented 4 years ago

To improve reliability, as a mitigation I started only querying one state (state 2, the sensor) rather than doing a device-level refresh. This does improve reliability, but does not fully fix the problem. So one answer to your question is that the problem occurs even when there aren't two queries, just the one to the sensor. There may be multiple problems here though, one simply the firmware on the I/O Linc being terrible and reporting a bad value because of some internal screwup, and one attributable to the strange 0xf0-0xff directACK message causing an off-by-one error.

But when I do issue device-level refreshes, so that queries go out on both states: yes, both states can be reported wrong for significant intervals once the off-by-one error has started, in my use case where the relay is in one state and the sensor is in another. In my case, the relay is open and the sensor is shorted (by a wire in my test setup). If people have a different setup where (for example) the relay is typically open and the sensor is also typically open, then the off-by-one error may occur and persist for a while without ever being noticed, since the off-by-one 0 0 state is still 0 0!

teharris1 commented 4 years ago

My question was not clear enough. What I meant was: when you see two directACK messages to any status query, is the first directACK always the correct value in CMD2 and the second one always the wrong value, or can the first response be wrong and the second one correct?

rpdrewes commented 4 years ago

The short answer is that on a multi-state "device-level" refresh I have seen either state wrong. I have only tested two-state devices (my I/O Linc).

Here is another issue I see fairly often. Right now I only issue single-state refresh queries (not device-level refreshes that result in multiple state queries). In the log below, the I/O Linc has correctly reported sensor state 1 value of 0x01 for several hours prior to the start of the excerpt. After the first message in the log below, about 30 seconds pass with no activity, which is my polling interval, then we again issue a status request ONLY on state 1 (the sensor). We get a response with the correct state value (0x01). We then immediately get a second (extended?) directACK with cmd1 of 0xff and a bunch of userdata that I can't interpret. It looks like we then get a third directACK, which contains the incorrect state value of 0x00, and it is that value that is interpreted as the status response and stored as the state.

After this one incorrect reading, we go back to reading the correct state 1 value of 0x01 for another long while (not included in the log below for brevity).

What is all this other directACK traffic?!?

2020-01-19 23:46:35 DEBUG    Ending Device._wait_for_direct_ACK
2020-01-19 23:47:04 DEBUG    Starting Device._send_msg
2020-01-19 23:47:04 DEBUG    msg is: {'code': 0x62, 'address': 39.A0.96, 'flags': 0x00, 'cmd1': 0x19, 'cmd2': 0x01, 'acknak': 0xNone}
2020-01-19 23:47:04 DEBUG    Ending Device._send_msg
2020-01-19 23:47:04 DEBUG    Starting 39.A0.96 Device._process_send_queue
2020-01-19 23:47:04 DEBUG    Lock is locked from yield from
2020-01-19 23:47:04 DEBUG    Queueing msg: {'code': 0x62, 'address': 39.A0.96, 'flags': 0x00, 'cmd1': 0x19, 'cmd2': 0x01, 'acknak': 0xNone}
2020-01-19 23:47:04 DEBUG    Ending 39.A0.96 Device._process_send_queue
2020-01-19 23:47:04 DEBUG    TX: {'code': 0x62, 'address': 39.A0.96, 'flags': 0x00, 'cmd1': 0x19, 'cmd2': 0x01, 'acknak': 0xNone}
2020-01-19 23:47:04 DEBUG    Waiting for ACK or NAK message
2020-01-19 23:47:05 DEBUG    Starting: data_received
2020-01-19 23:47:05 DEBUG    Received 45 bytes from PLM: b'026239a09600190106025039a09653bf8f210001025139a09653bf8f1affc1e77046c5594c81c32a0db0ab540c'
2020-01-19 23:47:05 DEBUG    Finishing: data_received
2020-01-19 23:47:05 DEBUG    Total buffer: b'026239a09600190106025039a09653bf8f210001025139a09653bf8f1affc1e77046c5594c81c32a0db0ab540c'
2020-01-19 23:47:05 DEBUG    Total buffer: b'025039a09653bf8f210001025139a09653bf8f1affc1e77046c5594c81c32a0db0ab540c'
2020-01-19 23:47:05 DEBUG    Total buffer: b'025139a09653bf8f1affc1e77046c5594c81c32a0db0ab540c'
2020-01-19 23:47:05 DEBUG    Buffer too short to have a message
2020-01-19 23:47:05 DEBUG    Messages in queue: 3
2020-01-19 23:47:05 DEBUG    RX: {'code': 0x62, 'address': 39.A0.96, 'flags': 0x00, 'cmd1': 0x19, 'cmd2': 0x01, 'acknak': 0x06}
2020-01-19 23:47:05 DEBUG    Starting Device.receive_message {'code': 0x62, 'address': 39.A0.96, 'flags': 0x00, 'cmd1': 0x19, 'cmd2': 0x01, 'acknak': 0x06}
2020-01-19 23:47:05 DEBUG    Got Message ACK
2020-01-19 23:47:05 DEBUG    Look for direct ACK
2020-01-19 23:47:05 DEBUG    Ending Device.receive_message
2020-01-19 23:47:05 DEBUG    RX: {'code': 0x50, 'address': 39.A0.96, 'target': 53.BF.8F, 'flags': 0x21, 'cmd1': 0x00, 'cmd2': 0x01}
2020-01-19 23:47:05 DEBUG    Starting Device.receive_message {'code': 0x50, 'address': 39.A0.96, 'target': 53.BF.8F, 'flags': 0x21, 'cmd1': 0x00, 'cmd2': 0x01}
2020-01-19 23:47:05 DEBUG    Got Direct ACK message
2020-01-19 23:47:05 DEBUG    Ending Device.receive_message
2020-01-19 23:47:05 DEBUG    RX: {'code': 0x51, 'address': 39.A0.96, 'target': 53.BF.8F, 'flags': 0x1a, 'cmd1': 0xff, 'cmd2': 0xc1, 'userdata': 0xe7.70.46.c5.59.4c.81.c3.2a.0d.b0.ab.54.0c}
2020-01-19 23:47:05 DEBUG    Starting Device.receive_message {'code': 0x51, 'address': 39.A0.96, 'target': 53.BF.8F, 'flags': 0x1a, 'cmd1': 0xff, 'cmd2': 0xc1, 'userdata': 0xe7.70.46.c5.59.4c.81.c3.2a.0d.b0.ab.54.0c}
2020-01-19 23:47:05 DEBUG    Ending Device.receive_message
2020-01-19 23:47:05 DEBUG    Last item in self._recv_queue reached.
2020-01-19 23:47:05 DEBUG    ACK or NAK received
2020-01-19 23:47:05 DEBUG    Starting Device._wait_for_direct_ACK
2020-01-19 23:47:05 DEBUG    Direct ACK: {'code': 0x50, 'address': 39.A0.96, 'target': 53.BF.8F, 'flags': 0x2a, 'cmd1': 0x00, 'cmd2': 0x00}
2020-01-19 23:47:05 DEBUG    Releasing lock after processing direct ACK
2020-01-19 23:47:05 DEBUG    Device 39.A0.96 msg_lock unlocked
2020-01-19 23:47:05 DEBUG    _status_message_received in IoLincSensor class calling update_subscribers with 0x00
2020-01-19 23:47:05 INFO     Device 39a096 state 2 value is changed to 0
teharris1 commented 4 years ago

What is your environment?

teharris1 commented 4 years ago

Are you using a Hub or a PLM? If a Hub, is it version 1 or version 2? Are you using a Raspberry Pi?

teharris1 commented 4 years ago

I can see part of the issue in the most recent dump. There is a "race" condition where the first directACK comes in before the line that says:

2020-01-19 23:47:05 DEBUG Starting Device._wait_for_direct_ACK

This allows the second directACK to be treated as the response to the prior status request. I cannot tell from the other posts whether that is the situation. This race condition happens from time to time but is usually not an issue, since normally there is no second direct ACK.

Also in the most recent post, there is a message in between the first direct ACK and the second one that reads:

2020-01-19 23:47:05 DEBUG RX: {'code': 0x51, 'address': 39.A0.96, 'target': 53.BF.8F, 'flags': 0x1a, 'cmd1': 0xff, 'cmd2': 0xc1, 'userdata': 0xe7.70.46.c5.59.4c.81.c3.2a.0d.b0.ab.54.0c}

This is a very odd message because it is a direct message. This type of message only happens when an ALDB is being loaded or extended properties are read (but this is definitely not either of those message types). The cmd1 0xff is not a known command, so it is impossible to tell what it is.

Bottom line: this device is doing some very odd stuff, and it may not be possible to figure out.

rpdrewes commented 4 years ago

I am using a PLM. No hub. My PLM is very recent, but the I/O Linc is rather old.

This is running on a regular computer, quite a powerful one (a SuperMicro i7). It is also doing other things, like functioning as a NAS and a router, but it does not seem overloaded. I have seen the same behavior on other (even faster) machines I have used in the past that were very lightly loaded.

I'm going to see if I can capture any errors like this involving other devices, like Switch Lincs.

teharris1 commented 4 years ago

@rpdrewes I have made an update to insteonplm that will ignore the second direct ACK. This is the only thing I can do. If you see other devices behaving similarly, let me know, but for now this behavior is undocumented, unexpected, and unpredictable. I cannot see how any program can address that unless we find some pattern.
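For reference, the shape of that change (a sketch with a hypothetical name, not the actual insteonplm diff): once a direct ACK has satisfied the pending request, clear the waiting flag so any later direct ACK is ignored rather than applied to device state.

```python
# Sketch: accept at most one direct ACK per outstanding request; any
# further direct ACK is ignored instead of updating device state.

class DirectAckGate:
    def __init__(self):
        self._waiting = False
        self.last_value = None

    def expect_direct_ack(self):
        self._waiting = True  # called when a status request is sent

    def handle_direct_ack(self, cmd2):
        if not self._waiting:
            return False  # second/unsolicited direct ACK: ignored
        self._waiting = False
        self.last_value = cmd2
        return True
```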