kk7ds / pynx584

Python library and server for talking to NX584/NX8E interfaces
GNU General Public License v3.0
29 stars 26 forks source link

Binary Protocol Implementation #70

Closed drsweb closed 2 years ago

drsweb commented 2 years ago

@kk7ds Here are some proposed updates to implement the binary version of the NX584 Home Automation protocol. This update not only enables the binary functionality for the NX584/NX8E, but also allows pynx584 to run using the NX590E/NX590NE since it supports the binary version of the automation protocol over TCP/IP.

Two new config file entries are used to support this, both in the existing "Config" section:

# Set To true if using binary protocol
# Defaults to False
use_binary_protocol = True

# Length of idle time before sending heartbeat keep alive
idle_time_heartbeat_seconds = 20

I added the idle_time_heartbeat_seconds for the NX590, it seems to have a short idle timeout/disconnect period.

I've tested using both Serial and TCP/IP connections, using an NX584 in both ASCII and binary mode and also an NX590 (it only supports binary mode over TCP/IP). I know this is a decent amount of code, and I don't know how much time you/others are still willing to put into this...but hopefully people will find this useful.

Thank you for all your great work on this amazing piece of software.

kk7ds commented 2 years ago

Thanks for doing this. I think it would be a lot better to encapsulate the binary-to-ascii reading routine into a single adapter module and re-use it for the serial and socket versions. Also, I think creating a parent wrapper class for Socket and Serial will make that easier/cleaner.

Also, it'd be best to separate out the idle heartbeat thing into a separate change.

Would you mind if I took what you have here and propose an alternate arrangement of your code per above? I'd need you to test against the binary side for me, and of course I'd keep it clear in the commit that you deserve the credit for the actual binary protocol stuff...

drsweb commented 2 years ago

You're welcome. I agree with what you are suggesting...when I was writing this, some of the code felt like it violated DRY so I think what you are proposing should definitely help with that. I would be glad to help with testing, as long as you can put up with me having to fit that into my regular work schedule.

Thanks!

kk7ds commented 2 years ago

Sure, it'll take me a bit to reswizzle things myself, but if you're willing I think it'd be best. I'll update this when I have something to play with. Maybe tomorrow if we're lucky :)

Thanks again!

kk7ds commented 2 years ago

Okay, something like this is what I'd propose. I let it run today on my (ASCII) system while working and it seems okay for my stuff. Not sure about the binary part of course, but I did a little synthetic testing and I think it's close to working for binary as well.

When you get a chance, it'd be great if you can test it and let me know what you think about the approach!

drsweb commented 2 years ago

Awesome! Yes, I will start testing later tonight and let you know how it goes. Nice work!

drsweb commented 2 years ago

Hi @kk7ds. I like what you've done with the code, I think it really cleans things up a lot. The separation of the connection from the protocol makes a lot of sense. Great work.

I am seeing this strange behavior where at times pynx584 doesn't seem to be receiving some messages sent by my control panel. I don't know if you've seen this too, or if it's just something with my setup...but using the same setup without this update I haven't seen this happen/been able to make this happen, it doesn't seem to miss any panel messages on previous versions.

I haven't tested ASCII yet, what I've done is using the binary protocol with an NX590 and also with an NX584 in binary mode. For tracing, here's my setup so that I can see what the panel is doing:

pynx584 <--> TCP/IP Proxy <--> NX590/NX Panel

Below is a sample of what I'm seeing. From the output coming from pynx584...here's a portion of my pynx584 log:

2022-06-03 14:15:45,525 _internal INFO 127.0.0.1 - - [03/Jun/2022 14:15:45] "GET /command?cmd=arm&type=exit&partition=1 HTTP/1.1" 200 -
2022-06-03 14:15:46,084 controller DEBUG Sending queued [62, 2, 1]
2022-06-03 14:15:46,532 controller DEBUG Received: 29 Positive Acknowledge (data [])
2022-06-03 14:15:46,633 controller DEBUG Received: 6 Partition Status (data [0, 104, 0, 192, 64, 98, 132, 130])
2022-06-03 14:15:46,635 controller DEBUG Partition 1 ['Pulsing buzzer', 'reserved', 'Armed', 'Exit 1', 'Exit 2', 'Sensor low battery', 'Ready to arm', 'Tone on (activation tone)', 'Open period', 'Delay trip in progress (common zone)']
2022-06-03 14:15:47,740 controller DEBUG Received: 6 Partition Status (data [0, 104, 0, 192, 64, 98, 4, 130])
2022-06-03 14:15:47,742 controller DEBUG Partition 1 ['Pulsing buzzer', 'reserved', 'Armed', 'Exit 1', 'Exit 2', 'Sensor low battery', 'Ready to arm', 'Open period', 'Delay trip in progress (common zone)']
2022-06-03 14:15:50,014 controller DEBUG Received: 7 Partitions Snapshot (data [71, 2, 0, 0, 0, 0, 0, 0])
2022-06-03 14:15:58,400 controller DEBUG Received: 6 Partition Status (data [0, 104, 0, 224, 64, 98, 4, 130])
2022-06-03 14:15:58,401 controller DEBUG Partition 1 ['Pulsing buzzer', 'reserved', 'Armed', 'Delay expiration warning', 'Exit 1', 'Exit 2', 'Sensor low battery', 'Ready to arm', 'Open period', 'Delay trip in progress (common zone)']
2022-06-03 14:16:18,868 controller INFO No activity for a while, heartbeating
2022-06-03 14:16:19,869 controller DEBUG Sending queued [40]
2022-06-03 14:16:38,951 controller INFO No activity for a while, heartbeating
2022-06-03 14:16:39,952 controller DEBUG Sending queued [40]
2022-06-03 14:16:44,390 controller DEBUG Received: 7 Partitions Snapshot (data [71, 2, 0, 0, 0, 0, 0, 0])
2022-06-03 14:16:45,513 controller DEBUG Received: 6 Partition Status (data [0, 96, 0, 0, 76, 98, 4, 130])
2022-06-03 14:16:45,514 controller DEBUG Partition 1 ['reserved', 'Armed', 'Recent closing being timed', 'Reserved', 'Sensor low battery', 'Ready to arm', 'Open period', 'Delay trip in progress (common zone)']
2022-06-03 14:16:46,639 controller DEBUG Received: 7 Partitions Snapshot (data [7, 2, 0, 0, 0, 0, 0, 0])

And here's the matching activity from the proxy perspective. Client is pynx584, server is the panel:

6/3/2022 2:15:46 PM : Client sent 7E 03 3E 02 01 44 CB  (HEX)
6/3/2022 2:15:46 PM : Client sent 126 3 62 2 1 68 203  (DEC)
6/3/2022 2:15:46 PM : Client sent 7 bytes to Server
---------------------------

6/3/2022 2:15:46 PM : Server sent 7E 01 1D 1E 1F  (HEX)
6/3/2022 2:15:46 PM : Server sent 126 1 29 30 31  (DEC)
6/3/2022 2:15:46 PM : Server sent 5 bytes to Client
---------------------------

6/3/2022 2:15:46 PM : Server sent 7E 09 86 00 68 00 C0 40 62 84 82 62 67  (HEX)
6/3/2022 2:15:46 PM : Server sent 126 9 134 0 104 0 192 64 98 132 130 98 103  (D
EC)
6/3/2022 2:15:46 PM : Server sent 13 bytes to Client
---------------------------

6/3/2022 2:15:46 PM : Client sent 7E 01 1D 1E 1F  (HEX)
6/3/2022 2:15:46 PM : Client sent 126 1 29 30 31  (DEC)
6/3/2022 2:15:46 PM : Client sent 5 bytes to Server
---------------------------

6/3/2022 2:15:47 PM : Server sent 7E 09 86 00 68 00 C0 40 62 04 82 E1 66  (HEX)
6/3/2022 2:15:47 PM : Server sent 126 9 134 0 104 0 192 64 98 4 130 225 102  (DE
C)
6/3/2022 2:15:47 PM : Server sent 13 bytes to Client
---------------------------

6/3/2022 2:15:47 PM : Client sent 7E 01 1D 1E 1F  (HEX)
6/3/2022 2:15:47 PM : Client sent 126 1 29 30 31  (DEC)
6/3/2022 2:15:47 PM : Client sent 5 bytes to Server
---------------------------

6/3/2022 2:15:50 PM : Server sent 7E 09 87 47 02 00 00 00 00 00 00 D9 66  (HEX)
6/3/2022 2:15:50 PM : Server sent 126 9 135 71 2 0 0 0 0 0 0 217 102  (DEC)
6/3/2022 2:15:50 PM : Server sent 13 bytes to Client
---------------------------
6/3/2022 2:15:50 PM : Client sent 7E 01 1D 1E 1F  (HEX)

6/3/2022 2:15:50 PM : Client sent 126 1 29 30 31  (DEC)
6/3/2022 2:15:50 PM : Client sent 5 bytes to Server
---------------------------

6/3/2022 2:15:58 PM : Server sent 7E 09 86 00 68 00 E0 40 62 04 82 02 07  (HEX)
6/3/2022 2:15:58 PM : Server sent 126 9 134 0 104 0 224 64 98 4 130 2 7  (DEC)
6/3/2022 2:15:58 PM : Server sent 13 bytes to Client
---------------------------

6/3/2022 2:15:58 PM : Client sent 7E 01 1D 1E 1F  (HEX)
6/3/2022 2:15:58 PM : Client sent 126 1 29 30 31  (DEC)
6/3/2022 2:15:58 PM : Client sent 5 bytes to Server
---------------------------

6/3/2022 2:16:04 PM : Server sent 7E 09 87 47 02 00 00 00 00 00 00 D9 66  (HEX)
6/3/2022 2:16:04 PM : Server sent 126 9 135 71 2 0 0 0 0 0 0 217 102  (DEC)
6/3/2022 2:16:04 PM : Server sent 13 bytes to Client
---------------------------

6/3/2022 2:16:14 PM : Server sent 7E 09 87 47 02 00 00 00 00 00 00 D9 66  (HEX)
6/3/2022 2:16:14 PM : Server sent 126 9 135 71 2 0 0 0 0 0 0 217 102  (DEC)
6/3/2022 2:16:14 PM : Server sent 13 bytes to Client
---------------------------

6/3/2022 2:16:19 PM : Client sent 7E 01 28 29 2A  (HEX)
6/3/2022 2:16:19 PM : Client sent 126 1 40 41 42  (DEC)
6/3/2022 2:16:19 PM : Client sent 5 bytes to Server
---------------------------

6/3/2022 2:16:29 PM : Server sent 7E 09 87 47 02 00 00 00 00 00 00 D9 66  (HEX)
6/3/2022 2:16:29 PM : Server sent 126 9 135 71 2 0 0 0 0 0 0 217 102  (DEC)
6/3/2022 2:16:29 PM : Server sent 13 bytes to Client
---------------------------

6/3/2022 2:16:39 PM : Client sent 7E 01 28 29 2A  (HEX)
6/3/2022 2:16:39 PM : Client sent 126 1 40 41 42  (DEC)
6/3/2022 2:16:39 PM : Client sent 5 bytes to Server
---------------------------

6/3/2022 2:16:44 PM : Server sent 7E 09 87 47 02 00 00 00 00 00 00 D9 66  (HEX)
6/3/2022 2:16:44 PM : Server sent 126 9 135 71 2 0 0 0 0 0 0 217 102  (DEC)
6/3/2022 2:16:44 PM : Server sent 13 bytes to Client
---------------------------

6/3/2022 2:16:44 PM : Client sent 7E 01 1D 1E 1F  (HEX)
6/3/2022 2:16:44 PM : Client sent 126 1 29 30 31  (DEC)
6/3/2022 2:16:44 PM : Client sent 5 bytes to Server
---------------------------

6/3/2022 2:16:45 PM : Server sent 7E 09 86 00 60 00 00 4C 62 04 82 25 9A  (HEX)
6/3/2022 2:16:45 PM : Server sent 126 9 134 0 96 0 0 76 98 4 130 37 154  (DEC)
6/3/2022 2:16:45 PM : Server sent 13 bytes to Client
---------------------------

6/3/2022 2:16:45 PM : Client sent 7E 01 1D 1E 1F  (HEX)
6/3/2022 2:16:45 PM : Client sent 126 1 29 30 31  (DEC)
6/3/2022 2:16:45 PM : Client sent 5 bytes to Server
---------------------------

6/3/2022 2:16:46 PM : Server sent 7E 09 87 07 02 00 00 00 00 00 00 99 64  (HEX)
6/3/2022 2:16:46 PM : Server sent 126 9 135 7 2 0 0 0 0 0 0 153 100  (DEC)
6/3/2022 2:16:46 PM : Server sent 13 bytes to Client
---------------------------

6/3/2022 2:16:46 PM : Client sent 7E 01 1D 1E 1F  (HEX)
6/3/2022 2:16:46 PM : Client sent 126 1 29 30 31  (DEC)
6/3/2022 2:16:46 PM : Client sent 5 bytes to Server
---------------------------

In this sample, things seem to be moving along fine with messages getting acknowledged until around 2:16:04. At that point, you can see in the proxy log that the panel sends several Partition Snapshot messages that pynx584 doesn't seem to get (nothing in it's log)...so they are not acknowledged. While this is happening, a couple of heartbeats get sent by pynx584 but there is no response from the panel I'm assuming because it's waiting for the ack. It isn't until 2:16:45 that pynx584 sees the Partition snapshot message and acknowledges it.

While pynx584 is in this state, the panel doesn't respond to anything sent to it from pynx584 I guess because of the pending ack. I had things get out of synch where the panel was armed, but pynx584 thought it wasn't and vice versa...I'm thinking because some of the zone and partition messages coming from the panel might be getting missed.

This is a lot, sorry...hopefully it makes sense though. Let me know if you have questions.

Thanks

kk7ds commented 2 years ago

Very clear debugging, thanks, no problems there.

I've been wracking my brain over your traces to figure out where the code could be silently ignoring some data or hanging and I'm not seeing it. In the process I did find one bug where I forgot to update an error handler, which I fixed. However, you can't have been hitting that without evidence in the log, so I don't think that's it. I also added some other checks and debug log messages which might help a little, although I doubt it. I swear that if what your proxy shows is happening, we should be showing something in the nx584 log. If we only read half a frame we wouldn't do the ack of course, but we'd log the "Skipping..." message the next time we go to process the queue and find the other half of the message waiting for us. I've also poured over your previous version to see if I mangled the logic at all, but I don't think so.

The only thing I could think of is that you weren't checking for "you should never receive a 0x7d except at the start of a message", which maybe could cause us to be doing something wrong, although your proxy would have to by lying about what we should have in our stream, and I'm sure it's not.

One weird thing I noticed is that your "partition status message" after we recover starts with 71 the first time and 7 the following. That's the status bit field for partition 1 right? Seems weird that it would be changing there?

The only thing I can really think of is if your proxy is dropping or buffering messages (although the latter probably not so much since we never see the missed messages). I assume you were using the same proxy setup with your other version and didn't have problems?

I pushed a set of tests on top of the proto implementation that done some input and output with the controller, including your exact sequences across the stall and it parses them fine. That's what leads me to wonder about some timing-related buffering sort of thing...

So, I dunno, maybe you can test with this updated stuff to see if we hit any of the other checks or log messages for a clue? It ran all day yesterday on my system with the code I posted before, no cases in the log of un-answered heartbeat messages.

You mentioned something in your initial PR about needing to set your heartbeat interval lower. What was that about? The heatbeat for me is purely to notice when the connection has stalled but hasn't actually reset. This happens when a high-power transmitter near the SoL device I'm using causes some corruption on the serial line, but isn't related to anything with keeping the panel itself awake.

drsweb commented 2 years ago

Thanks! I will check out your latest changes to see if anything behaves differently or if there are additional messages that show up that might shed some light on this.

I get this behavior with or without the proxy....just without the proxy I can't really see what's going on. And the proxy doesn't lie, it's just a straight pass through with some logging in the middle. What I'm seeing from it is what I would expect to see. It's almost like the TCP read in python decides to take a little break every now and again. And yes, I used the proxy with on the previous version and didn't notice this issue at all.

As far as the status bit changing on that partition message, I had been doing some arming/disarming, so it's possible that an exit delay expired or something like that.

Whatever this is definitely seems to be more pronounced with the 590 versus the 584. One thing I've noticed with the 590 is that if it doesn't get an ack for a transition message it sends, there is a longer time between it re-sending messages than the 584. It's probably 5 times longer than the 584 if I had to guess. So, that may make whatever this is more noticeable.

As far as the heartbeat, I'm really using it as a keep alive for the connection to the 590. It seems to disconnect idle connections pretty quickly...definitely less than the 120 seconds you had originally set that heartbeat time to. So, setting it to around 20 seconds fixes that issue with the 590 and keeps the connection alive.

Thanks again!

drsweb commented 2 years ago

BTW, I have a spare NX590. Would you be interested in that at all? It would be the least I could do to thank you for your amazing work on this project. If you would want it, we'd have to figure out a way for you to get me your mailing address privately to send it to you. It's a pretty cool little piece of hardware, in addition to the HA protocol it does multiple things for the NX panels.

kk7ds commented 2 years ago

I appreciate the offer, but I'm kinda hesitant to mess with my setup too much at the moment. It's (shockingly) critical to our automated lifestyle so I really get in trouble if things stop working around the house. I wrote this stuff for myself quite a while ago, integrated it into HA for a friend and then it was pretty much radio silence until recently where a bunch of people have been proposing changes. I even have some rooms with no physical light switches because the motion sensors trigger them. If I start dropping messages I could trip in the dark and ... DIE! :)

If we can't figure this out I'll try switching my panel to the binary protocol to narrow it down to that implementation (or NX590 weirdness). If that still doesn't work, then I'll maybe take you up on the offer and we can go from there.

drsweb commented 2 years ago

Ooh nobody needs to die over an NX590. :)

drsweb commented 2 years ago

@kk7ds I found a small issue when attempting a serial connection that I committed an update for.

Otherwise, I've tested with binary and ascii, serial and TCP, and NX584 and NX590 (and even proxy/no proxy) and I've seen no message loss at all and no other issues. It's working great for me...awesome work. I think this one is a release candidate if you agree.

This has got to be version 2.0, right? :)

kk7ds commented 2 years ago

Hmm okay that's great, but what changed? I didn't think any of the stuff I did was going to actually fix it.

But, if you think it's good for the moment then I guess it's good enough for me. I've seen no misbehaving on my ASCII setup for the several days it's been running it. So, not a regression at least :)

Thanks for your help on this, I know there are people that will appreciate having binary protocol support.

drsweb commented 2 years ago

Well, we'll just act like nothing ever happened then. :)

You're welcome, and thank you as well. You've been awesome to work with on this. Thanks again for this contribution. I know I appreciate being able to use this.