parnic / node-screenlogic

Pentair ScreenLogic Javascript library using Node.JS
https://www.npmjs.com/package/node-screenlogic
MIT License
52 stars 14 forks source link

Unknown message 8300 #37

Closed jpennell closed 4 years ago

jpennell commented 4 years ago

When adding, setting, or deleting a schedule, I often get an 8300 response, but not the ack I expected.

2020-07-01T00:00:48.721Z sl:unit sending set schedule event command for scheduleId: 702, circuitId: 505, startTime: 420, stopTime: 1080, dayMask: 32, flags: 0, heatCmd: 0, heatSetPoint: 0...
2020-07-01T00:00:48.854Z sl:unit received message of length 12
2020-07-01T00:00:48.854Z sl:unit   it's an unknown type: 8300

The message buffer for 8300 is always the same

{
  "type": "Buffer",
  "data": [
    255,
    127,
    108,
    32,
    4,
    0,
    0,
    0,
    1,
    0,
    0,
    0
  ]
}

When I get 8300, I do not usually receive the response I expect (eg. delete ack, set ack, add ack)

2020-07-01T15:48:26.382Z sl:unit sending delete schedule event command for scheduleId: 702...
2020-07-01T15:48:26.489Z sl:unit received message of length 12
2020-07-01T15:48:26.490Z sl:unit   it's an unknown type: 8300

Occasionally I receive both 8300 and the response I expect

2020-07-01T15:48:26.382Z sl:unit sending delete schedule event command for scheduleId: 702...
2020-07-01T15:48:26.489Z sl:unit received message of length 12
2020-07-01T15:48:26.490Z sl:unit   it's an unknown type: 8300
2020-07-01T15:48:26.494Z sl:unit received message of length 8
2020-07-01T15:48:26.494Z sl:unit   it's a delete schedule event ack 12547

Sometimes I receive 8300 multiple times for one operation

2020-07-01T15:48:26.382Z sl:unit sending delete schedule event command for scheduleId: 702...
2020-07-01T15:48:26.489Z sl:unit received message of length 12
2020-07-01T15:48:26.490Z sl:unit   it's an unknown type: 8300
2020-07-01T15:48:26.493Z sl:unit received message of length 12
2020-07-01T15:48:26.494Z sl:unit   it's an unknown type: 8300

The operation always works, when I check getScheduleData, the schedule has been added/deleted/set, even though I didn't get a response.

Any ideal what message 8300 might be? Any suggestions for how I can figure out what it means, and why I'm seeing this behaviour?

Thanks!

jpennell commented 4 years ago

Note that it's only those three operations seemingly, I've never seen this behaviour with getScheduleData, or any of the other operations eg. getPoolStatus, setSaltCellOutput, etc.

parnic commented 4 years ago

Hmm, according to the official app, message 8300 is HLM_SYSCONFIG_NOTIFYNEEDWRITEFLASH. I don't know what that means as neither the SLConfig app nor the ScreenLogic app seem to make use of that message. Normally I'd say to just ignore it, but the fact that you're not getting acks is certainly not intended.

Any ideas @bshep ?

parnic commented 4 years ago

There is a HLM_SYSCONFIG_WRITEFLASHNOWQ message that we could conceivably send in response to message 8300, but I'm not sure why the device would ask us to tell it to flash instead of just flashing. Maybe there's some risk associated with that or they're giving the client an opportunity to batch up calls?

jpennell commented 4 years ago

After some more testing, I've noticed that if I switch to a local connection (rather than remote) I can no longer reproduce the issue of missing acks. I still get 8300 messages, but through the local connection I always get the acks too. Maybe some issue with the screenlogic.pentair.com server. I'll just continue with local connections for now.

How did you find out that message 8300 is HLM_SYSCONFIG_NOTIFYNEEDWRITEFLASH? Is the source for the official app accessible somewhere? or did you decompile it or something?

parnic commented 4 years ago

After some more testing, I've noticed that if I switch to a local connection (rather than remote) I can no longer reproduce the issue of missing acks. I still get 8300 messages, but through the local connection I always get the acks too. Maybe some issue with the screenlogic.pentair.com server. I'll just continue with local connections for now.

That's interesting. Good to keep in mind.

How did you find out that message 8300 is HLM_SYSCONFIG_NOTIFYNEEDWRITEFLASH? Is the source for the official app accessible somewhere? or did you decompile it or something?

Check this comment for more info. In short: a decompiled copy.

jpennell commented 4 years ago

Fantastic, thanks @parnic I'll check out that comment.

bshep commented 4 years ago

Sorry for the late response, the 8300 message probably has nothing to do with your request, there are messages the controller sends that the regular windows app / android app ignore, there are a lot interesting messages in the source though, some that backup/restore the firmware for example. I would not touch those without some sample code since you could brick your system easily.

mikemucc commented 4 years ago

I would like to recommend we re-open this issue. The Branch name is scheduling-support. If you clone my API and exercise the /api/schedules POST/DELETE routes, you can replicate the behavior described above, complete with the 8300 message and the lack of a callback (the API will hang, as the api return is waiting for the callback from the library).

parnic commented 4 years ago

I would like to recommend we re-open this issue. The Branch name is scheduling-support. If you clone my API and exercise the /api/schedules POST/DELETE routes, you can replicate the behavior described above, complete with the 8300 message and the lack of a callback (the API will hang, as the api return is waiting for the callback from the library).

There's not a whole lot we can do here. The library passes on what it gets from the ScreenLogic system, it doesn't silently swallow anything. The 8300 is unrelated and can be ignored. Either the system sends us a callback or it doesn't. I could potentially update the readme, but it would seem that it's best to just not wait on the addScheduleEvent callback to fire.

mikemucc commented 4 years ago

I took your advice and broke my calls out and stopped waiting for the callbacks to return from my API. Now that I've done that, I'm also tracking a few other things so I can quickly reference them.

One of those things is the aggregated set of unique event IDs (so I can check to see if they exist before trying to do things to them), and just to make sure I'm sane, I'm currently printing them to console.log.

I currently have my polling set to 5 seconds per cycle, and I'm seeing it take up to 20 seconds (4 polls) before the event ID appears/disappears from the device.

So basically this operation takes a very long time to complete and return on the ScreenLogic side. Is it possible that it's taking so long that the open connection quit waiting for a callback and just moved on (does node even work that way?)?

Or perhaps that's what that 8300 (HLM_SYSCONFIG_NOTIFYNEEDWRITEFLASH) message means, the system is telling us to "hang on a sec (or 20), Imma let you finish, but I need to write something to internal storage"?

Not my intention to beat a dead horse, but I thought it might be relevant to report this.

parnic commented 4 years ago

I'm not completely following, but maybe we should open a new issue for the behavior you're seeing. I don't think it's related to the 8300 message at all.

I'm not entirely certain I follow what you mean by a set of unique event ids appearing on the device.

As far as node waiting on callbacks, that's not exactly how the connection to the screenlogic system works. It's a synchronous (TCP) connection, the device sends packets and receives packets. Those packets are always guaranteed to be in order, but there's no "waiting for a callback" happening.

The 8300 message does not seem to be connected to any specific message. The official app's source seems to suggest it's just the screenlogic device requesting that the app issue a "write flash" message. Since the official app doesn't send this message, I don't really want to add support for the library to do it out of fear that it could end up writing incomplete or invalid data and rendering the device useless.

mikemucc commented 4 years ago

Ok, I'll explain the behavior I'm seeing a little better (sorry, my brain is on schedules right now, so that's what I'm thinking in). I'll leave our friend 8300 out of it. :-)

I can open a new issue if you like (let me know your preference) after reading this wall of text...

TL;DR version: it looks like it's taking between 15-20 seconds for the Create New Schedule ID/Delete Schedule ID calls to successfully complete. If the ScreenLogic waits for that to complete before sending the ACK, it might explain why I'm missing those ACKs.

The Long Version: I'm now keeping a Set() object with the IDs of all of the Schedule Events (daily & run once) that exist on the system so I can quickly check to see if it exists (and return a 404 if somebody tries to Update or Delete an event ID that doesn't exist). I update this set every time I poll the ScreenLogic for new data. That polling frequency is currently 5 seconds.

So I end up with an object that looks kinda like this: { 700, 701, 702, 703, 704 }

So, from the perspective my my REST API:

  1. Start up the API. The API does its initial data load and logs all of the Schedule Event IDs that currently exist on the system (console.log() the Set() object that is schedules.eventIds).
  2. I call the "create New Schedule Event" endpoint.
  3. I watch the API logs make the call back to the ScreenLogic to create a new Schedule Event
  4. My API Polls the ScreenLogic Device immediately after making the CreateSchedule call (Lets cal this T=0) 3a. During that poll, the API prints all of the current Schedule Event IDs on the system again. This is currently still the same as in step 0.
  5. 5 seconds later, the API polls again (Call this T=+5 seconds). The set of Schedule Event IDs on the system hasn't changed.
  6. Repeat Step 4 (Call this T+10)
  7. Repeat Step 4 (T+15)
  8. API Polls (T+20). Now the Schedule IDs on the system has incremented by 1 (i.e. the new schedule has actually been created).

So the apparent behavior is that a Create or Delete schedule event takes between 15 and 20 seconds to complete (at least on my system, the backend of which is an EasyTouch). This could be a result of my hardware, maybe the EasyTouch is a little slower than some of the other systems, who knows.

So I thought I remembered @bshep adding a keepalive value to the client. Could this be related to the behavior I'm seeing?

I could modify that line in my local cache to something like this.client.setKeepAlive(true, 25 * 1000); and see if it affects the behavior I'm seeing (if you think that would have any affect).

parnic commented 4 years ago

Yes, please open a new issue. Feel free to mess with the keep alive if it's causing you problems and see what happens.