tagyoureit / nodejs-poolController

An application to control pool equipment from various manufacturers.
GNU Affero General Public License v3.0

[BUG] Heater or Set Temp Changes Break nodejsPC #848

Open matthewusf opened 1 year ago

matthewusf commented 1 year ago

nodejs-poolController Version/commit

8.01

nodejs-poolController-dashPanel Version/commit

8.0.0

relayEquipmentManager Version/commit

No response

Node Version

No response

Platform

No response

RS485 Adapter

No response

Are you using Docker?

OCP

Intellicenter i5PS - 1.064

Pump(s)

No response

Chlorinator(s)

No response

What steps will reproduce the bug?

Occasionally, when I change the heater set point or heater mode, I get a communication timeout error, after which this module no longer works until nodejsPC is restarted (dashPanel can stay running). Unfortunately (or fortunately), it only happens about once a week during normal use, so it's hard to capture a replay: I haven't noticed any pattern in what triggers it. It isn't limited to turning the spa on and adjusting features, for instance. The last time, it happened when I simply changed the Set Temp while the heater wasn't even on.

What happens?

ApiError: Message aborted after 3 attempt(s): 165,1,15,33,168,41,0,0,18,1,0,0,0,0,0,0,0,0,0,0,176,89,27,110,3,0,0,88,100,94,100,1,1,0,0,15,0,1,0,100,0,0,0,0,0,0,0,5,67
    at RS485Port.writeMessage (/home/admin/nodejs-poolController/controller/comms/Comms.ts:1010:31)
    at RS485Port.processWaitPacket (/home/admin/nodejs-poolController/controller/comms/Comms.ts:946:22)
    at RS485Port.processOutboundPackets (/home/admin/nodejs-poolController/controller/comms/Comms.ts:954:19)
    at RS485Port.processPackets (/home/admin/nodejs-poolController/controller/comms/Comms.ts:936:14)
    at Immediate._onImmediate (/home/admin/nodejs-poolController/controller/comms/Comms.ts:896:114)
    at processImmediate (node:internal/timers:466:21)
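For anyone reading the byte list in the error above, here is a minimal sketch that splits it into fields and checks the trailing checksum. The frame layout is an assumption that this thread does not state: a 165 start byte, one further header byte, destination, source, action, payload length, the payload, and a two-byte big-endian sum of everything before it. `decodeFrame` and `DecodedFrame` are hypothetical names, not njsPC functions.

```typescript
// Assumed layout (not confirmed anywhere in this thread):
// [165, x, dest, source, action, length, ...payload, checksumHi, checksumLo]
interface DecodedFrame {
  dest: number;
  source: number;
  action: number;
  payload: number[];
  checksumOk: boolean;
}

// Hypothetical helper for illustration only; not part of njsPC.
function decodeFrame(bytes: number[]): DecodedFrame {
  if (bytes.length < 8 || bytes[0] !== 165) {
    throw new Error("frame does not start with 165");
  }
  const length = bytes[5];
  if (bytes.length !== 6 + length + 2) {
    throw new Error("length byte does not match frame size");
  }
  // Sum the header and payload, then compare with the last two bytes.
  const sum = bytes.slice(0, 6 + length).reduce((acc, b) => acc + b, 0);
  const expected = bytes[6 + length] * 256 + bytes[6 + length + 1];
  return {
    dest: bytes[2],
    source: bytes[3],
    action: bytes[4],
    payload: bytes.slice(6, 6 + length),
    checksumOk: sum === expected,
  };
}

// The exact bytes from the "Message aborted" error.
const aborted = [
  165, 1, 15, 33, 168, 41,
  0, 0, 18, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  176, 89, 27, 110, 3, 0, 0, 88, 100, 94, 100,
  1, 1, 0, 0, 15, 0, 1, 0, 100, 0, 0, 0, 0, 0, 0, 0,
  5, 67,
];

const frame = decodeFrame(aborted);
console.log(frame.action, frame.payload.length, frame.checksumOk); // 168 41 true
```

Under that assumed layout the message is internally consistent (41 payload bytes, checksum 1347 = 5 * 256 + 67), which would suggest the message was well formed when it was aborted after its three attempts.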

What should have happened?

It doesn't change temp or heat mode, and then nodejs remains stuck where those things can no longer be changed. Everything else still works as normal.

Additional information

No response

rstrouse commented 1 year ago

Pull njsPC. I believe the throttling code to make sure njsPC does not process too many changes at once was getting stuck after IntelliCenter rejected the request.
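To illustrate the kind of failure described here, below is a minimal sketch of a throttle guard that cannot stay stuck after a rejected request. This is not the njsPC code; `SingleFlight` and `demo` are hypothetical names, and the only point is that the "busy" flag is cleared in a `finally` block so a rejection releases it.

```typescript
// Hypothetical guard for illustration only; not taken from njsPC.
class SingleFlight {
  private busy = false;

  async run<T>(task: () => Promise<T>): Promise<T> {
    if (this.busy) {
      throw new Error("a change is already in progress");
    }
    this.busy = true;
    try {
      return await task();
    } finally {
      // Runs on success and on failure, so a rejected request
      // cannot leave the guard locked.
      this.busy = false;
    }
  }
}

async function demo(): Promise<boolean[]> {
  const guard = new SingleFlight();
  const results: boolean[] = [];

  // First change: simulate the controller rejecting the request.
  try {
    await guard.run(async () => {
      throw new Error("rejected by controller");
    });
    results.push(true);
  } catch (err) {
    results.push(false);
  }

  // Second change: succeeds only if the guard was released.
  try {
    await guard.run(async () => "ok");
    results.push(true);
  } catch (err) {
    results.push(false);
  }

  return results;
}

demo().then((results) => console.log(results)); // [ false, true ]
```

If the flag were reset only on the success path, the second call would fail with "a change is already in progress" until the process restarted, which matches the symptom reported in this issue.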

treyrich commented 5 months ago

@rstrouse I'm running the current master and this still appears to occur intermittently. Once it breaks, it seems to stay broken until restarted, but what causes it to break is still unclear to me. The next time it happens I can try to capture some logs/replay, but right now it's working because I restarted it this morning before I went searching for an issue.

rstrouse commented 5 months ago

It is important that I know which OCP you are using: EasyTouch, IntelliTouch, IntelliCenter, SunTouch, or Nixie. Also, please post a new issue when the existing one is as old as this one. Otherwise these will fall off the radar and get lost.

treyrich commented 5 months ago

Sorry, I assumed that it would be implied by commenting on an existing issue. I'm running IntelliCenter 1.064, same as OP. I'm also happy to open a separate issue if you want, I just figured that seeing as the issue is identical to this one it would make the most sense to start here. Let me know what is most helpful and I'm happy to do anything I can to help track this issue down.

matthewusf commented 5 months ago

I do also still have this issue occasionally, though it seems to be far more rare since it was addressed.


treyrich commented 5 months ago

replay.zip Here's a capture of the issue happening. In this case I tried, unsuccessfully, to set the heat mode. Recently this has been happening to me once every day or two, necessitating a restart when my automations start failing.

I have zero familiarity with this codebase, so the learning curve would be steep, but if there's a desire for help with this issue I'm willing to help look into it @rstrouse, let me know if it's worth me diving into the code and starting to get familiar.

tagyoureit commented 5 months ago

Hi, this replay doesn't have the lower level logging that's needed to fully debug the issue. Can you capture the issue again following the directions in the wiki?

treyrich commented 5 months ago

The original capture was taken following the directions in the wiki. Was there a particular step that was missed? Here's a new one, captured with the "Capture Configuration Reload" box checked. If this is incorrect, please let me know what I'm missing and I'll happily reproduce it. I'm sure it's my own mistake, but other than checking this box I'm not sure what other steps I missed... replay.zip

tagyoureit commented 4 months ago

The "Capture Configuration Reload" requests all the configuration items from the OCP so if you start the replay with that you should at least wait until it is done. Some messages in the queue failed because the OCP didn't respond to them because it was sending out the rest of the configuration.

I added a debug statement in the queueBodyHeadSettings function, and also added extra code to reset the bytes in the array. Give it another try. If it is still unresponsive, we will have more debugging information. If it's fixed, then yay.

treyrich commented 4 months ago

Got it.

I've just pulled the changes. Honestly we're outside of the heating window right now so it may be a couple months before this surfaces again in my system organically. If I get a chance I'll try to reproduce it though so that this doesn't get too stale.

treyrich commented 2 months ago

Ok @tagyoureit, I've currently got my instance in a state where some things related to changing the heater heat source are broken. I've attached the entire log file from my Docker container here, though the level for most of it won't include debug. From what I can tell, everything is still working with the exception of anything heater related.

When I try running any feature circuits they all operate as expected, but if I try to change the temperature of the heater, or the heat source it fails.

I will avoid restarting my instance for a day or so until I hear back from you on what other information I can provide to help pin down the problem. As best I can tell, once it gets into this state it will remain there until I restart the application. If these logs are insufficient, let me know. I'm also happy to help debug in any way that helps, with some guidance.

logs.zip