tagyoureit / nodejs-poolController

An application to control pool equipment from various manufacturers.
GNU Affero General Public License v3.0

[BUG] Losing communication with AquaRite #964

Open celestinjr opened 5 months ago

celestinjr commented 5 months ago

nodejs-poolController Version/commit

8.0.1/e1e5190

nodejs-poolController-dashPanel Version/commit

8.0.0/d4f24a2

relayEquipmentManager Version/commit

No response

Node Version

V16.16.0

Platform

Linux nixie-poolcontroller 6.1.21-v7+ #1642 SMP Mon Apr 3 17:20:52 BST 2023 armv7l GNU/Linux

RS485 Adapter

No response

Are you using Docker?

OCP

Nixie Standalone

Pump(s)

Single Speed Pentair Challenger

Chlorinator(s)

AquaRite

What steps will reproduce the bug?

Leave njsPC online for > 1 day

What happens?

RS485 communication stops. No messages logged in messagemanager. Logs state serial timeout.

Executing “pm2 restart njsPC” restores communication.

What should have happened?

Communication should continue indefinitely.

Additional information

I thought this was resolved after disabling packet log to file. After doing so, I seem to have gotten longer run time (~1.5 days vs < 1 day), but this morning I had to restart njsPC to restore communication.

rstrouse commented 5 months ago

What adapter are you using? A serial timeout means that the adapter stopped responding at the hardware level.

celestinjr commented 5 months ago

Passthrough on the MegaBAS

rstrouse commented 5 months ago

What other items do you have installed on your pi stack? Are there any others (such as a relay8) that have an RS485 port?

celestinjr commented 5 months ago

Currently, the MegaBAS, a Relay8 board, and a GPIO breakout. A Sequent Watchdog was just delivered but is not installed yet.

rstrouse commented 5 months ago

Make sure you have the other RS485 jumpers removed. Or better yet, remove them from the BAS and use the RS485 port on the relay8 to eliminate port problems on the BAS.

celestinjr commented 5 months ago

I thought I left all of the Relay8 jumpers out, but I will confirm next week when I have access to check. Thanks!

celestinjr commented 5 months ago

I believe I've narrowed this down to something killing the socket when apt updates a package (seemingly any package thus far).

I've been using an ansible script to apt update/upgrade the RPi along with some of my other devices. It seems that when a package is available and is upgraded, I simultaneously start getting the socket timeout logs and have to restart njsPC.

Is this behavior known or expected? For now I'll restart the service or reboot the Pi after updates as a workaround.

celestinjr commented 5 months ago

Yesterday, I lost communication again and was not updating packages. Digging into the syslog, it looks like some scheduled “PackageKit” process may have been the culprit, since it ran at the same time (within 10s) as losing communication.

Strangely, njsPC recognizes the timeout and successfully reconnects to the port per the logs. Unfortunately, this doesn’t seem to restore communication, since it repeats this process every 10s. Restarting njsPC restores communication.

Any ideas?

syslog output:

Jun 19 11:33:03 nixie-poolcontroller PackageKit: refresh-cache transaction /253_acaecbca from uid 1000 finished with success after 3802ms
Jun 19 11:33:24 nixie-poolcontroller PackageKit: get-updates transaction /254_deecdcdd from uid 1000 finished with success after 15348ms
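
For anyone who wants to repeat the "within 10s" check against their own syslog, here is a minimal Python sketch. It assumes the classic `Mon DD HH:MM:SS host process: message` syslog layout shown above and an English locale for month names. The function names (`parse_line`, `events_near`) and the year are illustrative assumptions, since syslog lines omit the year. The timestamp of the communication loss has to come from the njsPC logs and is not part of the output above.

```python
import re
from datetime import datetime, timedelta

# Classic syslog layout: "Jun 19 11:33:03 host process[pid]: message"
SYSLOG_RE = re.compile(
    r"^(\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) (\S+) ([^:\[]+)(?:\[\d+\])?: (.*)$"
)

def parse_line(line, year):
    """Return (timestamp, process, message), or None if the line does not match."""
    m = SYSLOG_RE.match(line)
    if m is None:
        return None
    stamp, _host, process, message = m.groups()
    # Syslog omits the year, so the caller supplies it (an assumption).
    when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    return when, process, message

def events_near(lines, target, window_s=10, year=2024):
    """List syslog events logged within window_s seconds of target."""
    window = timedelta(seconds=window_s)
    hits = []
    for line in lines:
        parsed = parse_line(line.strip(), year)
        if parsed is not None and abs(parsed[0] - target) <= window:
            hits.append(parsed)
    return hits
```

Calling `events_near(open("/var/log/syslog"), datetime(2024, 6, 19, 11, 33, 10))` with a hypothetical loss time would list every process that logged something in the surrounding 20-second window, which makes it easier to see whether PackageKit is the only candidate.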

tagyoureit commented 5 months ago

I’d use your Ansible script to rebuild and restart the njspc/dP/rem stack every time it pulls down new updates.
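
As a rough illustration of that suggestion, here is a minimal Python sketch of a post-update hook that an update script could call. It covers only the restart part, not the rebuild. `pm2 restart njsPC` is the command reported earlier in this thread. The function name `restart_after_update`, the `packages_changed` flag, and any additional pm2 process names for dashPanel or REM are hypothetical and would need to match the actual setup.

```python
import subprocess

def restart_after_update(packages_changed, processes=("njsPC",), run=subprocess.run):
    """Restart each pm2 process if the update changed packages.

    Returns the list of commands that were issued.
    """
    issued = []
    if not packages_changed:
        return issued  # nothing was upgraded, so leave the services alone
    for name in processes:
        # "pm2 restart <name>" is the same command used manually in this thread
        cmd = ["pm2", "restart", name]
        run(cmd, check=False)
        issued.append(cmd)
    return issued
```

The `run` parameter exists only so the hook can be dry-run without touching pm2. Note that this would not cover a loss that happens outside an update run, such as the scheduled PackageKit case described above.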
