tagyoureit / nodejs-poolController

An application to control pool equipment from various manufacturers.
GNU Affero General Public License v3.0

Commands fail after temporary loss of RS485 connectivity #357

Closed: jwtaylor310 closed this issue 1 year ago

jwtaylor310 commented 2 years ago

Describe the bug: After a temporary loss of RS485 connectivity (1 minute or more), commands sent from dashPanel fail. Restarting the poolController program fixes the problem.

To Reproduce: Disable RS485 connectivity for approximately 1 minute. For testing, I can force the error by power-cycling the network switch that connects the Pi running the programs to the Elfin RS485 adapter.

Expected behavior: Clicking a Feature "button" should turn the feature on or off. After the RS485 disruption, doing so turns the indicator 'light' yellow for a few seconds, and it then reverts to its previous state. The poolController console log shows that the command was sent, but no action results.


Packet Capture: replay.zip



Additional context: I have been experiencing intermittent command failures after the system has been running for several days. The only solution I have found has been restarting the poolController. I recently found that I can force the error by power-cycling the network switch that the RS485 adapter uses to communicate with the Pi running the pool programs. Once the switch is power-cycled, Feature on/off commands no longer work. I have an earlier Python program that communicates through the same network switch, and it is able to send commands successfully after the power cycle. The poolController console shows an ECONNRESET followed by an apparently successful reconnect, but this does not fix the problem. While the system is in this failed state, it still displays the pool status correctly, including updates resulting from commands entered on the EasyTouch wireless or on the OCP.

Here is an annotated copy of the poolController console display showing the problem. Manual annotations are in italics:

```text
nodejs-poolcontroller@7.5.0 build tsc
Init state for Pool Controller
[10/17/2021, 8:19:42 AM] info: The current git branch output is master
[10/17/2021, 8:19:42 AM] info: The current git commit output is ab95b2243e75d84cf730af38419e75cae7668209
[10/17/2021, 8:19:42 AM] info: Starting up SSDP server
[10/17/2021, 8:19:42 AM] info: Checking njsPC versions...
[10/17/2021, 8:19:42 AM] info: Starting Pool System easytouch
[10/17/2021, 8:19:42 AM] info: Server is now listening on 0.0.0.0:4200
[10/17/2021, 8:19:42 AM] info: Net connect (socat) connected to: 172.16.31.242:8899
[10/17/2021, 8:19:42 AM] info: Net connect (socat) Connection connected
[10/17/2021, 8:19:42 AM] info: Net connect (socat) ready and communicating: 172.16.31.242:8899
[10/17/2021, 8:19:43 AM] info: Auto-backup initialized Last Backup: 2021-10-16T23:23:21.532-0400 Next Backup: 2021-10-17T11:23:21.532-0400
[10/17/2021, 8:19:43 AM] info: New socket client connected lhRWFQBiD5jEGWbUAAAB -- 192.168.80.55
[10/17/2021, 8:19:43 AM] info: [8:19:43 AM] 192.168.80.55 GET /state/all?null {}
[10/17/2021, 8:19:43 AM] info: New socket client connected wqfub2-xKf-sfIXVAAAD -- 192.168.80.55
[10/17/2021, 8:19:43 AM] info: [8:19:43 AM] 192.168.80.55 GET /state/all?null {}
Pentair EasyTouch System Detected!
[10/17/2021, 8:19:44 AM] info: Found Controller Board EasyTouch2 8
[10/17/2021, 8:19:44 AM] info: Requesting easytouch configuration
[10/17/2021, 8:19:45 AM] info: New socket client connected kOpFvp-zS5Qo2qJMAAAF -- 172.16.31.151
[10/17/2021, 8:19:45 AM] info: [8:19:45 AM] 172.16.31.151 GET /state/all?null {}
sendRS485PortStats set to false
[10/17/2021, 8:19:54 AM] info: New socket client connected 0ec3sapjgsHuuQKxAAAH -- 192.168.80.55
[10/17/2021, 8:19:54 AM] info: [8:19:54 AM] 192.168.80.55 GET /state/all?null {}
RS485 Stats:{ "bytesReceived": 1465 "success": 48, "failed": 1, "bytesSent": 468, "collisions": 0, "failureRate": 2.04% }
RS485 Stats:{ "bytesReceived": 1692 "success": 56, "failed": 2, "bytesSent": 540, "collisions": 0, "failureRate": 3.45% }
[10/17/2021, 8:20:02 AM] info: EasyTouch system config complete.
[10/17/2021, 8:20:02 AM] info: Initializing Nixie Controller
[10/17/2021, 8:20:02 AM] info: Nixie Controller Initialized
[10/17/2021, 8:20:02 AM] info: [8:20:02 AM] 192.168.80.55 GET /state/all?null {}
[10/17/2021, 8:20:02 AM] info: New socket client connected utMk3yfS5f4U9V1AAAJ -- 192.168.80.55
[10/17/2021, 8:20:02 AM] info: [8:20:02 AM] 192.168.80.55 GET /state/all?null {}
[10/17/2021, 8:20:03 AM] info: Last auto-backup 2021-10-16T23:23:21.532-0400 Next auto - backup 2021-10-17T11:23:21.532-0400
```

*08:21:00 Network Switch Disabled*

*08:23:00 Network Switch Enabled*

*08:24:52 “HP ENABLED” circuit commanded on*

```text
[10/17/2021, 8:24:52 AM] info: [8:24:52 AM] 192.168.80.55 PUT /state/circuit/setState {"id":9,"state":true}
```

*FAILED*

```text
[10/17/2021, 8:24:52 AM] error: Net connect (socat) Connection: Error: read ECONNRESET. Retry in 10 seconds
[10/17/2021, 8:24:52 AM] info: Net connect (socat) closed due to error: 172.16.31.242:8899
[10/17/2021, 8:25:02 AM] info: Net connect (socat) connected to: 172.16.31.242:8899
[10/17/2021, 8:25:02 AM] info: Net connect (socat) Connection connected
[10/17/2021, 8:25:02 AM] info: Net connect (socat) ready and communicating: 172.16.31.242:8899
```

*08:27:00 “HP ENABLED” circuit commanded on*

```text
[10/17/2021, 8:27:00 AM] info: [8:27:00 AM] 192.168.80.55 PUT /state/circuit/setState {"id":9,"state":true}
```

*FAILED*

*08:29:00 “HP ENABLED” circuit commanded on*

```text
[10/17/2021, 8:29:00 AM] info: [8:29:00 AM] 192.168.80.55 PUT /state/circuit/setState {"id":9,"state":true}
```

*FAILED*

rstrouse commented 2 years ago

I have had this problem with socat in the past. Unfortunately, it is related to the adapter not opening with the proper settings from socat. Here is my pm2 startup file. Notice that I have added all the baud rate settings, but it still doesn't always restart correctly. The funny part is that a small bit of code I wrote, which opens the port locally at 9600 8N1 and then closes it, works every time.

The solution below is still being tested, and I have a completely different adapter. In your Python program, are you connecting via the Python serial interface? If so, sharing your connection code may shed some light on how to get the remote adapter to open with the correct settings.

```javascript
// pm2 ecosystem file that runs socat as a managed process.
module.exports = {
  apps: [{
    name: 'socat',
    cwd: '/usr/bin',
    script: 'socat',
    // Listen for TCP clients on port 9801 (fork: one child per client,
    // reuseaddr: allow the port to be rebound quickly) and bridge them to
    // /dev/ttyUSB0, with the serial side forced to 9600 baud in raw mode.
    args: 'TCP-LISTEN:9801,fork,reuseaddr FILE:/dev/ttyUSB0,b9600,ispeed=9600,ospeed=9600,raw',
    autorestart: false,
    restart_delay: 10000,
    watch: true,
    ignore_watch: ['data', 'config.json', 'node_modules', 'logs', 'dist', 'cache'],
    listen_timeout: 60000
  }],
  deploy: {
    production: {
      user: 'SSH_USERNAME',
      host: 'SSH_HOSTMACHINE',
      ref: 'origin/master',
      repo: 'GIT_REPOSITORY',
      path: 'DESTINATION_PATH',
      'pre-deploy-local': '',
      'post-deploy': 'npm install && pm2 reload ecosystem.config.js --env production',
      'pre-setup': ''
    }
  }
};
```

rstrouse commented 2 years ago

Scratch the previous post. I do not see an outbound message when you issue the command. Something else is afoot here, which indicates that we are not clearing the error after reconnecting to the port. njsPC will try to reconnect for a period and then stop, so it doesn't become a nuisance. However, it should resume as soon as a reconnect is successful.

rstrouse commented 2 years ago

Pull njsPC. Once the connection was closed, it was being marked as read-only. Now, once the port raises the ready event, it marks itself as read/write again.
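A minimal sketch of the kind of state bug described above, assuming a connection object that tracks an open flag and a separate read-only flag. The class and method names are hypothetical and are not taken from njsPC's source; the point is that whatever sets the flag on close has to be undone on the `ready` path, or every later write is refused.

```javascript
// Hypothetical model of the read-only flag described above; not njsPC source.
class PortConnection {
  constructor() {
    this.isOpen = false;
    this.readOnly = false; // set while the link is down so writes are refused
  }

  // Called when the socket closes (for example after ECONNRESET).
  onClose() {
    this.isOpen = false;
    this.readOnly = true;
  }

  // Called when the socket raises its 'ready' event after a (re)connect.
  // The fix: clear the read-only flag here, not only on first start-up.
  onReady() {
    this.isOpen = true;
    this.readOnly = false;
  }

  canWrite() {
    return this.isOpen && !this.readOnly;
  }
}
```

If the `this.readOnly = false` line were missing from `onReady()`, `canWrite()` would stay `false` after the reconnect, which would match the reported symptom of status updates still arriving while commands fail.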

jwtaylor310 commented 2 years ago

Thanks for looking into this. I will do the pull and test tomorrow morning.

jwtaylor310 commented 2 years ago

Looks like the change didn't fix the problem. I ran a 'git pull' in both the nodejs-poolController and the nodejs-poolController-dashPanel folders and restarted both programs. Once again, after an interruption of the connection to the adapter, all attempts to change the state of a feature fail.

The poolController program successfully identifies the loss of connection and does retries until the connection is reported to be working. However, commands sent after the connection is re-established don't work. Restarting the program fixes the problem, so whatever the program does to create the initial connection works.

Here's a copy of the console display:

```text
nodejs-poolcontroller@7.5.0 build tsc
Init state for Pool Controller
[10/19/2021, 12:49:00 PM] info: The current git branch output is master
[10/19/2021, 12:49:00 PM] info: The current git commit output is a2903da6c15f08d44883aa50ce325a005358ab21
[10/19/2021, 12:49:00 PM] info: Starting up SSDP server
[10/19/2021, 12:49:00 PM] info: Checking njsPC versions...
[10/19/2021, 12:49:00 PM] info: Starting Pool System easytouch
[10/19/2021, 12:49:01 PM] info: Server is now listening on 0.0.0.0:4200
[10/19/2021, 12:49:01 PM] info: Net connect (socat) connected to: 172.16.31.242:8899
[10/19/2021, 12:49:01 PM] info: Net connect (socat) Connection connected
[10/19/2021, 12:49:01 PM] info: Net connect (socat) ready and communicating: 172.16.31.242:8899
[10/19/2021, 12:49:01 PM] info: Auto-backup initialized Last Backup: 2021-10-19T11:26:29.739-0400 Next Backup: 2021-10-19T23:26:29.739-0400
Pentair EasyTouch System Detected!
[10/19/2021, 12:49:05 PM] info: Found Controller Board EasyTouch2 8
[10/19/2021, 12:49:05 PM] info: Requesting easytouch configuration
[10/19/2021, 12:49:08 PM] info: New socket client connected PC1HnAzXhQp-KOhnAAAB -- 172.16.31.151
[10/19/2021, 12:49:08 PM] info: [12:49:08 PM] 172.16.31.151 GET /state/all?null {}
sendRS485PortStats set to false
[10/19/2021, 12:49:12 PM] info: New socket client connected _oOeZcKcJYGvOJBzAAAD -- 192.168.80.55
[10/19/2021, 12:49:12 PM] info: [12:49:12 PM] 192.168.80.55 GET /state/all?null {}
RS485 Stats:{ "bytesReceived": 828 "success": 31, "failed": 1, "bytesSent": 252, "collisions": 0, "failureRate": 3.13% }
sendRS485PortStats set to false
[10/19/2021, 12:49:14 PM] info: New socket client connected 2XoHo25Pl2bFk5cwAAAF -- 192.168.80.55
[10/19/2021, 12:49:14 PM] info: [12:49:14 PM] 192.168.80.55 GET /state/all?null {}
[10/19/2021, 12:49:21 PM] info: Last auto-backup 2021-10-19T11:26:29.739-0400 Next auto - backup 2021-10-19T23:26:29.739-0400
[10/19/2021, 12:49:23 PM] info: EasyTouch system config complete.
[10/19/2021, 12:49:23 PM] info: Initializing Nixie Controller
[10/19/2021, 12:49:23 PM] info: Nixie Controller Initialized
```

12:53:00 Lights Circuit Commanded ON - Successful

```text
[10/19/2021, 12:53:00 PM] info: [12:53:00 PM] 192.168.80.55 PUT /state/circuit/setState {"id":2,"state":true}
```

12:53:30 Network connection blocked

12:54:06 Lights Circuit Commanded OFF - Failed

```text
[10/19/2021, 12:54:06 PM] info: [12:54:06 PM] 192.168.80.55 PUT /state/circuit/setState {"id":2,"state":false}
[10/19/2021, 12:54:06 PM] error: Net connect (socat) Connection: Error: read ECONNRESET. Retry in 10 seconds
[10/19/2021, 12:54:06 PM] info: Net connect (socat) closed due to error: 172.16.31.242:8899
[10/19/2021, 12:54:16 PM] error: Net connect (socat) Connection: Error: connect ECONNREFUSED 172.16.31.242:8899. Retry in 10 seconds
[10/19/2021, 12:54:16 PM] info: Net connect (socat) closed due to error: 172.16.31.242:8899
[10/19/2021, 12:54:26 PM] error: Net connect (socat) Connection: Error: connect ECONNREFUSED 172.16.31.242:8899. Retry in 10 seconds
[10/19/2021, 12:54:26 PM] info: Net connect (socat) closed due to error: 172.16.31.242:8899
[10/19/2021, 12:54:36 PM] error: Net connect (socat) Connection: Error: connect ECONNREFUSED 172.16.31.242:8899. Retry in 10 seconds
[10/19/2021, 12:54:36 PM] info: Net connect (socat) closed due to error: 172.16.31.242:8899
[10/19/2021, 12:54:46 PM] error: Net connect (socat) Connection: Error: connect ECONNREFUSED 172.16.31.242:8899. Retry in 10 seconds
[10/19/2021, 12:54:46 PM] info: Net connect (socat) closed due to error: 172.16.31.242:8899
[10/19/2021, 12:54:56 PM] error: Net connect (socat) Connection: Error: connect ECONNREFUSED 172.16.31.242:8899. Retry in 10 seconds
[10/19/2021, 12:54:56 PM] info: Net connect (socat) closed due to error: 172.16.31.242:8899
```

12:55:00 Network connection restored

```text
[10/19/2021, 12:55:06 PM] info: Net connect (socat) connected to: 172.16.31.242:8899
[10/19/2021, 12:55:06 PM] info: Net connect (socat) Connection connected
[10/19/2021, 12:55:06 PM] info: Net connect (socat) ready and communicating: 172.16.31.242:8899
```

12:55:31 Lights Circuit Commanded OFF - Failed

```text
[10/19/2021, 12:55:31 PM] info: [12:55:31 PM] 192.168.80.55 PUT /state/circuit/setState {"id":2,"state":false}
```

12:58:00 Lights Circuit Commanded OFF - Failed

```text
[10/19/2021, 12:58:00 PM] info: [12:58:00 PM] 192.168.80.55 PUT /state/circuit/setState {"id":2,"state":false}
```

rstrouse commented 2 years ago

I now reset the Ready To Send flag for the port, so pull njsPC and try again. Unfortunately, I don't have this adapter, and a remote connection through it appears to reconnect differently than one using a remote Pi dongle and socat.

jwtaylor310 commented 2 years ago

That fixed it! Thank you.

jwtaylor310 commented 2 years ago

I have a couple of extra Elfin RS485 adapters: one that connects to a wired LAN and another that connects via WiFi. If you are interested in either of them, I'd be pleased to send it to you as a 'thank you' for all of your help with this. Just let me know which type and how to get it to you.

rstrouse commented 2 years ago

You would be shocked by the sheer number of adapters, boards, components, and miscellaneous electronics that I have lying around. I'm just glad we fixed it, and I really don't want a new thing to play with. The issue was that the outbound message pool was simply filled up with failed commands. I'm going to close this for now; if you need anything else, just submit another report.
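A small sketch of the fail-fast alternative to letting failed commands accumulate, assuming a hypothetical queue class and a port object that exposes `canWrite()` and `write()`. This is not njsPC's queue; it only illustrates the idea. The 2022 log later in this thread shows njsPC reporting "Comms port 0 is not open. Message aborted" rather than holding the message, which is the same general behaviour.

```javascript
// Hypothetical outbound queue; not njsPC's implementation.
class OutboundQueue {
  constructor() {
    this.pending = []; // messages waiting to be written
    this.aborted = []; // messages dropped because the port was not writable
  }

  enqueue(bytes) {
    this.pending.push(bytes);
  }

  // `port` is any object with canWrite() and write(bytes); both are assumptions.
  process(port) {
    while (this.pending.length > 0) {
      const msg = this.pending.shift();
      if (port.canWrite()) {
        port.write(msg);
      } else {
        // Fail fast: drop (and report) the message instead of leaving it
        // queued, so failed commands cannot pile up behind a dead connection.
        this.aborted.push(msg);
      }
    }
  }
}
```

The design choice is that a dead link costs each command one immediate, visible failure instead of a growing backlog that keeps failing after the link returns.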

jwtaylor310 commented 2 years ago

OK. Thanks again!


mguinness commented 1 year ago

I'm also getting this problem intermittently with an Elfin EE11. However, the indicator light doesn't turn yellow; it stays green. When I go into dashPanel, open the RS485 port tab in the configuration, and expand the port, it shows the status as "closed", but data is still being received (see screenshot below). If I restart njsPC, it works again and shows the status as "open".

[Screenshot]

Is there a reason why njsPC shows the status as closed when the port is really open and receiving data? Is it possible to provide a retry-connection option when the port shows as closed in dashPanel? I tried disabling and then re-enabling the port, but it made no difference.

rstrouse commented 1 year ago

If you simply save the port it will try to re-open the connection. I will have a look at the socket code to see if we can be more aggressive with half-open sockets.
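One common way for a Node.js client to be more aggressive with half-open sockets is to combine TCP keepalive with an idle timeout. The sketch below uses the real `net.Socket` methods `setKeepAlive()` and `setTimeout()`; the `hardenSocket` helper and its default values are assumptions for illustration, not what njsPC ships.

```javascript
// Hypothetical helper for a client socket to a TCP serial bridge (socat or an
// Elfin adapter) that makes half-open connections fail sooner.
function hardenSocket(socket, { keepAliveMs = 5000, idleMs = 30000 } = {}) {
  // Enable TCP keepalive; the first probe goes out after keepAliveMs of idle.
  socket.setKeepAlive(true, keepAliveMs);
  // Emit 'timeout' if nothing is read or written for idleMs.
  socket.setTimeout(idleMs);
  socket.on("timeout", () => {
    // Node does not close the socket on 'timeout'; destroy it so the existing
    // 'close' handler (not shown) can run the reconnect logic.
    socket.destroy();
  });
  return socket;
}
```

In practice it would wrap the socket returned by `net.createConnection({ host, port })`, keeping the existing `'error'` and `'close'` handlers. Keepalive probes only catch a peer whose TCP stack has stopped answering; an adapter that still ACKs but never forwards data is caught by the idle timeout instead.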

Please supply a replay at the time you are experiencing the failure. I assume that you are receiving messages but it cannot send.

mguinness commented 1 year ago

Saving the port didn't reopen the connection. It does appear to be a half-open socket: I can see temperatures, pump status, etc. in real time, but when I select the pool light I get an error (see the consoleLog extract below). The last 5 lines in the log repeat each minute.

[03/10/2022, 14:16:35] info: [14:16:35] 192.168.1.204 GET /app/config/startPacketCapture {}
[03/10/2022, 14:16:35] info: Starting Replay Capture.
[03/10/2022, 14:16:35] info: Clearing message log: /home/pi/nodejs-poolController/logs/packetLog(2022-09-27_10-37-50).log
[03/10/2022, 14:16:35] debug: Serial Port 0: Cannot perform drain function on port that is not open.
[03/10/2022, 14:16:36] info: Net connect (socat) 0 connected to: 192.168.1.205:8899
[03/10/2022, 14:16:36] info: Net connect (socat) Connection 0 connected
[03/10/2022, 14:16:36] info: Net connect (socat) 0 ready and communicating: 192.168.1.205:8899
[03/10/2022, 14:16:36] error: Net connect (socat) connection 0 error: Error: read ECONNRESET.  Retry in 10 seconds
[03/10/2022, 14:16:36] info: Net connect (socat) 0 closed due to error: 192.168.1.205:8899
[03/10/2022, 14:16:38] info: [14:16:38] 192.168.1.204 PUT /state/circuit/setState {"id":4,"state":true}
[03/10/2022, 14:16:38] warn: Comms port 0 is not open. Message aborted: 165,1,15,16,168,36,15,0,0,40,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,255,255,255,255,0,0,1,0,5,173 
[03/10/2022, 14:16:38] warn: Message aborted after 0 attempt(s): 165,1,15,16,168,36,15,0,0,40,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,255,255,255,255,0,0,1,0,5,173 
[03/10/2022, 14:16:38] error: undefined
[03/10/2022, 14:16:41] info: [14:16:41] 192.168.1.204 GET /app/config/log {}
[03/10/2022, 14:16:41] info: [14:16:41] 192.168.1.204 GET /state/appVersion?null {}
[03/10/2022, 14:16:41] info: [14:16:41] 192.168.1.204 GET /app/config/options/backup?null {}
[03/10/2022, 14:16:41] info: [14:16:41] 192.168.1.204 GET /config/all?null {}
[03/10/2022, 14:16:46] info: Net connect (socat) 0 connected to: 192.168.1.205:8899
[03/10/2022, 14:16:46] info: Net connect (socat) Connection 0 connected
[03/10/2022, 14:16:46] info: Net connect (socat) 0 ready and communicating: 192.168.1.205:8899
[03/10/2022, 14:16:46] error: Net connect (socat) connection 0 error: Error: read ECONNRESET.  Retry in 10 seconds
[03/10/2022, 14:16:46] info: Net connect (socat) 0 closed due to error: 192.168.1.205:8899
[03/10/2022, 14:16:48] info: [14:16:48] 192.168.1.204 GET /app/config/stopPacketCapture?_=1664831698523 {}
[03/10/2022, 14:16:48] error: ENOENT: no such file or directory, copyfile '/home/pi/nodejs-poolController/data/poolConfig.json' -> '/home/pi/nodejs-poolController/logs/2022-09-27_10-37-50/poolConfig.json'
[03/10/2022, 14:16:56] info: Net connect (socat) 0 connected to: 192.168.1.205:8899
[03/10/2022, 14:16:56] info: Net connect (socat) Connection 0 connected
[03/10/2022, 14:16:56] info: Net connect (socat) 0 ready and communicating: 192.168.1.205:8899
[03/10/2022, 14:16:56] error: Net connect (socat) connection 0 error: Error: read ECONNRESET.  Retry in 10 seconds
[03/10/2022, 14:16:56] info: Net connect (socat) 0 closed due to error: 192.168.1.205:8899
rstrouse commented 1 year ago

Well look at that. The EE11 is sending an ECONNRESET immediately after connecting. This comes from the other half of the socket. Do you by chance happen to have the MQTT packets enabled as well? An ECONNRESET can happen because the sockets on the other side are busy with another process. I can't imagine that there is a whole lot of processing power on the dongle.

[03/10/2022, 14:16:46] error: Net connect (socat) connection 0 error: Error: read ECONNRESET.  Retry in 10 seconds
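The log shows the same cycle every 10 seconds: "ready and communicating", then ECONNRESET within the same second. Here is a sketch of how a client could recognize that pattern, for example to surface it to the user instead of retrying silently. It assumes a hypothetical `ResetFlapDetector` fed with millisecond timestamps; the 1-second window and the threshold of 3 are arbitrary choices.

```javascript
// Hypothetical detector for "connects, then is reset almost immediately".
class ResetFlapDetector {
  constructor({ quickResetMs = 1000, threshold = 3 } = {}) {
    this.quickResetMs = quickResetMs; // a reset this soon after connect counts
    this.threshold = threshold;       // consecutive quick resets before flagging
    this.connectedAt = null;
    this.quickResets = 0;
  }

  onConnected(timeMs) {
    this.connectedAt = timeMs;
  }

  // Returns true once `threshold` consecutive quick resets have been seen.
  onReset(timeMs) {
    const quick =
      this.connectedAt !== null && timeMs - this.connectedAt <= this.quickResetMs;
    this.quickResets = quick ? this.quickResets + 1 : 0;
    this.connectedAt = null;
    return this.quickResets >= this.threshold;
  }
}
```

Feeding it the three cycles at 14:16:36, 14:16:46 and 14:16:56 would flag the connection on the third reset, while a connection that lives for a while before dropping resets the count.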

Interestingly, I am also seeing an error during packet capture that seems to indicate the logs directory does not exist.

[03/10/2022, 14:16:48] error: ENOENT: no such file or directory, copyfile '/home/pi/nodejs-poolController/data/poolConfig.json' -> '/home/pi/nodejs-poolController/logs/2022-09-27_10-37-50/poolConfig.json'
mguinness commented 1 year ago

Ah, I think you're onto something here. I had previously made a post on TFP about the device getting bogged down. In fact when I try to use the admin web interface I now get ERR_CONNECTION_RESET in the browser.

I don't have MQTT packets enabled. I've now changed Max Accept from 5 to 3 to see if that makes any difference - although I assume only a single connection would be made?

rstrouse commented 1 year ago

It may be holding onto half of the socket after a disconnect, even if that socket has no activity, and then simply rejecting requests for a new connection. njsPC does send the proper handshake for dropping the connection, though. Perhaps you can increase the Inactivity Timeout in dashPanel to see whether it is just getting tardy.

mguinness commented 1 year ago

Thanks. When you say "it" I assume you mean the Elfin? I'm not clear about increasing the Inactivity Timeout, is that the inactivityRetry option in njsPC?

inactivityRetry - # of seconds the app should wait before trying to reopen the port after no communications. If your equipment isn't on all the time or you are running a virtual controller you may want to dramatically increase the timeout so you don't get console warnings.

Can njsPC also set the indicator light to yellow when ECONNRESET is returned from socat? The only way to tell the port is closed is to go into the RS485 port dialog.

rstrouse commented 1 year ago

In the configuration in dashPanel, simply increase the inactivity timeout value. Sorry, njsPC cannot control the functions of the adapter. I am guessing that the light indicates the RS485 link and not the socket.

mguinness commented 1 year ago

I meant the indicator light in dashPanel next to the word Ready in the screenshot below.

[Screenshot]

rstrouse commented 1 year ago

Hmmm. That sounds like a great idea. Let me see what I can do.

rstrouse commented 1 year ago

If you pull njsPC and dashPanel it will now show an icon on the header when there are system messages. If you click on the icon it will show what equipment it is having issue with.

mguinness commented 1 year ago

Thanks! I've pulled the latest changes and will report back when I see the icon appear.

rstrouse commented 1 year ago

It will look like this. If you click on it you can see which equipment has failed. As of now it will report on Nixie Heaters and RS485 but I expect we will add notifications for other equipment items at a later date.

[Screenshot]
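A minimal sketch of the kind of registry that could sit behind such a header icon, assuming equipment is keyed by a string and the icon is shown while any entry is active. `SystemNotifications` and its methods are hypothetical names, not njsPC's API.

```javascript
// Hypothetical registry of active equipment problems for a UI indicator.
class SystemNotifications {
  constructor() {
    this.active = new Map(); // equipment id -> latest message
  }

  raise(equipment, message) {
    this.active.set(equipment, message);
  }

  clear(equipment) {
    this.active.delete(equipment);
  }

  // The header icon would be shown only while this returns true.
  hasIssues() {
    return this.active.size > 0;
  }

  // What clicking the icon would list.
  list() {
    return [...this.active].map(([equipment, message]) => ({ equipment, message }));
  }
}
```

Keying by equipment means a repeating failure, such as an RS485 port that keeps dropping, updates one entry instead of flooding the list, and a successful reconnect can clear it.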

tagyoureit commented 1 year ago

Love it!!! I had thought about a central place for all notifications in the past but it slipped my mind. Good to see you start down this path.

mguinness commented 1 year ago

OK, working as described. The icon now appears when socat is unable to reconnect. I've changed the default timeout from 0 to 300 seconds for the TCP server on the EE11 to see if that helps. Thanks for your help troubleshooting, I think you can now close this issue as the underlying problem rests with the Elfin.


rstrouse commented 1 year ago

You should increase the Inactivity Timeout above as well just in case the Elfin is just tardy. IntelliCenter sends data at least every 5 seconds but if the process on the TCP server of the Elfin is tardy then it could assume the connection is dead.
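A sketch of how an inactivity timeout like the one in dashPanel can work, assuming a watchdog that is touched on every received chunk and checked periodically. `InactivityWatchdog` is a hypothetical name, and the clock is injectable so the behaviour can be checked without waiting.

```javascript
// Hypothetical inactivity watchdog; not njsPC's implementation.
class InactivityWatchdog {
  constructor(timeoutSec, now = () => Date.now()) {
    this.timeoutMs = timeoutSec * 1000;
    this.now = now;
    this.lastActivity = now();
  }

  // Call on every chunk of data received from the adapter.
  touch() {
    this.lastActivity = this.now();
  }

  // True once no data has arrived for longer than the timeout.
  isStale() {
    return this.now() - this.lastActivity > this.timeoutMs;
  }
}
```

With data arriving at least every 5 seconds, a 30-second timeout leaves room for roughly six missed intervals before the link is treated as dead, whereas a timeout close to the send interval would trigger reconnects whenever the adapter is briefly slow.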

mguinness commented 1 year ago

Thanks, I've increased it to 30 secs now.

rstrouse commented 1 year ago

Let me know how all of this is working. I have a suspicion that the Elfin doesn't properly respond to the FIN packets for the socket. It simply waits to see if there is any activity on the socket and drops it over time. This means that it never cleans up old dead sockets and I can imagine that it simply has a limit to the number of active sockets. So it keeps sending on the inactive socket and doesn't release it for the new socket.

In the last set of updates I forced a FIN to the socket on drop.
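In Node.js, `socket.end()` half-closes a socket and sends a FIN, while `socket.destroy()` tears it down immediately. Below is a sketch of a drop routine that tries the polite path first and falls back to `destroy()` if the peer never completes the close. `dropConnection` and the 2-second grace period are assumptions; this is not the change made in njsPC.

```javascript
// Hypothetical drop routine: send FIN first, force-close if the peer stalls.
function dropConnection(socket, graceMs = 2000) {
  return new Promise((resolve) => {
    const timer = setTimeout(() => {
      socket.destroy(); // peer never finished closing; release local resources
      resolve("destroyed");
    }, graceMs);
    socket.once("close", () => {
      clearTimeout(timer);
      resolve("closed");
    });
    socket.end(); // half-close: FIN is sent after any queued writes flush
  });
}
```

Against a well-behaved peer this resolves with `"closed"`; against a device that ignores the FIN, the timer still releases the local socket.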

mguinness commented 1 year ago

It's been running fine since I last commented, the RS485 port stays open. Hoping that changes to timeouts for client and server made a difference.

The admin web interface still only partially loads, which was previously mentioned in a post on TFP.

However, I can still restart the Elfin device remotely with the following command, which calls the admin web server:

```shell
curl -H 'Authorization: Basic YWRtaW46YWRtaW4=' -H 'Content-Type: application/json' -d 'msg={"CID":20003,"PL":{}}' http://192.168.1.205/cmd
```
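The same request can be built from Node.js. The Basic credential in the curl command is the base64 encoding of `admin:admin`. The helper name, its default arguments, and the use of `fetch` (global in Node 18 and later) are assumptions; CID 20003 is simply the value used above to restart the device and is not documented here.

```javascript
// Hypothetical helper that mirrors the curl command above.
function buildElfinRestartRequest(host, user = "admin", password = "admin") {
  const auth = Buffer.from(`${user}:${password}`).toString("base64");
  return {
    url: `http://${host}/cmd`,
    method: "POST", // curl -d implies POST
    headers: {
      Authorization: `Basic ${auth}`,
      "Content-Type": "application/json",
    },
    // Sent verbatim, exactly like curl's -d argument.
    body: 'msg={"CID":20003,"PL":{}}',
  };
}

// Sending it (Node 18+):
// const { url, ...init } = buildElfinRestartRequest("192.168.1.205");
// await fetch(url, init);
```

If the adapter's admin password has been changed, pass the real credentials instead of relying on the defaults.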

rstrouse commented 1 year ago

Cool. Serial communications are by nature processor bound since they require a timing thread. I would guess that there is only one thread on that little device with very limited resources. I might expect that reading the port buffer takes priority over all else.

Setting the TCP server timeout to 300 probably triggers the device to destroy dead sockets periodically, which also frees up resources. Increasing the njsPC timeout to 30 seconds probably also helps if the response is tardy. I also added a FIN broadcast that is sent even when the port has been dead past the timeout. Maybe that is in play to close the half-open side on the device. It was half-open because it was still sending bytes on the socket even though it couldn't receive any.

I'm going to close this for now and if you have any other further issues open a new issue and reference this one.