Closed snagytx closed 1 year ago
What device do you have at the other end of this? Is this an Elfin EW11? This pattern is because the other end of the socket is closing without an error. Here is another discussion that I had with someone regarding the setup of this device. It appears that the EW11 leaves half-open sockets by default and periodically closes them. See this discussion on TFP. https://www.troublefreepool.com/threads/wiring-elfin-ew11-to-pentair-easytouch.253376/
I also added an endpoint to get the status. Simply call `state/rs485Port/0` for the primary port and it will return all the current information related to the port. If you want any of the aux ports, use `state/rs485Port/1`, etc.
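For a quick manual check, the endpoint can be queried with curl. The host and port here are assumptions (port 4200 is what the healthcheck script later in this thread uses); `reconnects` is one of the fields in the response:

```shell
# Hypothetical host/port; adjust to wherever njsPC is listening.
curl -s http://localhost:4200/state/rs485Port/0 | jq .reconnects
```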
Yes, I have an Elfin EW11, I will check the thread and get back to you.
Yes, my configuration was the same as in that thread. I guess an njsPC restart would give the EW11 extra time to breathe and let it recover.
I'll see how I can use the new API once the new docker image is built. Thank you for adding it.
In case somebody wants to implement the same thing in a Docker environment: create a file `healthcheck.js`. I mounted it under `/app/data/healthcheck.js` because it was convenient. To start, I mark the healthcheck as failed if there are 10 or more reconnects.
The content of the file:
```js
const http = require('http')

const options = {
  host: 'localhost',
  port: '4200',
  timeout: 2000,
  path: '/state/rs485Port/0',
}

const request = http.request(options, (res) => {
  // console.log(`STATUS: ${res.statusCode}`)
  if (res.statusCode === 200) {
    let data = '';
    res.on('data', (chunk) => {
      data += chunk;
    });
    res.on('end', () => {
      let reconnects = JSON.parse(data).reconnects
      console.log(`reconnects: ${reconnects}`)
      if (reconnects < 10) {
        process.exit(0)
      } else {
        process.exit(1)
      }
    });
  } else {
    process.exit(1)
  }
})

request.on('error', function (err) {
  console.log('ERROR', err)
  process.exit(1)
})

request.end()
```
In `docker-compose.yaml` I added:

```yaml
healthcheck:
  test: [ "CMD", "node", "/app/data/healthcheck.js" ]
  timeout: 5s
  interval: 10s
  retries: 3
```
This checks every 10 seconds, with a 5-second timeout and 3 retries; if all retries fail, the container is marked as unhealthy.
Once the container was recreated, I ran the following command to see the `healthcheck.js` script output:

```shell
docker inspect --format "{{json .State.Health}}" poolcontroller | jq
```

`poolcontroller` is the name of my njsPC container.
To restart unhealthy containers you would need to use willfarrell/autoheal. My `docker-compose.yaml` file is:

```yaml
version: '3.3'
services:
  autoheal:
    container_name: autoheal
    restart: always
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      - AUTOHEAL_CONTAINER_LABEL=all
    image: willfarrell/autoheal
```
When a container goes unhealthy, the following log message is seen in the autoheal log:
16-11-2022 16:04:49 Container /poolcontroller (47cc1f486e2c) found to be unhealthy - Restarting container now with 10s timeout
Very creative on the health script. Just curious: when the Elfin is not responding, have you tried a simple save of the RS485 port? That should close and reopen the port and may re-establish the socket connections. If that is true, then we should be able to monitor for the same inside the app and fix it without restarting the container. The Elfin definitely seems to be problematic, but if re-establishing the connection after 10 reconnects works, restarting the container seems like overkill.
When the messages from the first post appear, I did try to turn the lights on/off, but I got "Comms port 0 is not open. Message aborted:"
poolcontroller | [11/15/2022, 11:44:51 AM] info: Net connect (socat) 0 closed : 10.10.10.64:8899
poolcontroller | [11/15/2022, 11:45:01 AM] info: Net connect (socat) 0 connected to: 10.10.10.64:8899
poolcontroller | [11/15/2022, 11:45:01 AM] info: Net connect (socat) Connection 0 connected
poolcontroller | [11/15/2022, 11:45:01 AM] info: Net connect (socat) 0 ready and communicating: 10.10.10.64:8899
poolcontroller | [11/15/2022, 11:45:01 AM] info: Net connect (socat) 0 end event was fired
poolcontroller | [11/15/2022, 11:45:01 AM] info: Net connect (socat) 0 closed : 10.10.10.64:8899
poolcontroller | [11/15/2022, 11:45:08 AM] info: [11:45:08 AM] 10.10.10.18 PUT /state/circuit/setState {"id":5,"state":false}
poolcontroller | [11/15/2022, 11:45:08 AM] warn: Comms port 0 is not open. Message aborted: 165,1,15,16,168,36,15,0,0,32,0,0,0,0,0,0,0,0,0,0,0,8,0,0,0,0,0,0,0,0,0,0,0,0,255,255,255,255,0,0,1,0,5,173
poolcontroller | [11/15/2022, 11:45:08 AM] warn: Message aborted after 0 attempt(s): 165,1,15,16,168,36,15,0,0,32,0,0,0,0,0,0,0,0,0,0,0,8,0,0,0,0,0,0,0,0,0,0,0,0,255,255,255,255,0,0,1,0,5,173
poolcontroller | [11/15/2022, 11:45:08 AM] error: Comms port 0 is not open. Message aborted 165,1,15,16,168,36,15,0,0,32,0,0,0,0,0,0,0,0,0,0,0,8,0,0,0,0,0,0,0,0,0,0,0,0,255,255,255,255,0,0,1,0,5,173
poolcontroller | [11/15/2022, 11:45:11 AM] info: Net connect (socat) 0 connected to: 10.10.10.64:8899
poolcontroller | [11/15/2022, 11:45:11 AM] info: Net connect (socat) Connection 0 connected
poolcontroller | [11/15/2022, 11:45:11 AM] info: Net connect (socat) 0 ready and communicating: 10.10.10.64:8899
poolcontroller | [11/15/2022, 11:45:11 AM] info: Net connect (socat) 0 end event was fired
poolcontroller | [11/15/2022, 11:45:11 AM] info: Net connect (socat) 0 closed : 10.10.10.64:8899
Yes, that is expected. I meant to go to the Gears icon (upper right) -> Controller -> RS485 Port and then click Save Port (with no changes). This may reset the port.
OK, let me disable the healthcheck and wait for it to come back and give it a try
It seems that the issue is not resolved when I click "Save Port":
poolcontroller | [11/17/2022, 1:19:53 PM] info: Net connect (socat) 0 closed : 10.10.10.64:8899
poolcontroller | [11/17/2022, 1:20:00 PM] info: [1:20:00 PM] 127.0.0.1 GET /state/rs485Port/0 {}
poolcontroller | [11/17/2022, 1:20:03 PM] info: Net connect (socat) 0 connected to: 10.10.10.64:8899
poolcontroller | [11/17/2022, 1:20:03 PM] info: Net connect (socat) Connection 0 connected
poolcontroller | [11/17/2022, 1:20:03 PM] info: Net connect (socat) 0 ready and communicating: 10.10.10.64:8899
poolcontroller | [11/17/2022, 1:20:03 PM] info: Net connect (socat) 0 end event was fired
poolcontroller | [11/17/2022, 1:20:03 PM] info: Net connect (socat) 0 closed : 10.10.10.64:8899
poolcontroller | [11/17/2022, 1:20:07 PM] info: [1:20:07 PM] 10.10.10.21 PUT /app/rs485Port {"enabled":true,"netConnect":true,"mockPort":false,"rs485Port":"/dev/ttyUSB0","netPort":8899,"netHost":"10.10.10.64","portId":0,"portSettings":{"baudRate":9600,"dataBits":8,"parity":"none","stopBits":1},"inactivityRetry":10}
poolcontroller | [11/17/2022, 1:20:07 PM] info: Net connect (socat) 0 connected to: 10.10.10.64:8899
poolcontroller | [11/17/2022, 1:20:07 PM] info: Net connect (socat) Connection 0 connected
poolcontroller | [11/17/2022, 1:20:07 PM] info: Net connect (socat) 0 ready and communicating: 10.10.10.64:8899
poolcontroller | [11/17/2022, 1:20:07 PM] info: Net connect (socat) 0 end event was fired
poolcontroller | [11/17/2022, 1:20:07 PM] info: Net connect (socat) 0 closed : 10.10.10.64:8899
poolcontroller | [11/17/2022, 1:20:10 PM] info: [1:20:10 PM] 127.0.0.1 GET /state/rs485Port/0 {}
poolcontroller | [11/17/2022, 1:20:17 PM] info: Net connect (socat) 0 connected to: 10.10.10.64:8899
poolcontroller | [11/17/2022, 1:20:17 PM] info: Net connect (socat) Connection 0 connected
poolcontroller | [11/17/2022, 1:20:17 PM] info: Net connect (socat) 0 ready and communicating: 10.10.10.64:8899
poolcontroller | [11/17/2022, 1:20:17 PM] info: Net connect (socat) 0 end event was fired
poolcontroller | [11/17/2022, 1:20:17 PM] info: Net connect (socat) 0 closed : 10.10.10.64:8899
poolcontroller | [11/17/2022, 1:20:21 PM] info: [1:20:21 PM] 127.0.0.1 GET /state/rs485Port/0 {}
poolcontroller | [11/17/2022, 1:20:23 PM] info: [1:20:23 PM] 10.10.10.21 PUT /app/rs485Port {"enabled":true,"netConnect":true,"mockPort":false,"rs485Port":"/dev/ttyUSB0","netPort":8899,"netHost":"10.10.10.64","portId":0,"portSettings":{"baudRate":9600,"dataBits":8,"parity":"none","stopBits":1},"inactivityRetry":11}
poolcontroller | [11/17/2022, 1:20:23 PM] info: Net connect (socat) 0 connected to: 10.10.10.64:8899
poolcontroller | [11/17/2022, 1:20:23 PM] info: Net connect (socat) Connection 0 connected
poolcontroller | [11/17/2022, 1:20:23 PM] info: Net connect (socat) 0 ready and communicating: 10.10.10.64:8899
poolcontroller | [11/17/2022, 1:20:23 PM] info: Net connect (socat) 0 end event was fired
poolcontroller | [11/17/2022, 1:20:23 PM] info: Net connect (socat) 0 closed : 10.10.10.64:8899
poolcontroller | [11/17/2022, 1:20:31 PM] info: [1:20:31 PM] 127.0.0.1 GET /state/rs485Port/0 {}
poolcontroller | [11/17/2022, 1:20:34 PM] info: Net connect (socat) 0 connected to: 10.10.10.64:8899
poolcontroller | [11/17/2022, 1:20:34 PM] info: Net connect (socat) Connection 0 connected
poolcontroller | [11/17/2022, 1:20:34 PM] info: Net connect (socat) 0 ready and communicating: 10.10.10.64:8899
poolcontroller | [11/17/2022, 1:20:34 PM] info: Net connect (socat) 0 end event was fired
poolcontroller | [11/17/2022, 1:20:34 PM] info: Net connect (socat) 0 closed : 10.10.10.64:8899
I even tried to increase the Inactivity Timeout and then save, just to give it a chance.
The strange part is that the reconnects count does not increase: the Received and Successful numbers keep increasing, but none of the other numbers change.
Reviewing the logs from when the issue started, I see:
poolcontroller | [11/17/2022, 12:25:14 PM] info: [12:25:14 PM] 127.0.0.1 GET /state/rs485Port/0 {}
poolcontroller | [11/17/2022, 12:25:25 PM] info: [12:25:25 PM] 127.0.0.1 GET /state/rs485Port/0 {}
poolcontroller | [11/17/2022, 12:25:35 PM] info: [12:25:35 PM] 127.0.0.1 GET /state/rs485Port/0 {}
poolcontroller | [11/17/2022, 12:25:45 PM] info: [12:25:45 PM] 127.0.0.1 GET /state/rs485Port/0 {}
poolcontroller | [11/17/2022, 12:25:49 PM] warn: Net connect (socat) connection idle: 10.10.10.64:8899 retrying connection.
poolcontroller | [11/17/2022, 12:25:49 PM] info: Net connect (socat) 0 request close: 10.10.10.64:8899
poolcontroller | [11/17/2022, 12:25:49 PM] info: Net connect (socat) 0 sent FIN packet: 10.10.10.64:8899
poolcontroller | [11/17/2022, 12:25:49 PM] info: Net connect (socat) destroy socket: 10.10.10.64:8899
poolcontroller | [11/17/2022, 12:25:49 PM] info: Net connect (socat) 0 successfully closed: 10.10.10.64:8899
poolcontroller | [11/17/2022, 12:25:49 PM] info: Net connect (socat) 0 connected to: 10.10.10.64:8899
poolcontroller | [11/17/2022, 12:25:49 PM] info: Net connect (socat) Connection 0 connected
poolcontroller | [11/17/2022, 12:25:49 PM] info: Net connect (socat) 0 ready and communicating: 10.10.10.64:8899
poolcontroller | [11/17/2022, 12:25:56 PM] info: [12:25:56 PM] 127.0.0.1 GET /state/rs485Port/0 {}
poolcontroller | [11/17/2022, 12:26:06 PM] info: [12:26:06 PM] 127.0.0.1 GET /state/rs485Port/0 {}
poolcontroller | [11/17/2022, 12:26:16 PM] info: [12:26:16 PM] 127.0.0.1 GET /state/rs485Port/0 {}
poolcontroller | [11/17/2022, 12:26:26 PM] info: [12:26:26 PM] 127.0.0.1 GET /state/rs485Port/0 {}
....... prev line repeated .......
poolcontroller | [11/17/2022, 12:33:28 PM] info: [12:33:28 PM] 127.0.0.1 GET /state/rs485Port/0 {}
poolcontroller | [11/17/2022, 12:33:38 PM] info: [12:33:38 PM] 127.0.0.1 GET /state/rs485Port/0 {}
poolcontroller | [11/17/2022, 12:33:48 PM] info: [12:33:48 PM] 127.0.0.1 GET /state/rs485Port/0 {}
poolcontroller | [11/17/2022, 12:33:58 PM] info: [12:33:58 PM] 127.0.0.1 GET /state/rs485Port/0 {}
poolcontroller | [11/17/2022, 12:34:07 PM] warn: Net connect (socat) connection idle: 10.10.10.64:8899 retrying connection.
poolcontroller | [11/17/2022, 12:34:07 PM] info: Net connect (socat) 0 request close: 10.10.10.64:8899
poolcontroller | [11/17/2022, 12:34:07 PM] info: Net connect (socat) 0 sent FIN packet: 10.10.10.64:8899
poolcontroller | [11/17/2022, 12:34:07 PM] info: Net connect (socat) destroy socket: 10.10.10.64:8899
poolcontroller | [11/17/2022, 12:34:07 PM] info: Net connect (socat) 0 successfully closed: 10.10.10.64:8899
poolcontroller | [11/17/2022, 12:34:09 PM] info: [12:34:09 PM] 127.0.0.1 GET /state/rs485Port/0 {}
poolcontroller | [11/17/2022, 12:34:17 PM] warn: Net connect (socat) connection idle: 10.10.10.64:8899 retrying connection.
poolcontroller | [11/17/2022, 12:34:19 PM] info: [12:34:19 PM] 127.0.0.1 GET /state/rs485Port/0 {}
poolcontroller | [11/17/2022, 12:34:27 PM] warn: Net connect (socat) connection idle: 10.10.10.64:8899 retrying connection.
poolcontroller | [11/17/2022, 12:34:29 PM] info: [12:34:29 PM] 127.0.0.1 GET /state/rs485Port/0 {}
poolcontroller | [11/17/2022, 12:34:37 PM] warn: Net connect (socat) connection idle: 10.10.10.64:8899 retrying connection.
poolcontroller | [11/17/2022, 12:34:37 PM] info: Net connect (socat) 0 connected to: 10.10.10.64:8899
poolcontroller | [11/17/2022, 12:34:37 PM] info: Net connect (socat) Connection 0 connected
poolcontroller | [11/17/2022, 12:34:37 PM] info: Net connect (socat) 0 ready and communicating: 10.10.10.64:8899
poolcontroller | [11/17/2022, 12:34:39 PM] info: Net connect (socat) 0 connected to: 10.10.10.64:8899
poolcontroller | [11/17/2022, 12:34:39 PM] warn: Net connect (socat) 0 recovered from lost connection.
poolcontroller | [11/17/2022, 12:34:39 PM] info: Net connect (socat) Connection 0 connected
poolcontroller | [11/17/2022, 12:34:39 PM] info: Net connect (socat) 0 ready and communicating: 10.10.10.64:8899
poolcontroller | [11/17/2022, 12:34:39 PM] info: [12:34:39 PM] 127.0.0.1 GET /state/rs485Port/0 {}
OK, what this is telling me is that there are a limited number of sockets on the EW11, and it does not process the FIN to destroy the socket; it is probably waiting on some timeout in its code to tear it down. Depending on your equipment, the bus may go idle for longer than the 11 seconds you have it set to. In many circumstances that is an eternity, but if the equipment on the bus is not chatty, the bus is likely silent for periods of 11 seconds or more.
Increase the inactivity timeout or set it to 0 (disabled) so that it lets the EW11 expire its socket on the timeout.
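For reference, the same setting can be changed over the REST API. The payload below mirrors the `PUT /app/rs485Port` request visible in the njsPC logs above, with only `inactivityRetry` changed; whether 0 is accepted as "disabled" by the API is an assumption based on the suggestion here:

```shell
# Sketch based on the PUT /app/rs485Port payload seen in the njsPC logs;
# every value other than inactivityRetry is copied from that log line.
curl -X PUT http://localhost:4200/app/rs485Port \
  -H 'Content-Type: application/json' \
  -d '{"enabled":true,"netConnect":true,"mockPort":false,"rs485Port":"/dev/ttyUSB0","netPort":8899,"netHost":"10.10.10.64","portId":0,"portSettings":{"baudRate":9600,"dataBits":8,"parity":"none","stopBits":1},"inactivityRetry":0}'
```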
When everything works, I see multiple messages maybe every second in the messageManager.html page.
Because it takes about 13 seconds to restart the container, I will set my timeout to 30 seconds.
```shell
time docker-compose restart poolcontroller
Restarting poolcontroller ... done

real    0m12.783s
user    0m1.171s
sys     0m0.139s
```
If 13 seconds is enough for the connection to re-establish when I restart the container, then 30 seconds should be enough for njsPC to reconnect.
You would think that to be true. One thing that could be going on here is that the wifi is switching channels on you or the connection gets interference for short periods. That could be making the EW11 orphan (half-close) the socket. Are there timeout settings on the EW11 configuration page? I wonder if decreasing the timeout on the adapter could improve the situation.
I wonder if there is a keep alive setting that does a better job at cleaning up its orphans.
These are the settings that I can have for the TCP Server:
I updated it to:
Let us know how these changes work out. I assume that the docs are below average on what the numbers mean. I'll take a stab at the purpose of each of these (but I could be wrong).
With Timeout set to 15, I get:
poolcontroller | [11/17/2022, 7:42:37 PM] info: Net connect (socat) 0 connected to: 10.10.10.64:8899
poolcontroller | [11/17/2022, 7:42:37 PM] info: Net connect (socat) Connection 0 connected
poolcontroller | [11/17/2022, 7:42:37 PM] info: Net connect (socat) 0 ready and communicating: 10.10.10.64:8899
poolcontroller | [11/17/2022, 7:42:42 PM] info: [7:42:42 PM] 127.0.0.1 GET /state/rs485Port/0 {}
poolcontroller | [11/17/2022, 7:42:52 PM] info: [7:42:52 PM] 127.0.0.1 GET /state/rs485Port/0 {}
poolcontroller | [11/17/2022, 7:42:53 PM] info: Net connect (socat) 0 end event was fired
poolcontroller | [11/17/2022, 7:42:53 PM] info: Net connect (socat) 0 closed : 10.10.10.64:8899
Increased the Timeout to 60 and I got:
poolcontroller | [11/17/2022, 7:44:40 PM] info: Net connect (socat) 0 connected to: 10.10.10.64:8899
poolcontroller | [11/17/2022, 7:44:40 PM] info: Net connect (socat) Connection 0 connected
poolcontroller | [11/17/2022, 7:44:40 PM] info: Net connect (socat) 0 ready and communicating: 10.10.10.64:8899
poolcontroller | [11/17/2022, 7:44:45 PM] info: [7:44:45 PM] 127.0.0.1 GET /state/rs485Port/0 {}
poolcontroller | [11/17/2022, 7:44:55 PM] info: [7:44:55 PM] 127.0.0.1 GET /state/rs485Port/0 {}
poolcontroller | [11/17/2022, 7:45:06 PM] info: [7:45:06 PM] 127.0.0.1 GET /state/rs485Port/0 {}
poolcontroller | [11/17/2022, 7:45:16 PM] info: [7:45:16 PM] 127.0.0.1 GET /state/rs485Port/0 {}
poolcontroller | [11/17/2022, 7:45:26 PM] info: [7:45:26 PM] 127.0.0.1 GET /state/rs485Port/0 {}
poolcontroller | [11/17/2022, 7:45:36 PM] info: [7:45:36 PM] 127.0.0.1 GET /state/rs485Port/0 {}
poolcontroller | [11/17/2022, 7:45:41 PM] info: Net connect (socat) 0 end event was fired
poolcontroller | [11/17/2022, 7:45:41 PM] info: Net connect (socat) 0 closed : 10.10.10.64:8899
Based on this, it seems that Timeout is the max life of a connection.
I will keep Keep Alive at 15 and keep an eye on it.
I don't have MQTT set up on the adapter. All I have is this one "TCP Server".
Well the timeout setting is pretty much good for nothing. One would think that it would reset with communication on the socket. The good news is that it knows how to send an end event to cleanly close the socket in njsPC though.
This is what it says about that setting:
Since njsPC is not sending any data, it will close the socket after the timeout. I'm not sure if njsPC sends any keep-alive; maybe that would help, or is that not considered data?
Keep alives are not considered data, and that negotiation is done by the server. So it only looks at the send side of the socket. In this case we are connected to EasyTouch... correct?
Yes, the EW11 connects to the IntelliCenter's RS485 network, and njsPC connects to the EW11 over WiFi.
Have you, @snagytx, (or anyone else @rstrouse) tried the TCP server with Socat?
@snagytx there is the rub with this setting. njsPC must remain silent when it is not the master. It should only speak when it either has something to say or it is asked about something. So because the timeout is not related to any successful outgoing traffic or PONGs, it will time out.
@tagyoureit yes I have a socat running on RPi but this is different. This little device is just a microcontroller (probably STM or ESP) running its own socket server. The issue here is getting the right settings. Unfortunately, it half closes a socket when njsPC doesn't get any data for a while and the timeout is fired. This is because njsPC will send a FIN packet but the adapter doesn't close the socket completely. It only closes the receive half. This is the problem.
What happens here is njsPC stops listening on that socket but the adapter continues to hold onto that connection. Since there are only 3 open sockets allowed it eventually stops opening new full connections and only reattaches to the old half open one.
I was not successful in setting up socat.
I tried to run socat on an RPi (Raspbian GNU/Linux 11, bullseye) and on my Docker server (Photon OS 4.0).
But on both I get:
2022/11/17 15:00:03 socat[21712] E open("/dev/ttyUSB0", 02, 0666): No such file or directory
Or:
2022/11/17 15:00:35 socat[21749] E open("/dev/ttyACM1", 02, 0666): No such file or director
when I run:
socat TCP-LISTEN:9801,fork,reuseaddr FILE:/dev/ttyUSB0,b9600,raw
or
socat TCP-LISTEN:9801,fork,reuseaddr FILE:/dev/ttyACM1,b9600,raw
My intention is to push the messages from the EW11 to my socat server, which would put them on a serial port to be read by njsPC, but I'm not sure if socat knows how to create a serial port.
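For what it's worth, socat can create a virtual serial port via its `pty` address type. A hypothetical bridge (untested against an EW11; the link path is made up) would look like:

```shell
# Create a pseudo-terminal linked at /tmp/vserial0 and bridge it to the
# EW11's TCP server; njsPC could then open /tmp/vserial0 as a serial port.
# The earlier attempts failed because FILE:/dev/ttyUSB0 expects a device
# that already exists; pty,link= creates one instead.
socat pty,link=/tmp/vserial0,raw,echo=0 tcp:10.10.10.64:8899
```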
Why would you add socat in the middle of this communication? All socat does in this instance is create a TCP socket server from the raw RS485 port. That is what the EW11 does so you gain nothing doing that.
The problem here is that when njsPC resets the connection, the EW11 only resets half of it. This can occur when the network loses connection, and on WiFi that can happen often, especially when the connection is at the furthest reaches of the radio network.
njsPC expects to hear from the RS485 feed whether that is over a TCP socket or a hardware port at regular intervals. When it doesn't hear from the port within the Inactivity Timeout interval it will close the connection and recreate it. In this instance the EW11 doesn't fully close the connection even though we send out a FIN packet asking the server to close it. This results in what is referred to as a half-open socket where one side of the socket thinks the connection is closed and the other side thinks it is still open.
So what is occurring in this case is that the TCP connection between the EW11 and njsPC is interrupted for the length of time identified by the Inactivity Timeout but the EW11 never closes the original socket. I am looking at some other potential solutions to this goofy processing.
1.4.3. Multiple Socket The device supports max 5 Socket channels, each socket can work individually as TCP/UDP. Multi Socket simultaneous communication of data stream is illustrated as following
From the manual it looks like it supports up to 5, not just 3. Doubt it will make a difference but can't hurt to bump it up.
I think I may have a solution that will force the server to not allow half open sockets. I am extending the options so that they can be tweaked from the client. This should allow us to try out some options when the socket is opened.
Pull njsPC and dashPanel. There are now options to not allow half open connections (at least on the njsPC side) and send keep alives to the EW11. This may allow the keep alive options to work on the EW11 so that it does not arbitrarily close the port when set.
I will do that once the docker images get built.
Don’t forget to hard refresh your browser so the options show up.
Updated the docker containers and I can see the new options. I will keep an eye on the environment. Overnight I had no issues with the old docker image and the updated adapter configuration.
Sounds like we are narrowing in on the right settings.
With the settings from my previous post, I ended up with the issue again.
I tried to use the logging capture feature, but I had some issues; I'm not sure how relevant these logs are.
I tried to enable/disable the "Primary Port" configuration and to restart the TCP Server on the EW11 adapter, but it didn't help. (To restart the TCP server on the EW11, I increased Max Accept to 5 and clicked Submit.)
As mentioned before, I noticed that the status of the port was displayed as "closed" while the counts for Received and Successful kept increasing, but once I started to capture the logs, they stopped.
consoleLog(2022-11-18_11-50-15).log config.json.log packetLog(2022-11-18_11-50-15).log
Check the Allow Half Open checkbox and try again. We are getting an END event from the EW11. Also increase your Inactivity Retry to 20 seconds. This is greater than the keep alive time.
Updated to:
and keeping:
I think when it loses the wifi connection or changes channels, it simply half closes the connection on the server side. This would be fine, since njsPC will see that, disconnect the old dead connection, then reconnect. Sadly, it doesn't reconnect with a full connection; it gets the old half open connection back.
Just so you know, a half open connection is one where it can receive but not send, which is why it shows as closed but still receives bytes from the adapter.
Are you by chance running a mesh network?
Actually, yes I am running a mesh network, using RBR750 and two RBS750.
I wonder if Orbi is switching channels. Is any one node close to the pad? There are others running EW11s, and short of the initial config, they have been running fine. I have a pi3 running socat in the garage that has been running fine for years as well. I too run a mesh network, but it is AiMesh.
I don't think I have a way to check that. Yes, EW11 connects to one of the nodes.
I do realize that most likely there is something specific to my environment. I am more than happy to spend time testing combinations and providing feedback, but I opened this enhancement so that I can work around the issue caused by my environment without making you spend too much time on it.
Let me know how it behaves with the current options we have selected. At some point I may need to get one of these things just to see how many screws are in it.
What about printing out debug messages from the socket? I wonder if that would enlighten us to anything.
I'm not sure what we would get out of debug messages. I do know that we are getting a socket end if we try to disable half open sockets (which is expected), but it doesn't close the socket if Wi-Fi is interrupted. When njsPC tries to reconnect, it runs out of available sockets on the device, so it reattaches the old half open socket. Maybe if the timeout is enabled on the device, with keep alive and allow half open set in njsPC, it will drop the orphaned connections.
💥 Proposal
What feature you'd like to see
An API call which can be accessed and used in docker containers as a health check.
Motivation
A health check monitor is more of a workaround for an environment-specific issue, or other "bugs" in the product.
For example, from time to time I get:
but once I restart the container the issue goes away. I had these errors logged for 5-6 hours overnight, and it never recovered until I restarted the container.
The health check API should report if there were 2-3 reconnections and no message communication. The connection idle detection seems to work fine, but in certain cases it doesn't catch the case mentioned above.
Pitch
In a docker container I would take advantage of this health check and trigger a container restart. This would cleanly re-initialize everything.