khoih-prog / WebSockets_Generic

WebSocket / Socket.IO Server and Client for Arduino based on RFC6455. Now supporting Adafruit nRF52, Portenta_H7, STM32F/L/H/G/WB/MP1, Teensy, SAM DUE, SAMD21, SAMD51, Arduino SAMD21 (Nano 33 IoT), MKR1000 / MKR1010WiFi, RP2040-based boards using WiFi101, WiFiNINA, WiFi, Ethernet, WT32_ETH01, Teensy 4.1 NativeEthernet/QNEthernet or Portenta_H7 WiFi/Ethernet, etc. so that those boards can be voice-controlled by Alexa. Now supporting websocket only mode for Socket.IO. Ethernet_Generic library is used as default for W5x00
GNU General Public License v3.0
88 stars 22 forks source link

SocketIO re-connect does not work #22

Closed iKK001 closed 2 years ago

iKK001 commented 2 years ago

Describe the bug

Using several boards (such as the RP2040 Connect or the MKRWiFi1010) on a WiFi connection (i.e. no Ethernet-shield but pure WiFi), I had to realize that this library's SocketIO re-connect does not work.

Steps to Reproduce

Using a remote Node.js Server (cannot share since our customer does not allow), I made the following observation:

(by the way, very similar to this issue explained here - unfortunately no solution was posted about this issue yet)

Steps to reproduce:

Expected behavior

The expected behaviour is that the Arduino SocketIO Client re-connects automatically.

This behaviour is the standard behaviour of Socket IO as described here and here.

Not sure if important - but on the official Socket.io page, it sais Please note that since Socket.IO v3, the Socket instance does not emit any event related to the reconnection logic anymore. You can listen to the events on the Manager instance directly:

socket.io.on("reconnect_attempt", () => {
  // ...
});
socket.io.on("reconnect", () => {
  // ...
});

One more observation:

I am using WiFi to connect. (RP2040 Connect or MKRWiFi1010) and I am using beginSSL(...) to connect with SocketIO.

All that works fine.

Now, when I turn off my WiFi network, I do get a DISCONNECT event inside the Arduino Client.

But I do not get this event when the remote-server turns its node off.

iKK001 commented 2 years ago

One more thing:

I set _WEBSOCKETS_LOGLEVEL to 4.

And when the remote SocketIO server turns off and back on, there is absolutely no log at all inside the Serial Monitor !

But when the remote SocketIO server continues to turn off, turn on minute after minute, then 1, 2 or 3 minutes later (sometimes 5 minutes), the disconnect event occurs on the Arduino (and the Arduino client re-connects as it should have the very first time the server's on/off occurred).

This means to me that the Arduino is still running and still listening to the server-turn-offs somehow. But it does not do it every time. And most unfortunately, it NEVER does it the first time the server turn-off/on occurs. At its best, it does it after 60 seconds at the second server turn-off/on.

Imagine the server turn-off/turn-on period is set to 3600 sec (as it is in our application). Then the Arduino crazy-state would go for 1hour until it eventually re-connects. And this is a killer in our application.

Can I please ask you to investigate further the socketIO re-connect. Thank you.

I can provide more detailed information if you need it.

iKK001 commented 2 years ago

For the short-term I can live with the following alternative workarround (see below).

Of course, a re-connect that works out of the box would be much much more preferred. But since so complicated for me, I fail with ideas on how to achieve this in your library.

Here the workarround:

  1. before the server-Node turns off and back on again: send a special cmd to the Arduino client telling it to disconnect and connect. That way, the scalable server-Node will not even achieve (in our case with Google Cloud Run) the max. 3600 sec lifetime limit and is re-created by the new connect (and will have itself a new 3600 sec lifetime).
  2. on the Arduino client side, listen for this cmd. And when the special cmd arrives, do a disconnect and connect manually in code.

I realized that SocketIO in the WebSockets_Generic library does not have a disconnect() function yet. Therefore I added it to the library (see code below). That way, I can at least provoque the disconnect/connect actions myself.

Remark: Of course, it would be much better if the SocketIO library could do a proper re-connect on itself.

Remark2: And also, the time for the special-cmd disconnect/connect mechano takes approximately 14 seconds on a RP2040 Connect. This is rather long. And I assume a proper re-connect functionality out of the box would reduce this time even more.

Here the needed SocketIO functions to do the disconnect/connect mechano on your own :

--> Inside file SocketIOclient_Generic.h, add the following function definition:

void disconnect();

--> Inside file SocketIOclient_Generic-Impl.h, add the following function definition:

void SocketIOclient::disconnect()
{
  WebSocketsClient::disconnect();
}
khoih-prog commented 2 years ago

Hi @iKK001

I just tested with SIO and everything is OK with my version of SIO Server and code

  1. WebSocketClientSocketIO_W5500 example
  2. RP2040 using W5500 Ethernet and arduino-pico core v1.12.0
  3. Arduino IDE v1.8.19 and Linux Ubuntu
  4. Nodejs_SocketIO_TestServer
Start WebSocketClientSocketIO_W5500 on RASPBERRY_PI_PICO with W5x00 using Ethernet2 Library
WebSockets_Generic v2.14.1
[WS] Board :RASPBERRY_PI_PICO, setCsPin:17
[WS] Default SPI pinout:
[WS] MOSI:19
[WS] MISO:16
[WS] SCK:18
[WS] SS:17
[WS] =========================
WebSockets Client @ IP address: 192.168.2.104
Connecting to WebSockets Server @ IP address: 192.168.2.30, port: 8080
[WS] [WS-Client][connectedCb] Connected to Host:192.168.2.30, Port:8080
[WS] [wsIOc] Connected to url:/socket.io/?EIO=4
[IOc] Connected to url: /socket.io/?EIO=4
[WS] [wsIOc] get pong
[IOc] Get PONG
[WS] [wsIOc] connected: len = 30
[WS] [wsIOc] data: {"sid":"wLveTFXZ2aOedXwBAAAB"}
[WS] [WS-Client] connection lost.
[IOc] Disconnected  <========  Stop SIO server here
[IOc] Disconnected
["event_name",{"now":30001}]
<========>  Restart SIO server here
[WS] [WS-Client][connectedCb] Connected to Host:192.168.2.30, Port:8080  <========  Auto reconnect to SIO server here
[WS] [wsIOc] Connected to url:/socket.io/?EIO=4
[IOc] Connected to url: /socket.io/?EIO=4
[WS] [wsIOc] get pong
[IOc] Get PONG
[WS] [wsIOc] connected: len = 30
[WS] [wsIOc] data: {"sid":"7SROSZISPiCGuTqyAAAB"}
[WS] [wsIOc] get ping send pong:3
[IOc] Get PING
[WS] [WS-Client] connection lost.    <========  Stop SIO server here
[IOc] Disconnected
[IOc] Disconnected
["event_name",{"now":60002}]
[IOc] Disconnected
[IOc] Disconnected
<========>  Restart SIO server here
[WS] [WS-Client][connectedCb] Connected to Host:192.168.2.30, Port:8080  <========  Auto reconnect to SIO server here
[WS] [wsIOc] Connected to url:/socket.io/?EIO=4
[IOc] Connected to url: /socket.io/?EIO=4
[WS] [WS-Client] Sending HB ping
[WS] [wsIOc] get pong
[IOc] Get PONG
[WS] [wsIOc] connected: len = 30
[WS] [wsIOc] data: {"sid":"bYK85irYVfqbavpuAAAB"}
["event_name",{"now":90003}]
[WS] [wsIOc] get event: len = 53
[WS] [wsIOc] get data: ["Send Message io.emit Broadcasted : ",{"now":90003}]
[IOc] Get event: ["Send Message io.emit Broadcasted : ",{"now":90003}]
[WS] [wsIOc] get event: len = 33
[WS] [wsIOc] get data: ["Send Message : ",{"now":90003}]
[IOc] Get event: ["Send Message : ",{"now":90003}]
...
  <========  Stop SIO server here a long time
["event_name",{"now":900030}]
[IOc] Disconnected
[IOc] Disconnected
<========>  Restart SIO server here
[WS] [WS-Client][connectedCb] Connected to Host:192.168.2.30, Port:8080  <========  Auto reconnect to SIO server here
[WS] [wsIOc] Connected to url:/socket.io/?EIO=4
[IOc] Connected to url: /socket.io/?EIO=4
[WS] [WS-Client] Sending HB ping
[WS] [wsIOc] get pong
[IOc] Get PONG
[WS] [wsIOc] connected: len = 30
[WS] [wsIOc] data: {"sid":"DQrehSBMwRGJa6IcAAAB"}
["event_name",{"now":930031}]
[WS] [wsIOc] get event: len = 54
[WS] [wsIOc] get data: ["Send Message io.emit Broadcasted : ",{"now":930031}]
[IOc] Get event: ["Send Message io.emit Broadcasted : ",{"now":930031}]
[WS] [wsIOc] get event: len = 34
[WS] [wsIOc] get data: ["Send Message : ",{"now":930031}]
[IOc] Get event: ["Send Message : ",{"now":930031}]
[WS] [wsIOc] get ping send pong:3
[IOc] Get PING
["event_name",{"now":960032}]
[WS] [wsIOc] get event: len = 54
[WS] [wsIOc] get data: ["Send Message io.emit Broadcasted : ",{"now":960032}]
[IOc] Get event: ["Send Message io.emit Broadcasted : ",{"now":960032}]
[WS] [wsIOc] get event: len = 34
[WS] [wsIOc] get data: ["Send Message : ",{"now":960032}]
[IOc] Get event: ["Send Message : ",{"now":960032}]
[WS] [wsIOc] get ping send pong:3
[IOc] Get PING
[WS] [WS-Client] Sending HB ping
["event_name",{"now":990033}]
[WS] [wsIOc] get event: len = 54
[WS] [wsIOc] get data: ["Send Message io.emit Broadcasted : ",{"now":990033}]
[IOc] Get event: ["Send Message io.emit Broadcasted : ",{"now":990033}]
[WS] [wsIOc] get event: len = 34
[WS] [wsIOc] get data: ["Send Message : ",{"now":990033}]
[IOc] Get event: ["Send Message : ",{"now":990033}]
[WS] [wsIOc] get ping send pong:3
[IOc] Get PING

SIO Server

kh@kh-Inspiron-3593:~/Downloads/Nodejs/New_SIO$ ./SIO_server.sh 
Server launched on port 8080
Connected
wLveTFXZ2aOedXwBAAAB
JWT token test:  {
  host: '192.168.2.30:8080',
  connection: 'keep-alive',
  authorization: '1234567890',
  'user-agent': 'arduino-WebSocket-Client'
}
^C     <========  Stop SIO server here
kh@kh-Inspiron-3593:~/Downloads/Nodejs/New_SIO$ ./SIO_server.sh   <========>  Restart SIO server here
Server launched on port 8080
Connected
7SROSZISPiCGuTqyAAAB
JWT token test:  {
  host: '192.168.2.30:8080',
  connection: 'keep-alive',
  authorization: '1234567890',
  'user-agent': 'arduino-WebSocket-Client'
}
^C     <========  Stop SIO server here
kh@kh-Inspiron-3593:~/Downloads/Nodejs/New_SIO$ ./SIO_server.sh   <========>  Restart SIO server here
Server launched on port 8080
Connected
bYK85irYVfqbavpuAAAB
JWT token test:  {
  host: '192.168.2.30:8080',
  connection: 'keep-alive',
  authorization: '1234567890',
  'user-agent': 'arduino-WebSocket-Client'
}
Message from Client :  { now: 90003 }
Message from Client :  { now: 120004 }
Message from Client :  { now: 150005 }
Message from Client :  { now: 180006 }
Message from Client :  { now: 210007 }
Message from Client :  { now: 240008 }
Message from Client :  { now: 270009 }
Message from Client :  { now: 300010 }
Message from Client :  { now: 330011 }
^C     <========  Stop SIO server here
kh@kh-Inspiron-3593:~/Downloads/Nodejs/New_SIO$    <========>  Restart SIO server here
...

The issue now is your Server and/or your code

I'm sorry I won't waste my time here anymore, unless you identify and prove there is a bug of the library.

Please spend more time to test and re-verify your code before posting anymore.

khoih-prog commented 2 years ago

The same test is OK with Nano_RP2040_Connect using mbed_nano core


Start WebSocketClientSocketIO_NINA on Nano RP2040 Connect
WebSockets_Generic v2.14.1
Used/default SPI pinout:
MOSI:11
MISO:12
SCK:13
SS:10
Attempting to connect to SSID: HueNet1
SSID: HueNet1
WebSockets Client IP Address: 192.168.2.98
signal strength (RSSI):-27 dBm
Connecting to WebSockets Server @ IP address: 192.168.2.30, port: 8080
[WS] [WS-Client] Connect ws...
[WS] [WS-Client][connectedCb] Connected to Host:192.168.2.30, Port:8080
[WS] [wsIOc] Connected to url:/socket.io/?EIO=4
[IOc] Connected to url: /socket.io/?EIO=4
[WS] [wsIOc] get pong
[IOc] Get PONG
[WS] [wsIOc] connected: len = 30
[WS] [wsIOc] data: {"sid":"YiSiOlovcEUeqrMBAAAB"}
<========  Stop SIO server here
[WS] [WS-Client] connection lost.
[IOc] Disconnected
[WS] [WS-Client] Connect ws...
[IOc] Disconnected
["event_name",{"now":30485}]
<========>  Restart SIO server here
[WS] [WS-Client] Connect ws...
[WS] [WS-Client][connectedCb] Connected to Host:192.168.2.30, Port:8080  <========  Auto reconnect to SIO server here
[WS] [wsIOc] Connected to url:/socket.io/?EIO=4
[IOc] Connected to url: /socket.io/?EIO=4
[WS] [wsIOc] get pong
[IOc] Get PONG
[WS] [wsIOc] connected: len = 30
[WS] [wsIOc] data: {"sid":"mH0l5RO7kHmMV26aAAAB"}
[WS] [WS-Client] Sending HB ping
["event_name",{"now":60486}]
[WS] [wsIOc] get event: len = 53
[WS] [wsIOc] get data: ["Send Message io.emit Broadcasted : ",{"now":60486}]
[IOc] Get event: ["Send Message io.emit Broadcasted : ",{"now":60486}]
[WS] [wsIOc] get event: len = 33
[WS] [wsIOc] get data: ["Send Message : ",{"now":60486}]
[IOc] Get event: ["Send Message : ",{"now":60486}]
[WS] [WS-Client] connection lost.
[IOc] Disconnected
[WS] [WS-Client] Connect ws...

SIO Server side

kh@kh-Inspiron-3593:~/Downloads/Nodejs/New_SIO$ ./SIO_server.sh 
Server launched on port 8080
Connected
DQrehSBMwRGJa6IcAAAB
JWT token test:  {
  host: '192.168.2.30:8080',
  connection: 'keep-alive',
  authorization: '1234567890',
  'user-agent': 'arduino-WebSocket-Client'
}
Message from Client :  { now: 930031 }
Message from Client :  { now: 960032 }
Message from Client :  { now: 990033 }
Message from Client :  { now: 1020034 }
Message from Client :  { now: 1050035 }
Message from Client :  { now: 1080036 }
Message from Client :  { now: 1110037 }
Message from Client :  { now: 1140038 }
kh@kh-Inspiron-3593:~/Downloads/Nodejs/New_SIO$ ./SIO_server.sh 
Server launched on port 8080
Connected
YiSiOlovcEUeqrMBAAAB
JWT token test:  {
  host: '192.168.2.30:8080',
  connection: 'keep-alive',
  authorization: '1234567890',
  'user-agent': 'arduino-WebSocket-Client'
}
^C     <========  Stop SIO server here
kh@kh-Inspiron-3593:~/Downloads/Nodejs/New_SIO$ ./SIO_server.sh     <========>  Restart SIO server here
Server launched on port 8080
Connected
mH0l5RO7kHmMV26aAAAB
JWT token test:  {
  host: '192.168.2.30:8080',
  connection: 'keep-alive',
  authorization: '1234567890',
  'user-agent': 'arduino-WebSocket-Client'
}
Message from Client :  { now: 60486 }
...
iKK001 commented 2 years ago

Two comments:

Comment 1.

If I test by simply turning off the network - then everything works as you describe in your logs...

But if I test with a Cloud-Run-Server-disconnect, then the inexplicable "freeze" behaviour of the RP2040 Connect with mbed core results... (Other people described the very same here)

That does not really make any sense to me. Maybe you have an explanation.

Comment 2.

Your logs do show a disconnect and new connect. The time this takes is approx. 10-14 seconds on an RP2040 Connect with mbed core.

If I do the same in iOS, Android or Pyton-on-a-Raspi-Py-Zero, then this takes approx 1 second for a re-connect. But this is probably again due to the already described issue that we need to "upgrade" from html to websocket and that takes time it seems to me.

For now, I can live with an extra cmd coming from the Server right at disconnect - so that I can do the Arduino re-connect manually. This is a workarround as long as we don't have a solution to this issue.