Open MaggieSalak opened 2 years ago
cc @danigian
Hi @MaggieSalak,
thanks for posting the issue.
The default TC_TIMEOUT in LBS is 60 sec - if that is the case why is the LBS waiting ~2min before reconnecting to discovery endpoint?
LBS is not waiting 2 minutes before reconnecting. In fact, in the first round, LBS reconnects directly to the data endpoint (MUXS). Only after a few of these reconnect attempts fail does LBS fall back to the discovery endpoint (INFOS), and finally to the CUPS endpoint (if configured).
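The fallback order described above can be sketched roughly as follows. This is an illustration only, not Station's actual code: the function name `next_endpoint` and the retry limits in `MAX_RETRIES` are hypothetical placeholders, and the real thresholds live in the Station source.

```python
# Illustrative sketch of the endpoint fallback order; not Basics Station's implementation.
# The retry limits below are hypothetical placeholders chosen for the example.
FALLBACK_ORDER = ["MUXS", "INFOS", "CUPS"]
MAX_RETRIES = {"MUXS": 3, "INFOS": 3}  # assumed values


def next_endpoint(current, failed_attempts, cups_configured=True):
    """Return the endpoint to try next after `failed_attempts` failures on `current`."""
    limit = MAX_RETRIES.get(current)
    if limit is None or failed_attempts < limit:
        return current  # keep retrying the same endpoint
    nxt = FALLBACK_ORDER[FALLBACK_ORDER.index(current) + 1]
    if nxt == "CUPS" and not cups_configured:
        return current  # no CUPS endpoint configured: stay on discovery
    return nxt
```

With these assumed limits, `next_endpoint("MUXS", 1)` stays on the data endpoint, while `next_endpoint("MUXS", 3)` moves on to discovery.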
In your case it seems the two minutes come from lower-level TCP timeouts (host probably not responding to SYN?):
2022-02-11 09:50:42.307 [AIO:ERRO] Recv failed: NET - Connection was reset by peer
2022-02-11 09:50:42.307 [AIO:DEBU] [3] WS connection shutdown...
2022-02-11 09:50:42.307 [TCE:VERB] Connection to MUXS closed in state 4
2022-02-11 09:50:42.307 [TCE:INFO] MUXS reconnect backoff 4s (retry 2)
Station received a server-side TCP connection reset at 09:50:42. A retry is scheduled for 09:50:46 (4 seconds immediate retry back-off).
2022-02-11 09:52:57.010 [AIO:ERRO] [3] WS connect failed: NET - The connection to the given server / port failed
2022-02-11 09:52:57.010 [AIO:DEBU] [3] WS connection shutdown...
Station reports at 09:52:57 that the connection failed, about 2 minutes after the retry attempt. This looks to me like the remote host did not respond to the TCP SYN, so the TCP connect timeout kicked in. The next attempt is scheduled for 10 seconds later, i.e. 09:53:07:
2022-02-11 09:52:57.014 [TCE:INFO] INFOS reconnect backoff 10s (retry 1)
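As a rough cross-check of the two-minute gap, the arithmetic can be sketched as follows. It assumes a Linux host with the default `net.ipv4.tcp_syn_retries = 6` and an initial retransmission timeout of 1 s that doubles on each retry. The gateway's actual settings are not shown in this thread, so treat these values as assumptions.

```python
from datetime import datetime

# Assumed Linux defaults, not taken from the logs in this thread.
INITIAL_RTO_S = 1
SYN_RETRIES = 6

# The initial SYN and each of the 6 retransmissions wait twice as long as the one before.
connect_timeout_s = sum(INITIAL_RTO_S * 2**i for i in range(SYN_RETRIES + 1))

fmt = "%Y-%m-%d %H:%M:%S.%f"
reset_at = datetime.strptime("2022-02-11 09:50:42.307", fmt)
failed_at = datetime.strptime("2022-02-11 09:52:57.010", fmt)
backoff_s = 4  # from "MUXS reconnect backoff 4s (retry 2)"

# Time between the scheduled retry and the reported connect failure.
observed_s = (failed_at - reset_at).total_seconds() - backoff_s

print(connect_timeout_s)     # 127
print(round(observed_s, 3))  # 130.703
```

The observed gap of roughly 131 s is close to the 127 s this model predicts, which is consistent with unanswered SYNs rather than a Station-level timer.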
Finally, one minute after the INFOS reconnection attempt, station connects:
2022-02-11 09:54:11.899 [AIO:XDEB] [3] ws_connecting state=1
2022-02-11 09:54:11.899 [AIO:XDEB] [3] ws_connecting state=2
2022-02-11 09:54:11.900 [TCE:INFO] Connecting to INFOS: ws://192.168.86.33:5000
To get complete clarity, I propose you do a TCP packet scan. But from the logs it looks like you shut down the target port around 09:50 and didn't restart it until around 09:54:11. That (together with a possible DROP firewall target instead of REJECT) could explain this behavior.
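Before capturing packets, a quick probe can hint at whether the port actively refuses or silently drops connections. The sketch below is a hypothetical helper (`probe` is not part of Station) and uses only Python's standard `socket` module.

```python
import socket
import time


def probe(host, port, timeout_s=10.0):
    """Try one TCP connect and classify the outcome, returning (outcome, elapsed_seconds)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            outcome = "open"
    except ConnectionRefusedError:
        outcome = "refused"  # actively refused: typically a closed port or a REJECT rule
    except socket.timeout:
        outcome = "timeout"  # no answer to the SYN: host down or a DROP rule
    except OSError as exc:
        outcome = f"error: {exc}"
    return outcome, time.monotonic() - start
```

Running `probe("192.168.86.33", 5000)` from the gateway while the server is stopped should return quickly with `"refused"` if the host answers with a reset, and only after the timeout with `"timeout"` if the SYN goes unanswered.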
After reconnecting, why is the LBS dropping all messages?
Station drops the messages in your case because it had to go back to the discovery endpoint (INFOS). If the data endpoint retry (MUXS) had been successful, station would have played back all the buffered messages to the LNS. This is certainly something that can be improved. In the future, we are looking into making such behavior more configurable.
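To make the difference between the two reconnect paths concrete, here is a toy model of the buffering behavior described above. It is a hypothetical sketch, not Station code: the class `UplinkBuffer` and the `keep_on_rediscovery` flag are invented for illustration and only show what a configurable policy could look like.

```python
from collections import deque


class UplinkBuffer:
    """Toy model of upstream buffering during an LNS outage (hypothetical, not Station code)."""

    def __init__(self, maxlen=1000, keep_on_rediscovery=False):
        self.pending = deque(maxlen=maxlen)  # oldest messages are evicted when full
        self.keep_on_rediscovery = keep_on_rediscovery  # invented option

    def store(self, msg):
        self.pending.append(msg)

    def on_reconnected(self, via):
        """Return the messages to replay after reconnecting via 'MUXS' or 'INFOS'."""
        if via == "INFOS" and not self.keep_on_rediscovery:
            self.pending.clear()  # behavior described above: the buffer is dropped
        replay = list(self.pending)
        self.pending.clear()
        return replay
```

With the default flag, a reconnect via `"MUXS"` replays everything stored, while a reconnect via `"INFOS"` returns an empty list.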
On a separate note: I see you are using version 2.0.5. Is there a chance you could move to version 2.0.6?
Hi @beitler, thanks for your answer. The behavior we are encountering is exactly what you described: the target port is shut down and the 2-minute timeout kicks in. I was able to reproduce the issue with version 2.0.6 of the Basics Station.
Regarding the second part of your answer, it would be great to be able to configure whether messages are dropped when Station falls back to the discovery endpoint. In a scenario with two LoRa Network Servers for redundancy (one active, one passive) plus an external discovery service, it is important to be able to switch to the other LNS without dropping potentially useful telemetry collected during the seconds or minutes the switch takes.
Do you see any plan for making this happen in a future version of Basics Station? Thanks
Thank you for your inquiry.
Customers are encouraged to submit technical questions via our dedicated support portal at https://semtech.force.com/ldp/ldp_support.
We invite all users to visit the LoRa Developer Portal Forum at https://forum.lora-developers.semtech.com and to join the thriving LoRa development community!
We have noticed that when the Network Server goes down and comes back up after some downtime, LBS doesn't seem to handle the situation as expected. Below are the logs we collected from LBS. The scenario is as follows:
09:50:42 The Network Server goes down.
For approximately 2 min the LBS is trying to connect to the data endpoint.
09:52:57 We see the error:
Muxs connect failed - URI: ws://192.168.86.33:5000/router-data/B827EBFFFED381B7
After that, the LBS tries to connect to the discovery endpoint. After it connects successfully, all upstream messages that were sent in the meantime are lost.
Could you please provide some insights on the following concerns: