sm0svx / svxlink

Advanced repeater system software with EchoLink support for Linux including a GUI, Qtel - the Qt EchoLink client
http://svxlink.org/

Suddenly stops speaking to the EchoLink directory server #87

Closed sm0svx closed 9 years ago

sm0svx commented 10 years ago

The EchoLink module can suddenly stop speaking to the EchoLink directory server, which makes the node disappear from the EchoLink network. It never recovers, so SvxLink has to be restarted.

This was first noticed after the EchoLink Proxy protocol was implemented. I also saw it for the first time after I had switched hardware to a Cubieboard (ARM).

reiser4 commented 9 years ago

Hi, I am experiencing this issue. How can I help to debug it?

sm0svx commented 9 years ago

Well, the problem is that it's hard to debug since it happens very seldom. It also seems to happen more often during certain periods, and then the problem can be gone for a while.

The problem probably is somewhere in the classes EchoLink::DirectoryCon or EchoLink::Directory. Not much help I'm afraid.

reiser4 commented 9 years ago

Hi, on my system I experience this every few days. Can I provide some debug information useful to fix this?

f5vmr commented 9 years ago

For the first time it has occurred also on our repeater during the night at 01:07 local. It continued to fail to connect until I rebooted the repeater at 08:50, when it connected without problem. There was no activity on the air at the time to indicate the cause. The svxlink.log shows nothing except the failure to communicate to the EchoLink Server.

sm0svx commented 9 years ago

Right now I cannot provide any easy debug instructions.

One probable way to work around the problem is to run an EchoLink proxy server on the same computer as SvxLink. I have not verified this, though. Have a look at the EchoLink proxy pages for installation instructions.

sm0svx commented 9 years ago

I have fixed a number of bugs in code that may affect the EchoLink code. Please test the latest code from git master to see if this issue is still there.

sm0svx commented 9 years ago

Nope. The latest bugfixes did not take care of the problem :-/

sm0svx commented 9 years ago

I think this should be fixed now. Can anyone verify it? The fix is both in the master and the maint branch. The maint branch corresponds to the 14.08 release with this bugfix applied.

brettwi commented 9 years ago

I for one would like to try this. Can you supply info as to how to download either software in maint or master? Is there a difference?

-Brett Williams

sm0svx commented 9 years ago

Master is the bleeding edge version with all the latest bells and whistles. It should usually be pretty stable but things may change without notice and things may break. Maint contains the latest released code.

The easiest way to get the code is to use "git". Install git if not already installed and then run the following command to download the source code:

git clone https://github.com/sm0svx/svxlink

The code will be in the svxlink subdirectory. It will be on the master branch by default. If you would rather use the maint branch, run the following command:

git checkout maint

Use the same command if you want to switch back to master:

git checkout master

Now compile and install as usual.

To later update to the latest version:

git pull

It is also possible to download the respective branches as a zip archive, but I find that clumsier to handle:

If you go this way, use for example the "unzip" utility to extract the archive.

sm0svx commented 9 years ago

Has anybody seen the problem after the fix was made (Thu Apr 2 18:33:30 2015 +0200)?

sm0svx commented 9 years ago

Nobody seems to have seen this problem for a while, so I'll close the issue now. Reopen it if you see it again.

dl1com commented 2 years ago

Hi, I had a look through the issues and although the error message is never mentioned, I am pretty confident that the error described by OP is the same one I am experiencing. It is also mentioned in https://sourceforge.net/p/svxlink/mailman/message/35117421/ and https://forum.funk-telegramm.de/thread/653-fehler-error-command-timeout-while-communicating/ but unfortunately no solution is given.

I am running svxlink 1.7.0 with EchoLink 1.5.0 (19.09.2, the latest release from GitHub). It is using an EchoLink proxy from http://www.echolink.org/proxylist.jsp, and the EchoLink SERVERS setting is SERVERS=europe.echolink.org servers.echolink.org. The router is set up to forward 5198 and 5199 UDP and 5200 TCP to the server.

From time to time (sometimes 1h after the start of svxlink, sometimes up to 4h), svxlink obviously can't communicate with the directory server any more ("*** ERROR: Command timeout while communicating to the directory server") and does not recover from this. The EchoLink node is then shown as offline in the directory. It can be fixed by restarting svxlink, though. The problem also appears when using another proxy. (Is the directory server connection related to the proxy or is it independent of it?)

Fri Nov 19 04:40:00 2021: Playing short voice ID
Fri Nov 19 04:40:00 2021: Tx1: Turning the transmitter ON
Fri Nov 19 04:40:04 2021: Tx1: Turning the transmitter OFF
Fri Nov 19 04:46:07 2021: *** ERROR: Command timeout while communicating to the directory server
Fri Nov 19 04:50:00 2021: RepeaterLogic: Sending short identification...
Fri Nov 19 04:50:00 2021: Playing short voice ID
Fri Nov 19 04:50:00 2021: Tx1: Turning the transmitter ON
Fri Nov 19 04:50:04 2021: Tx1: Turning the transmitter OFF
Fri Nov 19 04:51:07 2021: *** ERROR: Command timeout while communicating to the directory server
Fri Nov 19 04:53:15 2021: *** ERROR: Command timeout while communicating to the directory server
Fri Nov 19 04:56:07 2021: *** ERROR: Command timeout while communicating to the directory server
Fri Nov 19 05:00:00 2021: RepeaterLogic: Sending short identification...
Fri Nov 19 05:00:00 2021: Playing short voice ID
Fri Nov 19 05:00:00 2021: Tx1: Turning the transmitter ON
Fri Nov 19 05:00:04 2021: Tx1: Turning the transmitter OFF
Fri Nov 19 05:01:07 2021: *** ERROR: Command timeout while communicating to the directory server
Fri Nov 19 05:06:07 2021: *** ERROR: Command timeout while communicating to the directory server
Fri Nov 19 05:10:00 2021: RepeaterLogic: Sending short identification...
Fri Nov 19 05:10:00 2021: Playing short voice ID
Fri Nov 19 05:10:00 2021: Tx1: Turning the transmitter ON
Fri Nov 19 05:10:04 2021: Tx1: Turning the transmitter OFF
Fri Nov 19 05:11:07 2021: *** ERROR: Command timeout while communicating to the directory server
Fri Nov 19 05:16:07 2021: *** ERROR: Command timeout while communicating to the directory server
Fri Nov 19 05:20:00 2021: RepeaterLogic: Sending short identification...
Fri Nov 19 05:20:00 2021: Playing short voice ID
Fri Nov 19 05:20:00 2021: Tx1: Turning the transmitter ON
Fri Nov 19 05:20:04 2021: Tx1: Turning the transmitter OFF
Fri Nov 19 05:21:07 2021: *** ERROR: Command timeout while communicating to the directory server
Fri Nov 19 05:26:07 2021: *** ERROR: Command timeout while communicating to the directory server
Fri Nov 19 05:30:00 2021: RepeaterLogic: Sending short identification...
Fri Nov 19 05:30:00 2021: Playing short voice ID
Fri Nov 19 05:30:00 2021: Tx1: Turning the transmitter ON
Fri Nov 19 05:30:04 2021: Tx1: Turning the transmitter OFF
...
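
To see how the timeouts are spaced, the error lines in a log excerpt like the one above can be parsed and the gaps between them computed. The following is only a minimal sketch: the function names are made up for illustration, and it assumes the timestamp format and error text exactly as they appear in this excerpt.

```python
from datetime import datetime

# Error text and timestamp layout as they appear in the excerpt above.
ERROR_MARKER = "*** ERROR: Command timeout while communicating to the directory server"
TS_FORMAT = "%a %b %d %H:%M:%S %Y"  # e.g. "Fri Nov 19 04:46:07 2021"


def error_times(lines):
    """Return the timestamps of all directory server timeout lines."""
    times = []
    for line in lines:
        if ERROR_MARKER not in line:
            continue
        # The timestamp is everything before the first ": " separator.
        stamp, _, _ = line.partition(": ")
        times.append(datetime.strptime(stamp, TS_FORMAT))
    return times


def gaps_in_seconds(times):
    """Return the number of seconds between consecutive timestamps."""
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


# A few lines copied from the excerpt above.
SAMPLE = [
    "Fri Nov 19 04:40:04 2021: Tx1: Turning the transmitter OFF",
    "Fri Nov 19 04:46:07 2021: *** ERROR: Command timeout while communicating to the directory server",
    "Fri Nov 19 04:51:07 2021: *** ERROR: Command timeout while communicating to the directory server",
    "Fri Nov 19 04:53:15 2021: *** ERROR: Command timeout while communicating to the directory server",
    "Fri Nov 19 04:56:07 2021: *** ERROR: Command timeout while communicating to the directory server",
]

print(gaps_in_seconds(error_times(SAMPLE)))  # [300, 128, 172]
```

In this excerpt most errors land on a five-minute grid (xx:x1:07 and xx:x6:07), with one extra error at 04:53:15 in between.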

Please let me know if I can provide any further data, logs, pcap traces, ...

dl1hrc commented 2 years ago

Hi, I observed this problem at my node, which is connected via LTE. I guess the reason was an unreliable connection with a slow data rate. Maybe the proxy and/or the EchoLink directory server has a problem with constantly changing IP addresses. But that's just a guess. However, it is difficult to diagnose with the old version you are using. I would recommend an update to the latest trunk version. mni 73s de Adi / DL1HRC

dl1com commented 2 years ago

Hi @dl1hrc, thanks a lot for your quick reply!

> Hi, I observed this problem at my node, which is connected via LTE. I guess the reason was an unreliable connection with a slow data rate. Maybe the proxy and/or the EchoLink directory server has a problem with constantly changing IP addresses. But that's just a guess.

It's a DSL connection and several other services are running fine on the network, so I would probably rule this out.

> However, it is difficult to diagnose with the old version you are using. I would recommend an update to the latest trunk version.

Good point. I was hesitant to use the master branch, as in many projects it's not really clear how stable and regression-free it is. So the latest official release is usually the way to go for me, which is 19.09.2 for svxlink. But as there have been about 240 commits since 19.09.2, I'll give the HEAD a go and report how it performs.

Thanks a lot!

dl1com commented 2 years ago

So, the day before yesterday I installed svxlink 1.7.99.53 (Echolink 1.5.99.2) - the current HEAD of master. The problems still occur. I installed a workaround by continuously parsing the logfile and restarting the svxlink service when the "*** ERROR..." line occurs. But I would really like to understand and fix the root cause of the problem.
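
A workaround of this kind could be sketched roughly as follows. This is not the script actually used here, only an illustration under assumptions: the log path, the systemd unit name "svxlink", the cooldown value and all function names are hypothetical and need to be adapted to the local setup.

```python
import subprocess
import time

# Error text as it appears in the svxlink log excerpt earlier in this thread.
ERROR_MARKER = "*** ERROR: Command timeout while communicating to the directory server"


def follow(path, poll_seconds=1.0):
    """Yield lines appended to the file at `path`, similar to `tail -f`.

    Sketch only: log rotation is not handled.
    """
    with open(path, "r", errors="replace") as log:
        log.seek(0, 2)  # jump to the end so that old errors are ignored
        while True:
            line = log.readline()
            if line:
                yield line
            else:
                time.sleep(poll_seconds)


def watch(lines, restart, cooldown_seconds=60.0, clock=time.monotonic):
    """Call restart() when the error appears, at most once per cooldown.

    Returns the number of restarts once `lines` is exhausted.
    """
    last_restart = None
    restarts = 0
    for line in lines:
        if ERROR_MARKER not in line:
            continue
        now = clock()
        if last_restart is None or now - last_restart >= cooldown_seconds:
            restart()
            last_restart = now
            restarts += 1
    return restarts


def restart_svxlink():
    # Assumption: SvxLink runs as a systemd unit named "svxlink".
    subprocess.run(["systemctl", "restart", "svxlink"], check=False)


# Hypothetical usage (the log path is an assumption):
# watch(follow("/var/log/svxlink"), restart_svxlink)
```

The cooldown keeps a burst of error lines from triggering several restarts in a row. Such a watcher only hides the symptom; it does not explain why the directory connection stops.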

@dl1hrc Question: Does the communication with the directory server also go through the EchoLink proxy, or does it happen directly? Which ports are used for this communication? With that information I can set up a filtered PCAP trace and hopefully gather some more info about the problem. As this obviously does not happen to that many people, I want to rule out any network issues for now.

Thank you

dl1hrc commented 2 years ago

AFAIK, port 8100/TCP is used for the communication between the SvxLink node and the el-proxy. The el-proxy uses 5198+5199/UDP and 5200/TCP to communicate with the EchoLink directory server. A proxy is needed for LTE connections, if you can't forward the ports at the DSL router, or if you get a non-public IP address from your internet provider (e.g. range 100.x.x.x, 10.x.x.x).

I just checked my two nodes, and the connection over both of my el-proxies is working well. I set up my own el-proxy instances since I encountered problems with the public el-proxies.
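
For a filtered PCAP trace, the ports listed above can be turned into a capture filter in the pcap-filter syntax that tools such as tcpdump accept. The sketch below is an illustration only: the port list is taken from this reply as stated, and the function names are made up. It also includes a rough check for whether an address handed out by the provider is publicly routable.

```python
import ipaddress

# Ports as listed in the reply above (protocol, port).
ECHOLINK_PORTS = [("tcp", 8100), ("udp", 5198), ("udp", 5199), ("tcp", 5200)]


def capture_filter(ports=ECHOLINK_PORTS):
    """Build a pcap-filter expression that matches any of the given ports."""
    return " or ".join(f"{proto} port {port}" for proto, port in ports)


def needs_proxy(address):
    """Return True if `address` is not globally routable.

    Note: only 100.64.0.0/10 (carrier-grade NAT, RFC 6598) is non-public,
    not the whole 100.x.x.x range.
    """
    return not ipaddress.ip_address(address).is_global


print(capture_filter())
# tcp port 8100 or udp port 5198 or udp port 5199 or tcp port 5200
```

The resulting string can be passed as the filter expression to a capture tool, for example as the last argument to tcpdump.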

73s Adi / DL1HRC