Closed f5vmr closed 6 years ago
Could it be that the svxlink process do not have persmission to write to the log file?
Or maybe disk full, read-only filesystem or something like that?
No, thought of those. I have applied full access right to the folder chain, and the sdcard is 16 GB with 8 Gb free. I have done nothing different to a Jessie install.
I have also tried sudo svxlink --daemon --logfile=svxlink.log - the log is written up to the point of declaring connection to the aprs server, then halts. (I was following by tail -f), and a sudo killall svxlink again reports no process.
Did a quick test here and I cannot get the same behavior. Does it do the same thing if you run it in the foreground, without --daemon but with --logfile? If that give the same problem, try:
$ gdb /path/to/svxlink
gdb> run --logfile /path/to/logfile
--- possible crash ---
gdb> bt
If you do not have gdb installed you will have to install it.
It is unclear which version you use. The title say 17.12.1 but in the text you say "latest master". Which one is it?
It was yesterday's version, but not sure if I beat you to your messages of yesterday or not. I'll load gdb tomorrow and give it a whirl. Thanks Tobias.
Mornig Tobias - gdb installed as standard in Stretch - result below (gdb) run --logfile /var/log/svxlink.log Starting program: /usr/bin/svxlink --logfile /var/log/svxlink.log [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1". [New Thread 0xb6640450 (LWP 1584)] [Thread 0xb6640450 (LWP 1584) exited]
Thread 1 "svxlink" received signal SIGABRT, Aborted. __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51 51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory. (gdb) bt
from /usr/lib/arm-linux-gnueabihf/libtcl8.5.so
Backtrace stopped: previous frame identical to this frame (corrupt stack?) (gdb)
Could this be an operating system error in Stretch or one of its packages?
Running gdb without a log process
Starting program: /usr/bin/svxlink [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1". SvxLink v1.6.99.5 Copyright (C) 2003-2017 Tobias Blomberg / SM0SVX
SvxLink comes with ABSOLUTELY NO WARRANTY. This is free software, and you are welcome to redistribute it in accordance with the terms and conditions in the GNU GPL (General Public License) version 2 or later.
Using configuration file: /etc/svxlink/svxlink.conf --- Using sample rate 48000Hz [New Thread 0xb6640450 (LWP 1877)]
Starting logic: RepeaterLogic Loading RX: Rx1 [Thread 0xb6640450 (LWP 1877) exited] Loading TX: Tx1 RepeaterLogic: Event handler script successfully loaded. Connected to APRS server 62.77.224.245 on port 14580 RepeaterLogic: Sending short identification... Tx1: Turning the transmitter ON Tx1: Turning the transmitter OFF [Inferior 1 (process 1862) exited normally] (gdb) terminated with a q command as the repeater was operating normally.
I am left puzzled by this situation. I have built a second card with the absolutely identical set up except for changing the boot up sequence for terminal logon as opposed to desktop configuration. The error I pointed out as above does not occur in this configuration.
I suppressed the desktop in the original installation and rebooted, but the svxlink still fails when writing to a log, always at the same point. When run in direct mode i.e. sudo svxlink --logfile=/var/log/svxlink.log the log still gets written up to and including connection to the aprs server, then the svxlink terminates with "Aborted" in a new line before displaying a fresh prompt.
Sounds like something is wrong with your other card. Should not have anything to do with desktop. The easiest way is probably to reinstall the card. Have you considered using the package based installation method?
Hello Tobias,
I have same problems with this version of SVXLink. Chris describe vary well what is problem.
cat /etc/os-release PRETTY_NAME="Raspbian GNU/Linux 9 (stretch)" NAME="Raspbian GNU/Linux" VERSION_ID="9" VERSION="9 (stretch)" ID=raspbian ID_LIKE=debian HOME_URL="http://www.raspbian.org/" SUPPORT_URL="http://www.raspbian.org/RaspbianForums" BUG_REPORT_URL="http://www.raspbian.org/RaspbianBugs"
I try run this gdb and get this lines:
----------------------[Start test]---------------------- gdb /usr/bin/svxlink (gdb) run Starting program: /usr/bin/svxlink [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1". SvxLink v1.6.99.8 Copyright (C) 2003-2017 Tobias Blomberg / SM0SVX
SvxLink comes with ABSOLUTELY NO WARRANTY. This is free software, and you are welcome to redistribute it in accordance with the terms and conditions in the GNU GPL (General Public License) version 2 or later.
Using configuration file: /etc/svxlink/svxlink.conf --- Using sample rate 48000Hz [New Thread 0x76414450 (LWP 840)] [Thread 0x76414450 (LWP 840) exited]
Starting logic: SimplexLogic Loading RX: Rx1 Loading TX: Tx1 Loading module "ModuleHelp" into logic "SimplexLogic" Found /usr/lib/arm-linux-gnueabihf/svxlink/ModuleHelp.so Module Help v1.0.0 starting... Loading module "ModuleEchoLink" into logic "SimplexLogic" Found /usr/lib/arm-linux-gnueabihf/svxlink/ModuleEchoLink.so Module EchoLink v1.4.99.0 starting... [New Thread 0x759d7450 (LWP 841)] Loading module "ModuleTclVoiceMail" into logic "SimplexLogic" [Thread 0x759d7450 (LWP 841) exited] Found /usr/lib/arm-linux-gnueabihf/svxlink/ModuleTcl.so Module Tcl v1.0.1 starting... Loading module "ModuleMetarInfo" into logic "SimplexLogic" Found /usr/lib/arm-linux-gnueabihf/svxlink/ModuleMetarInfo.so Module MetarInfo v1.1.0 starting... Loading module "ModuleParrot" into logic "SimplexLogic" Found /usr/lib/arm-linux-gnueabihf/svxlink/ModuleParrot.so Module Parrot v1.1.1 starting... SimplexLogic: Event handler script successfully loaded. Connected to APRS server 109.72.122.37 on port 14580 EchoLink directory status changed to ON [New Thread 0x759d7450 (LWP 842)] [Thread 0x759d7450 (LWP 842) exited] --- EchoLink directory server message: --- EchoLink Server v2.5.9997
ECHOEC2-1: Herndon, VA USA
[Inferior 1 (process 837) exited normally]
----------------------[END]----------------------
And second line with --logfile
----------------------[START test --logfile]---------------------- gdb /usr/bin/svxlink (gdb) run --logfile /home/pi/svxlink.log Starting program: /usr/bin/svxlink --logfile /home/pi/svxlink.log [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1". [New Thread 0x76414450 (LWP 844)] [Thread 0x76414450 (LWP 844) exited] [New Thread 0x759d7450 (LWP 845)] [Thread 0x759d7450 (LWP 845) exited] [New Thread 0x759d7450 (LWP 846)] [Thread 0x759d7450 (LWP 846) exited]
Thread 1 "svxlink" received signal SIGABRT, Aborted. __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51 51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
----------------------[END]----------------------
Satisfied that the 8gb card works according to plan, I reformatted the 16gb card, and rebuilt it with the November version of Stretch instead of the updated and upgraded august version I had saved. I then went through the build and git clone of 17.12.1.154 again. On completion I am sorry to report the same results as I reported earlier with gdb. I see this error appears in applications other than svxlink. What I will do tomorrow however is run the expand OS on the raspi-config, although Raspbian stretch is supposed to do it on boot up. However df reports a full size card in use in the table.
One more report. Same thing
cat /etc/os-release PRETTY_NAME="Raspbian GNU/Linux 8 (jessie)" NAME="Raspbian GNU/Linux" VERSION_ID="8" VERSION="8 (jessie)" ID=raspbian ID_LIKE=debian HOME_URL="http://www.raspbian.org/" SUPPORT_URL="http://www.raspbian.org/RaspbianForums" BUG_REPORT_URL="http://www.raspbian.org/RaspbianBugs"
Starting program: /usr/bin/svxlink --logfile=/var/log/svxlink.log [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1". [New Thread 0x764ca450 (LWP 4732)] [Thread 0x764ca450 (LWP 4732) exited] [New Thread 0x755bb450 (LWP 4733)] [New Thread 0x74d2e450 (LWP 4734)] [Thread 0x74d2e450 (LWP 4734) exited] [New Thread 0x74d2e450 (LWP 4735)] [Thread 0x74d2e450 (LWP 4735) exited]
Program received signal SIGABRT, Aborted. 0x7686ff70 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56 56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
That is a Jessie install! I didn't have that problem with raspbian Jessie, but then I hadn't installed 17.12.1.154 until Raspbian Stretch.
Look my first post I add os-release. All is make from today updated svxlink folder via GIT. Only on Jessie I have previous install svxlink 15.11 and then updated to newer version.
Ok - Something is very odd - new card - new install of Raspbian Stretch, with package based install this time, and the same result except for the uprated package libtcl8.6 instead of 8.5. (gdb) run --logfile=/var/log/svxlink.log Starting program: /usr/bin/svxlink --logfile=/var/log/svxlink.log [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1". [New Thread 0xb6591450 (LWP 1449)] [Thread 0xb6591450 (LWP 1449) exited]
Thread 1 "svxlink" received signal SIGABRT, Aborted. __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51 51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory. (gdb) bt
from /usr/lib/arm-linux-gnueabihf/libtcl8.6.so
Backtrace stopped: previous frame identical to this frame (corrupt stack?) (gdb)
https://stackoverflow.com/questions/48278881/gdb-complaining-about-missing-raise but it solves nothing.
Well that was a problem overcome. I found that the dirty bit was set, which in my case indicated a formatting error, probably causing a stack overflow error. I flattened the card, reformatting it and rebuilt it. I then tackled the Package Install again, and it works without the error. In view that others have seen the same problem, I recommend they review their formatting and imaging procedure with fsck. I consider that unless anyone else reports an issue I consider this problem closed. Chris
I though it was too good to be true. Initially the system was working. After setting up for a full repeater connection, and tail-ing the log, I noticed that the log was displaying that there was a constant restart of the svxlink system. then I run the gdb again and exactly the same fault had developed. I removed the card to a linux laptop and found the dirty bit had been set again. I ran fsck and reset it, and returned the sdcard to the raspberry. This time this fault is show, and the Rx1 is not recognised. (gdb) run --logfile=/var/log/svxlink.log Starting program: /usr/bin/svxlink --logfile=/var/log/svxlink.log [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1". [New Thread 0xb653d450 (LWP 6064)] [Thread 0xb653d450 (LWP 6064) exited] [Inferior 1 (process 6063) exited with code 01] (gdb) bt No stack. (gdb)
Stopped the run, rebooted and re-run under gdb a second time. (gdb) run --logfile=/var/log/svxlink.test2 Starting program: /usr/bin/svxlink --logfile=/var/log/svxlink.test2 [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1". [New Thread 0xb653d450 (LWP 1017)] [Thread 0xb653d450 (LWP 1017) exited]
Thread 1 "svxlink" received signal SIGABRT, Aborted. __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51 51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory. (gdb) bt
Backtrace stopped: previous frame identical to this frame (corrupt stack?) (gdb) I am really beginning to think the card type is at fault. This was a SanDisk Ultra 16 Gb HC I.
Yes it's a really strange behaviour and if the exact same installation work on other cards I'd say it's a hardware problem.
I can confirm this on a "normal" server, no hardware problem with sd cards or something like this. SvxLink is running as root, so it can not be a right issue. It terminates exactly every full minute when started with "--logfile", error:
Program received signal SIGABRT, Aborted. 0x00007ffff4f7d0a7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56 56 ../nptl/sysdeps/unix/sysv/linux/raise.c: Datei oder Verzeichnis nicht gefunden. (gdb)
Very strange indeed. It seems to have something with TCL to do but I cannot understand what's happening, and why it's related to the logfile. Hmmm...
I removed all own tcl environment, deleted all specific *.so libraries and reinstalled it. It occurs on these systems: 1.6.99.0, 1.6.99.2 and 1.6.99.9 with different soft-/hardware (bananian + bananapi and opensuse + x86_64)
I've just built another card for F5ZGM-R (as the current installation has gone READ-ONLY for some reason,) and this new install has also got the same issue when running --logfile= but runs normally without a log. Again it crashed without --daemon and with --logfile=. With --daemon and no logfile it runs perfectly. So is it a coding issue in associated software? Incidentally the current F5ZGM-R does not have this issue as I haven't upgraded it to Raspbian Stretch, but I noticed that in the root a file dead.letter of 7 Gb size has appeared - an entirely different issue I'm sure. F5ZGM-R does have --logfile= and sends me a daily log by email. I am now attempting a 4 Gb card with Stretch-Lite to see if that generates a problem with svxlink.
Report: 4Gb MicroSD Card, Raspbian Stretch-Lite fully updated, darkice installed with the package based install of svxlink, the log creation works, and everything appears to work to normal parameters. In case anyone thinks the 4 Gb card may be a bit small.....
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/root 3741504 1212136 2344664 35% /
devtmpfs 217972 0 217972 0% /dev
tmpfs 222264 0 222264 0% /dev/shm
tmpfs 222264 3172 219092 2% /run
tmpfs 5120 4 5116 1% /run/lock
tmpfs 222264 0 222264 0% /sys/fs/cgroup
/dev/mmcblk0p1 41853 21328 20525 51% /boot
tmpfs 44452 0 44452 0% /run/user/1000
Have installed the change described in #373 on two of my systems, it seems to work ;)
Could be the same problem as issue #373 and the TCL crash bug described in http://svxlink.996268.n3.nabble.com/Unexpected-Segfault-Tcl-RegisterChannel-names-td1443.html It is triggered by calling close(notifier_rd); in destructor ~CppDnsLookupWorker() and only happens when a log file is specified and the console is detached. The "deamon" option has no influence. I'd suppose that the STDIN descriptor 0 is re-used in TCL_RegisterChannel and maybe someone is checking a function (e.g. open) return value for beeing > 0. The problem re-appeared after the introduction of deferred resource release using the runTask initiated task timer driven delete/destructor calls. The problem seems to be solved when inserting the line dup2(open("/dev/null", O_RDONLY), STDIN_FILENO); after close(STDIN_FILENO); in the main function in svxlink.cpp. This keeps the STDIN descriptor allocated. Further tests are still pending...
close(STDOUT_FILENO), close(STDERR_FILENO), close(STDIN_FILENO) can be removed since dup2 implements an atomic close/re-open mechanism (see dup2 man page)
Interesting - I have recompiled my intended replacement logic with the dup2(ope..etc) line in svxlink.cpp, and it is a solution. The logic doesn't fail and the log writes as it should. I am not enough experienced in C++ to follow the logic, but I cannot see that any changes have been done to this segment of code by Tobias for at least several months. Perhaps the code is referenced by one of the newer changes in the SvxReflector code that may incorporate a global variable that causes the failure?
I don't use the svxlink code directly from GIT. Instead, I'm maintaining a dedicated low latency audio streaming variant and take over the GIT code by cherry picking. This also helps to isolate potential problems, since the new code undergoes an additional review and test. As I did not have taken over the SvxReflector code from GIT, it is definitely not involved in the observed behaviour. But because first signs of this problem seem to date back to at least 2011, the cause search space should not be limited to recent changes.
That was some very good bughunting, @dh1dm! I don't think that the non-atomic close is a problem since this is done very early in the application but it's clearly unnecessary and conceptually wrong so it should be removed. On the other hand that closing of stdin was very sneaky and I guess that only bad things can come if someone else is assigned that filedescriptor randomly.
Please test the fix in latest master to see if that is the correct solution.
Thanks to everyone who helped solving this!
I concur. Well done! Tobias, F5ZGM-R is today in service with the latest master, with the svxlink.cpp single line modification. I installed it this morning. At the same time I offline the 50 MHz repeater for essential maintenance, and I will be doing the same to its logic. Regards to all, Chris F5VMR
Now working fine on my repeater too. Thanks.
Ok, that's good news! @s58DB, are you running lastest master without the one-liner fix? It's important since the fix in master is not exactly the same as the original fix. It's the same solution but written with some additional error handling.
@sm0svx I am just update svxlink folder with git pull and compile all again.
Ok, thanks! I then consider this bug solved.
Hello. I have a problem on my raspberry, using svxlink tbd in a conference room. After some time of operation, this message appeared in the startup of the machine and I cannot connect the conference in the system. Error 100005 appears. Can someone help me? Is it some log file that has reached its maximum capacity?
Latest master running on a raspberry Pi B+ on Raspbian Stretch works perfectly when run from the terminal command displaying the journal, and works equally well with --daemon, but the addition of --logfile=/var/log/svxlink.log causes a cessation of work after connecting to aprs/echolink servers. No errors reported, but a sudo killall svxlink reports that no process is running. Danilo Bozic reported this initially but I now confirm the problem exists.