Closed dasl- closed 1 year ago
Super, thanks. I'll have a dig through...
I had a look at the logs and the very interesting gdb
backtrace, and on the face of it, it looks as if the crash was inside the Apple ALAC decoder.
So, let me suggest that you select the hammerton
decoder in the configuration settings and try again.
alac_decoder = "hammerton"; // This can be "hammerton" or "apple". This advanced setting allows you to choose
Interestingly, the ALAC decoder in FFmpeg
is by the same guy, Dave Hammerton, and is probably maintained better there. It would take a little while to incorporate in SPS.
Of course, this doesn't mean that the decoder is where the problem is, but it is suggestive. OTOH, the value of the outbuffer
looks odd, so maybe there is a memory allocation issue somewhere. I'll keep digging.
So, let me suggest that you select the hammerton decoder in the configuration settings and try again.
I can certainly do that, but I'm a little confused. My understanding was that I am already using the hammerton decoder by default. The logs say "player.c:169" Hammerton Decoder used on encrypted audio.
The sample config suggests hammerton will be chosen by default too:
% cat /etc/shairport-sync.conf.sample | grep -A2 alac_decoder
// alac_decoder = "hammerton"; // This can be "hammerton" or "apple". This advanced setting allows you to choose
// the original Shairport decoder by David Hammerton or the Apple Lossless Audio Codec (ALAC) decoder written by Apple.
// If you build Shairport Sync with the flag --with-apple-alac, the Apple ALAC decoder will be chosen by default.
I do not build SPS with the --with-apple-alac
flag. I might be misunderstanding though.
Apologies -- the confusion is mine 😕. You are right -- the crash is in the Hammerton decoder. Unfortunately, we know that it can crash on illegal input, so maybe I'll start looking at the FFmpeg
version, in case there are fixes.
But it's a bit uncool, not to say improper, to blame someone else's code without evidence. With that in mind, let me suggest a way to wring more information out of a backtrace: turn off compiler optimisations. This can be done by preceding the ./configure
step with a few definitions, like this:
$ CFLAGS="-O0 -g" CXXFLAGS="-O0 -g" ../configure
Then do a $ make clean
followed by a $ make
.
It would be most interesting to see the contents of the buffer when the crash occurred (it's at 0xa85f72f8
in the backtrace) and its length, which is 32 in this case. The buffer contains the packet to be decoded, and It's just possible that it is causing the crash, i.e. we have a smoking gun. Unfortunately it's been optimised out, so maybe turning off optimisation would help.
Meanwhile, I'll try some other diagnostics here. I've used valgrind
before, but not for an extended period; perhaps it will reveal something. That outbuffer
value does look a bit odd.
Apologies -- the confusion is mine 😕
No worries!
With that in mind, let me suggest a way to wring more information out of a backtrace: turn off compiler optimisations. This can be done by preceding the ./configure step with a few definitions, like this:
It looks like I am already turning off compiler optimizations. This is how I've been building SPS.
However, I had not been adding make clean
in my installation script. I have just added that step and will rebuild, hoping to eventually capture another segfault coredump.
This issue has been inactive for 45 days so will be closed 7 days from now. To prevent this, please remove the "stale" label or post a comment.
Updating this thread just note an interesting problem I had with using GDB on a recent coredump. I managed to capture a coredump of another shairport-sync segfault, but I was unable to show much of a backtrace in GDB: https://gist.github.com/dasl-/685be88dade7c208c139061c227f0eb7
GDB was complaining about stuff related to libthread_db
. As suggested in the GDB output, I tried adding a line to this file: /root/.gdbinit
. I created that file and added the following lines:
set auto-load safe-path /
set debug libthread-db 1
I tried running GDB again after making these changes, but I still was unable to get anything useful: https://gist.github.com/dasl-/594cc029028440f73708617f7742add4
It was still complaining about: warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
. I'm not sure how to resolve that.
I've added automatic restarting to my systemd configuration to work around the very rare segfaults :)
https://github.com/dasl-/pitools/commit/0022671fdfec49a3ab90549e193f82dbc45b01f0
I've just got 3 crashes in a few days after running 4.2-1-g12ad72c4-AirPlay2-smi9-OpenSSL-Avahi-ALSA-pa-pipe-soxr-metadata-mqtt-sysconfdir:/etc
on RPI4 (Raspbian 11) for a better part of the year without an issue.
Curiously first two crashes happened on the first tracks of act 2 & 3 of tchaikovsky's swan lake (Apple music on ipadOS 17.2), but I haven't been able to reproduce. The one I've got the coredump for happened when I was not around.
My backtrace is almost identical to the one here, but I barely found this report because the stack traces are on a separate gist. I'll paste mine here, so that it can be found via search.
If the crashes continue to happen regularly, I'll try the obvious steps, like upgrading and tweaking the optimization flags.
It would be most interesting to see the contents of the buffer when the crash occurred (it's at 0xa85f72f8 in the backtrace) and its length, which is 32 in this case. The buffer contains the packet to be decoded, and It's just possible that it is causing the crash, i.e. we have a smoking gun. Unfortunately it's been optimised out, so maybe turning off optimisation would help.
Not sure why you said it had been optimized out, but I tried to dump it anyway, see below.
Note that the crash is due to an invalid value of dest=0x1369650
.
$ journalctl -xu shairport-sync.service | grep SEGV
Feb 04 19:09:45 argon systemd[1]: shairport-sync.service: Main process exited, code=killed, status=11/SEGV
Feb 04 21:12:49 argon systemd[1]: shairport-sync.service: Main process exited, code=killed, status=11/SEGV
Feb 07 12:37:27 argon systemd[1]: shairport-sync.service: Main process exited, code=dumped, status=11/SEGV
$ sudo gdb /usr/local/bin/shairport-sync /tmp/core
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0003acb8 in deinterlace_16 (interlacing_leftweight=<optimized out>, interlacing_shift=<optimized out>, numsamples=352, numchannels=<optimized out>, buffer_out=0x9,
buffer_b=<optimized out>, buffer_a=<optimized out>) at alac.c:561
561 left = buffer_a[i];
[Current thread is 1 (Thread 0xe9afbc80 (LWP 29475))]
(gdb) bt
#0 0x0003acb8 in deinterlace_16
(interlacing_leftweight=<optimized out>, interlacing_shift=<optimized out>, numsamples=352, numchannels=<optimized out>, buffer_out=0x9, buffer_b=<optimized out>, buffer_a=<optimized out>) at alac.c:561
#1 alac_decode_frame (alac=<optimized out>, inbuffer=inbuffer@entry=0xe9af9540 " ", outbuffer=outbuffer@entry=0x1369650, outputsize=outputsize@entry=0xe9af8c2c) at alac.c:974
#2 0x00034f6c in unencrypted_packet_decode
(packet=packet@entry=0xe9af9540 " ", length=length@entry=32, dest=dest@entry=0x1369650, outsize=outsize@entry=0xe9af8c2c, size_limit=size_limit@entry=1408, conn=conn@entry=0x1170370)
at player.c:172
#3 0x00035014 in audio_packet_decode (dest=0x1369650, destlen=0xe9af94cc, destlen@entry=0xe9af94c4, buf=0xe9af9540 " ", buf@entry=0x0, len=len@entry=32, conn=0x1170370,
conn@entry=0xd3d62ace) at player.c:227
#4 0x00035b68 in player_put_packet (original_format=original_format@entry=1, seqno=<optimized out>, seqno@entry=14040, actual_timestamp=0, actual_timestamp@entry=3554028238, data=0x0,
data@entry=0xe9af9538 "\365\303\004", len=32, conn=conn@entry=0x1170370) at player.c:501
#5 0x0002f1dc in decipher_player_put_packet (ciphered_audio_alt=ciphered_audio_alt@entry=0xe9afa66a "p\356", nread=nread@entry=66, conn=0x1170370,
conn@entry=0x2f96c <rtp_realtime_audio_receiver+156>) at rtp.c:1707
#6 0x0002f96c in rtp_realtime_audio_receiver (arg=0x2f96c <rtp_realtime_audio_receiver+156>) at rtp.c:1935
#7 0xf5e4b310 in start_thread (arg=0xe9afbc80) at pthread_create.c:477
#8 0xf5d18da8 in () at ../sysdeps/unix/sysv/linux/arm/clone.S:73
(gdb) info registers
r0 0x1369652 20354642
r1 0xeb3306a8 3945989800
r2 0x1 1
r3 0xeb330120 3945988384
r4 0x4 4
r5 0x0 0
r6 0x0 0
r7 0x0 0
r8 0x4 4
r9 0x160 352
r10 0x0 0
r11 0x9 9
r12 0x0 0
sp 0xe9af8b18 0xe9af8b18
lr 0x0 0
pc 0x3acb8 0x3acb8 <alac_decode_frame+3904>
cpsr 0x20800010 545259536
fpscr 0x60000010 1610612752
(gdb) x/i $pc
=> 0x3acb8 <alac_decode_frame+3904>: strh lr, [r0, #-2]
(gdb) x 0x1369650
0x1369650: Cannot access memory at address 0x1369650
(gdb) x/32b 0xe9af9540
0xe9af9540: 0x20 0x00 0x00 0x04 0x00 0x13 0x08 0x0c
0xe9af9548: 0x2b 0xee 0x8e 0x0e 0x75 0xf8 0xf0 0x13
0xe9af9550: 0x08 0x09 0x1b 0xf4 0x98 0x09 0x0f 0xfc
0xe9af9558: 0x34 0xff 0x80 0xaf 0xbf 0xe0 0x2b 0xfc
Thanks for the post, and for the core dump. It appears that the crash occurred in the code that decodes an Apple Lossless Audio Codec (ALAC) packet. It's not impossible that some bad packet was sent, but it seems more likely to me that the data or the memory allocations were somehow corrupted. I think it's more likely because in 4.3.2
we have fixed quite a few bugs in the PulseAudio (pa
) backend that might have caused memory allocation and usage errors. So, definitely worth updating to 4.3.2
BTW there is also now a pretty robust PipeWire (pw
) backend which is recommended if you have a (recent) PipeWire system.
What happened?
shairport-sync got a segfault. At the time I was playing multiroom audio through 2 SPS receivers, and I was cycling through pressing pause + resume "in a loop" (manually) to stress test audio synchronization bugs. The client was an iphone using the "radio swiss classic" app.
logs
Here are the last 5000 log lines of the SPS receiver that got the segfault: https://gist.github.com/dasl-/3dba2f7557de5282cbcf083101fd3324
As a comparison, here are the last 5000 log lines of the second SPS receiver I was connected to at the time, which did not get a segfault: https://gist.github.com/dasl-/456ff4d11bbb7a9ddde891aef8e02a5d
coredump
Here is the gdb backtrace: https://gist.github.com/dasl-/7b750f40317eca040ec65c2b5c63fecd
Let me know if you want me to run any other GDB commands.
Relevant log output
Configuration Information.
Here is the configuration of the SPS receiver that had a segfault. Note that I removed my long
audio_backend_buffer_desired_length_in_seconds
configuration setting that I had in https://github.com/mikebrady/shairport-sync/issues/1565How did you install Shairport Sync?
Built from source
Check previous issues