Closed priitohlo closed 6 years ago
Thanks for the post. This is very unexpected behaviour. Offhand I can’t think of a way to catch it, and I’m not sure if gdb can be used in this situation.
A possible help would be to set log verbosity to 2 and after it has crashed see if there is anything consistent about what is written in the log.
Is anyone else seeing this?
I'm seeing this in the end of the log with -vv:
prsm
waiting for a mutex, maximum expected time of 20000 microseconds exceeded "player.c:633".
debug_mutex_lock at "player.c:633" expected max wait: 0.020000000, actual wait: 0.020514457 sec.
waiting for a mutex, maximum expected time of 30000 microseconds exceeded "player.c:802".
debug_mutex_lock at "player.c:802" expected max wait: 0.030000000, actual wait: 0.031411682 sec.
waiting for a mutex, maximum expected time of 30000 microseconds exceeded "player.c:802".
debug_mutex_lock at "player.c:802" expected max wait: 0.030000000, actual wait: 0.030332886 sec.
waiting for a mutex, maximum expected time of 20000 microseconds exceeded "player.c:633".
debug_mutex_lock at "player.c:633" expected max wait: 0.020000000, actual wait: 0.020542113 sec.
waiting for a mutex, maximum expected time of 30000 microseconds exceeded "player.c:802".
debug_mutex_lock at "player.c:802" expected max wait: 0.030000000, actual wait: 0.031989698 sec.
waiting for a mutex, maximum expected time of 20000 microseconds exceeded "player.c:633".
debug_mutex_lock at "player.c:633" expected max wait: 0.020000000, actual wait: 0.022577518 sec.
waiting for a mutex, maximum expected time of 20000 microseconds exceeded "player.c:633".
debug_mutex_lock at "player.c:633" expected max wait: 0.020000000, actual wait: 0.021945438 sec.
waiting for a mutex, maximum expected time of 20000 microseconds exceeded "player.c:633".
debug_mutex_lock at "player.c:633" expected max wait: 0.020000000, actual wait: 0.021029922 sec.
waiting for a mutex, maximum expected time of 30000 microseconds exceeded "player.c:802".
debug_mutex_lock at "player.c:802" expected max wait: 0.030000000, actual wait: 0.030318564 sec.
waiting for a mutex, maximum expected time of 30000 microseconds exceeded "player.c:802".
debug_mutex_lock at "player.c:802" expected max wait: 0.030000000, actual wait: 0.031000278 sec.
Segmentation fault
This seems to have been caused by simply disconnecting from ssh session.
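To check whether there is anything consistent in those warnings, the log can be tallied per source location. This is only a rough sketch: `summarize_mutex_waits` is a hypothetical helper name, not part of Shairport Sync, and it assumes the `debug_mutex_lock at "file:line" ... actual wait: N sec.` line format shown above.

```sh
#!/bin/sh
# Hypothetical helper: reads a -vv log on stdin and prints, for each
# source location, how many debug_mutex_lock warnings were logged and
# the longest actual wait seen (in seconds).
summarize_mutex_waits() {
  awk '
    /debug_mutex_lock at/ {
      loc = ""; w = ""
      for (i = 1; i < NF; i++) {
        if ($i == "at" && loc == "") loc = $(i + 1)  # quoted file:line token
        if ($i == "actual")          w = $(i + 2)    # value after "actual wait:"
      }
      gsub(/"/, "", loc)                             # strip the quotes
      count[loc]++
      if (w + 0 > max[loc]) max[loc] = w + 0
    }
    END {
      for (loc in count)
        printf "%s count=%d max_wait=%.6f\n", loc, count[loc], max[loc]
    }
  ' | sort
}
```

Usage, assuming the output was saved to a file called shairport.log (the file name is just an example): `summarize_mutex_waits < shairport.log`.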
Thanks for this. Using gdb, the built-in debugger, might be helpful, but it could be tedious. But, if you're willing, here is what you could do instead of just using $ shairport-sync -vv:
$ gdb --args shairport-sync -vv
... lots of stuff
(gdb) run
... lots more stuff
Play stuff to the service, and when you get a segmentation fault, enter:
(gdb) bt
bt stands for backtrace, and it will try to list the call stack, which might help track down the problem.
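If sitting in an interactive session is too tedious, the same steps can be scripted with gdb's batch mode. A minimal sketch, assuming gdb is installed; `run_under_gdb` is a hypothetical wrapper name, and `thread apply all bt` is used instead of plain `bt` so that every thread's stack is listed, not only the one that crashed.

```sh
#!/bin/sh
# Hypothetical wrapper: run a program under gdb non-interactively.
# -batch makes gdb exit once the -ex commands have been processed;
# after "run" stops (for example on SIGSEGV), the backtrace of every
# thread is printed.
run_under_gdb() {
  gdb -batch \
      -ex run \
      -ex 'thread apply all bt' \
      --args "$@"
}
```

Usage: `run_under_gdb shairport-sync -vv`. If the program exits normally, gdb will simply report that there is no stack.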
Here's the output:
Thread 11 "shairport-sync" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x728ff320 (LWP 12479)]
0x76977298 in ?? () from /lib/arm-linux-gnueabihf/libdbus-1.so.3
(gdb) bt
#0 0x76977298 in ?? () from /lib/arm-linux-gnueabihf/libdbus-1.so.3
#1 0x769775c8 in ?? () from /lib/arm-linux-gnueabihf/libdbus-1.so.3
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
That's weird. The configuration of Shairport Sync is:
3.2RC11-mbedTLS-Avahi-ALSA-soxr-metadata
from above, right? That means that it should have no dealings with libdbus. I'm going to have to double check that, as D-Bus is used if one optionally includes the native D-Bus or MPRIS interfaces. If this is indeed the case -- i.e. if, as I believe, there is no D-Bus involvement in your build of Shairport Sync (there shouldn't be) -- then it's a system-wide problem. First let me check. Also, if you can capture another seg fault that reproduces this, that would be interesting indeed.
No, I haven't built it with dbus. The configure args are as such: --with-alsa --with-avahi --with-ssl=mbedtls --with-soxr --with-metadata --with-apple-alac --with-systemd
I tried this again for a couple of times now and the gdb backtrace always points to the dbus libraries.
Just continuing, the Avahi subsystem does use D-Bus alright, but not Shairport Sync itself, in this configuration:
3.2RC11-mbedTLS-Avahi-ALSA-soxr-metadata-sysconfdir:/etc
Have you any unusual settings in the configuration file?
Here are my uncommented lines in the configuration:
interpolation = "soxr";
ignore_volume_control = "yes";
alac_decoder = "apple";
allow_session_interruption = "yes";
log_verbosity = 1;
Okay, thanks. I'll reproduce them here, just not (yet) on a 3B+.
Running this configuration on a Pi Zero W, and it gets easily overwhelmed when soxr-based stuffing is needed, but it is otherwise behaving itself. Hmm.
I'm also keeping the firmware and kernel up-to-date fairly frequently (currently at 4.14.49-v7+), perhaps there are some incompatibilities in there?
Are you using Raspbian or Raspbian Lite? Are you using the Pi's GUI? (All my experimentation is based on Raspbian Lite, no GUI.)
Raspbian Stretch Lite (no X).
Dunno about the possible incompatibilities. I'll get a 3B+ in the next few days. Meanwhile I'll update the Pi Zero W -- currently at 4.9.44+.
As a sidenote, I compiled Shairport Sync to use tinysvcmdns instead of Avahi and everything seems a lot more stable now.
Very interesting. Is it possible that your system -- your installation -- is just borked?
Doubtful -- fresh installation, only ever used for Shairport Sync. I have disabled the TV service, Bluetooth and the LEDs, however. That is the extent of deviation from default setup.
Fair enough. Just up to 4.14.49+ on the Pi Zero W BTW. Let's see what happens.
How hard would it be to make it completely default -- it's just a thought, as Bluetooth uses D-Bus?
Will take a few ticks, will try ASAP.
When I get a chance, I'll check the Avahi code in Shairport Sync. Most of it is there a long time, and I'm not sure how much error checking is going on -- it might take a little while though, and hasn't been a problem before...
I have now completely reinstalled everything from scratch with everything up to date and at their defaults (except for sound configuration to use the HiFiBerry card, local users, partition size, wireless networking) but the behaviour is exactly the same right down to the gdb backtrace.
Most peculiar!
Sorry for the rather obvious question, but have you used older versions of Shairport Sync? And if so, have they had this issue?
I have used versions since 3.2RC10, which had the same issue. I only got around to submitting this now.
Thanks again. I'll be digging in over the next few days; I might have a development version with enhanced debug messages for you to try.
Are you getting any shairport sync log entries relating to avahi or DACP?
I am, but mostly things that indicate success of setup and connection, for example:
Jun 19 17:24:03 muse shairport-sync[9686]: avahi: service 'E3FF49F13A4E@Muse' group is not yet committed.
Jun 19 17:24:03 muse shairport-sync[9686]: avahi: request to add "_raop._tcp" service without metadata
Jun 19 17:24:03 muse shairport-sync[9686]: avahi: service 'E3FF49F13A4E@Muse' group is registering.
Jun 19 17:24:03 muse shairport-sync[9686]: avahi: service 'E3FF49F13A4E@Muse' successfully added.
Jun 19 17:25:27 muse shairport-sync[9686]: DACP monitor successfully started
No errors anywhere.
Taking a careful look through the avahi code in Shairport Sync yesterday, I could only find two places where a return code wasn't being checked, and those places are almost never called. I'll push the code with the extra checks later today.
I expect to get a 3B+ in the next few days, but I don't have a HiFiBerry Digi+. So, my next question is this: if you run the audio into the built-in DAC, does the problem persist? If it does, then I guess there might be a chance I could reproduce the problem here...
Same behaviour with hifiberry disabled and built-in audio interface enabled.
Okay, so with a 3B+ driving the built-in DAC on the very latest development version (with MQTT and with a slight improvement in the timed mutex lock code that pretty much removes those debug_mutex_lock at ... messages) I'm afraid I'm not getting any misbehaviour.
Would you be able to run it with statistics enabled please? Just in case it shows up any particular network issues, like high levels of packet loss.
Jul 06 00:16:29 muse shairport-sync[3136]: -0.9, 0.0, 0.0, 1003, 0, 0, 0, 0, 6293, 263, 264
Jul 06 00:16:37 muse shairport-sync[3136]: -1.9, 192.6, 192.6, 2006, 0, 0, 0, 0, 6246, 263, 264
Jul 06 00:16:39 muse shairport-sync[3136]: Packet reception interval stats: mean, standard deviation and max for the last 2,500 packets in microseconds: 7978.7, 1242.7, 36650.0.
Jul 06 00:16:45 muse shairport-sync[3136]: -2.0, 218.1, 218.1, 3009, 0, 0, 0, 0, 6059, 259, 264
Jul 06 00:16:53 muse shairport-sync[3136]: -2.0, 271.9, 271.9, 4012, 0, 0, 0, 0, 6062, 259, 264
Jul 06 00:16:59 muse shairport-sync[3136]: Packet reception interval stats: mean, standard deviation and max for the last 2,500 packets in microseconds: 7981.7, 2123.5, 52222.0.
Jul 06 00:17:01 muse shairport-sync[3136]: -1.9, 181.3, 181.3, 5015, 0, 0, 0, 0, 6267, 263, 264
Jul 06 00:17:09 muse shairport-sync[3136]: -2.0, 223.8, 223.8, 6018, 0, 0, 0, 0, 6162, 263, 264
Jul 06 00:17:17 muse shairport-sync[3136]: -2.0, 246.4, 246.4, 7021, 0, 0, 0, 0, 5990, 261, 265
Jul 06 00:17:19 muse shairport-sync[3136]: Packet reception interval stats: mean, standard deviation and max for the last 2,500 packets in microseconds: 7982.4, 1991.1, 59701.0.
Jul 06 00:17:25 muse shairport-sync[3136]: -2.0, 215.3, 215.3, 8024, 0, 0, 0, 0, 6125, 254, 264
Jul 06 00:17:33 muse shairport-sync[3136]: -2.0, 195.4, 195.4, 9027, 0, 0, 0, 0, 6094, 261, 264
Jul 06 00:17:39 muse shairport-sync[3136]: Packet reception interval stats: mean, standard deviation and max for the last 2,500 packets in microseconds: 7981.7, 1223.6, 26680.0.
Jul 06 00:17:41 muse shairport-sync[3136]: -2.0, 271.9, 271.9, 10030, 0, 0, 0, 0, 6059, 258, 265
Jul 06 00:17:42 muse systemd[1]: shairport-sync.service: Main process exited, code=killed, status=11/SEGV
Jul 06 00:17:42 muse systemd[1]: shairport-sync.service: Unit entered failed state.
Jul 06 00:17:42 muse systemd[1]: shairport-sync.service: Failed with result 'signal'.
Doesn't seem to be much of anything interesting here. The behaviour is still the same -- as soon as I log out any user, the service segfaults, as seen here.
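For a longer run, the packet-interval lines can be pulled out of the log so that spikes are easier to spot. A rough sketch only: `packet_interval_stats` is a hypothetical helper name, and it assumes the "Packet reception interval stats: ... in microseconds: mean, sd, max." line format shown above.

```sh
#!/bin/sh
# Hypothetical helper: reads log lines on stdin and prints the mean,
# standard deviation and max packet reception interval (microseconds)
# from each "Packet reception interval stats" line, one set per line.
packet_interval_stats() {
  awk -F 'microseconds: ' '
    /Packet reception interval stats/ {
      n = split($2, v, ", ")
      if (n < 3) next            # skip lines that do not have three values
      sub(/\.$/, "", v[3])       # drop the full stop that ends the sentence
      print v[1], v[2], v[3]
    }
  '
}
```

Usage, assuming the service logs to the systemd journal under the unit name seen above: `journalctl -u shairport-sync | packet_interval_stats`.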
Thanks for the update. When you say “log out a user”, what do you mean?
As I noted in my initial report, the behaviour is observable when a user logs out of the system (e.g. ending an su session, terminating an SSH connection, etc.). Very rarely, the segfault also occurs when playback is stopped.
Thanks again for the clarification. It is such a weird behaviour. I'll keep trying to provoke a crash, but no luck so far...
I only ever got this to happen once, but have recently fixed a memory-corruption bug -- see #722 -- that might have a bearing on this. Would you be kind enough to try the latest development build -- 3.2.d67 or later?
I was seeing the same issue with 3.2RC12, particularly in the form of crashing when a play session ends. Just installed 3.2.d67 and will try it out actively over the next few days.
Setup: Raspberry Pi 3 (non-B+), Raspbian Stretch, BeoCreate 4-Channel Amplifier as DAC.
Many thanks.
Seems to be a lot more stable now. I'll monitor for a few days and close if it stays that way.
I also have observed no crashes of any kind since installing 3.2.d67. It's been rock-solid.
Same here. Closing.
shairport-sync tends to segfault at random when logging in/out via SSH, or when starting or ending playback. This behaviour is observable both when starting it via systemd and when running it directly from a terminal, but it is not reproducible every time.
Is there a way to compile the binary so that it produces more informative error messages than "Segmentation fault", as this is all that is printed on a crash?
Environment in use:
Raspberry Pi Model 3B+
HiFiBerry Digi+ (output via TOSLINK)
3.2RC11-mbedTLS-Avahi-ALSA-soxr-metadata
Latest stable Raspbian