Closed a-x- closed 3 years ago
shairport-sync -V 3.1.7-OpenSSL-Avahi-ALSA-soxr-metadata-sysconfdir:/etc
I'm not sure how to show systemd logs
journalctl
No journal files were found.
I reconfigure it right now: /etc/systemd/journald.conf:
[Journal]
Storage=persistent
Thans for the post, but it's rather hard to work out what the issue is. What is 1.1.1-23-gd65b8e8
?
BTW, you need to be privileged to read the journal, and you don't need to alter its setting. Try, e.g:
$sudo journalctl…
@a-x- :
Try journalctl --system
or sudo journalctl --system
.
The output of systemd-analyze blame --no-pager --system
would be interesting as well.
I also noticed this on RaspberryPi, I'm on latest commit:
commit d2713a40a7833cf1ef74f6691561fc8c77534e07
Author: Mike Brady <mikebrady@eircom.net>
Date: Sat May 19 09:39:25 2018 +0100
Also, when it works, works great, but lot of times, streaming from iPhone produces no sound and also forked-daapd cannot stream (checking the speaker immediately unchecks it).
The only solution is to restart the whole device since sudo systemctl restart shairport-sync
hangs......
May 20 12:35:05 ela systemd[1]: shairport-sync.service: State 'stop-sigterm' timed out. Killing.
May 20 12:35:05 ela systemd[1]: shairport-sync.service: Killing process 477 (shairport-sync) with signal SIGKILL.
May 20 12:35:05 ela systemd[1]: shairport-sync.service: Main process exited, code=killed, status=9/KILL
May 20 12:35:05 ela systemd[1]: shairport-sync.service: Unit entered failed state.
May 20 12:35:05 ela systemd[1]: shairport-sync.service: Failed with result 'timeout'.
Thanks for the report. Would you be kind enough to set the log_verbosity
to 2 in the settings file and try to capture a log while this problem is occurring? That should make it easier to track down. Also, if you could say what kind of Pi and what versions of the OS, if would be very helpful.
[Update] Also, the output of $ systemd-analyze blame --no-pager --system
, as suggested by @Alphakilo, would be useful too! Thanks.
Linux living-room 4.14.34-v7+ #1110 SMP Mon Apr 16 15:18:51 BST 2018 armv7l GNU/Linux
2018-04-18-raspbian-stretch-lite
Both RPi 3B and 3B+ instances
I set log level to 2 on one of them, since then I could restart shairport-sync (!!)... so no report from this one, it just happened on another, couldn't restart again, I then set verbosity to 2 and rebooted the machine (it also took some time)...
I think shairport gets stuck probably somehow because I noticed the issue again by trying to set the output of forked-daapd to this instance and it immediately turned off..
Something is not the best with shairport besides this restarting issue, this is probably connected...
So, again, I noticed: cannot send output from forked-daapd, cannot play from iPhone (seems to be playing and the song progresses, but nothing plays).
Report after reboot:
$ systemd-analyze blame --no-pager --system
6.907s dhcpcd.service
4.653s hciuart.service
3.101s apt-daily.service
2.769s dev-mmcblk0p2.device
1.271s raspi-config.service
769ms networking.service
676ms lighttpd.service
659ms keyboard-setup.service
482ms ssh.service
433ms dphys-swapfile.service
338ms ntp.service
313ms systemd-fsck@dev-disk-by\x2dpartuuid-c7cb7e34\x2d01.service
301ms systemd-udev-trigger.service
265ms systemd-fsck-root.service
234ms systemd-logind.service
234ms systemd-udevd.service
198ms systemd-tmpfiles-setup-dev.service
191ms rsyslog.service
177ms fake-hwclock.service
163ms systemd-journald.service
149ms avahi-daemon.service
148ms plymouth-start.service
129ms sys-kernel-debug.mount
125ms bluetooth.service
111ms systemd-remount-fs.service
108ms dev-mqueue.mount
100ms run-rpc_pipefs.mount
94ms wifi-country.service
88ms systemd-tmpfiles-setup.service
82ms systemd-sysctl.service
79ms kmod-static-nodes.service
70ms triggerhappy.service
69ms systemd-update-utmp.service
69ms systemd-modules-load.service
69ms systemd-journal-flush.service
67ms alsa-restore.service
63ms rc-local.service
57ms nfs-config.service
55ms plymouth-read-write.service
53ms console-setup.service
47ms systemd-user-sessions.service
47ms sys-kernel-config.mount
39ms systemd-rfkill.service
37ms systemd-random-seed.service
35ms plymouth-quit.service
32ms systemd-update-utmp-runlevel.service
31ms plymouth-quit-wait.service
22ms boot.mount
Thanks for this. All that seems fine. I have it running on a 3B here on Stretch Lite 4.14.41-v7+ #1113 without problems, so I wonder, what other applications do you have running on the Pi?
Something that might be worth trying is to delay automatic startup for a few seconds -- see here in the TROUBLESHOTING.md page.
Is it possible to get the output of $ sudo systemctl status shairport-sync
?
You are right, I forgot to turn off wifi power management, after 1 day of testing with power management turned off, for now no issues with shairtport-sync
— Thank you!
I have an update for your document (I can make a pull request if you want), so this:
Alternatively, (also for the Raspberry Pi), add the following line:
wireless-power off
to the file /etc/network/interfaces.
was never working for me, tried again, power management=on after reboot again (so it didn't work)...
What works:
sudo nano /etc/network/if-up.d/off-power-manager
Type:
#!/bin/sh
/sbin/iwconfig wlan0 power off
sudo chmod +x /etc/network/if-up.d/off-power-manager
UPDATE:
The problem still present but much less frequently... I managed to confuse shairport by trying to rewing a track streaming from iPhone Music app directly to shairport instance.. it became unresponsive, I sshed into pi, tried to stop the service, it hung... then I rebooted (it also took a minute or two). Power management is off, unfortunately logging was disabled on this instance.. I enabled it, waiting for this to happen again, then I'll send the logs
UPDATE:
Problem is present with forked-daapd source as well (so not just streaming from iPhone)
I managed to capture the logs:
May 27 20:20:28 lab shairport-sync[494]: Already playing.
May 27 20:20:28 lab shairport-sync[494]: Connection 13: RTSP thread terminated.
May 27 20:31:58 lab shairport-sync[494]: New RTSP connection from 192.168.0.10:36868 to self at 192.168.0.92:5000 on conversation thread 14.
May 27 20:31:59 lab shairport-sync[494]: Connection 14: ANNOUNCE
May 27 20:31:59 lab shairport-sync[494]: RTSP Conversation thread 8 already playing when asked by thread 14.
May 27 20:31:59 lab shairport-sync[494]: ANNOUNCE failed to get the player
May 27 20:31:59 lab shairport-sync[494]: Already playing.
May 27 20:31:59 lab shairport-sync[494]: Connection 14: RTSP thread terminated.
I don't think "Already playing." part is true... because nothing was playing for a long time. Maybe what is different with my setup is that I have three different forked-daapd libraries as source and I play from different ones at different times... of course I have to stop the play from some forked-daapd source before starting from another.
After service stop issued:
May 27 20:36:32 lab shairport-sync[494]: Request to shut down all rtsp conversation threads
May 27 20:36:32 lab shairport-sync[494]: asking playing threads to stop
May 27 20:38:02 lab systemd[1]: shairport-sync.service: State 'stop-sigterm' timed out. Killing.
May 27 20:38:02 lab systemd[1]: shairport-sync.service: Killing process 494 (shairport-sync) with signal SIGKILL.
May 27 20:38:02 lab systemd[1]: shairport-sync.service: Main process exited, code=killed, status=9/KILL
May 27 20:38:02 lab systemd[1]: shairport-sync.service: Unit entered failed state.
May 27 20:38:02 lab systemd[1]: shairport-sync.service: Failed with result 'timeout'.
So as you can see - 2 min delay before anything happens...
Hope something can be seen from this?
Tried attaching gdb but I already issued stop command and I didn't manage to find anything in 2 minutes because I don't really know how to use gdb correctly in this case (should I try with something else when it happens again?).
Many thanks for your great work and for your patience. It seems that, for some reason, Shairport Sync is hanging up in certain circumstances -- not correctly terminating a session.
As you'll have seen from recent comments in the development branch, some bugs have recently been fixed. The version you are using incorporates these fixes, so it would be very interesting for me to see if we could catch Shairport Sync [still] locking up. Of course, it may be unrelated.
So, if you could update to the latest development version, please, and set the log verbosity to 2, it would be a big help. The amount of output on log level 2 has been modified so that it will print RTSP messages coming from the sources (all except for one very common one), so we should be able to see the requests for a play session and for the termination of a session coming in, and get an idea what's happening. It shouldn't be too chatty. The idea is that if it happens again, you would have a history in the log of all the calls to set up and terminate the now-hung-up session and of the session that was refused because the hung-up session was already playing.
Using gdb
for this would be very difficult -- effectively impossible IMHO -- because it would require manual intervention from time to time.
As a matter of interest, what output device are you using please?
Hi,
Raspberry Pi B and B+ via mostly USB soundcard (hw1,0) and one RPi uses HifiBerry Digi+.
I have new report with development version:
May 29 10:01:03 lab shairport-sync[504]: Type: "CSeq", content: "2"
May 29 10:01:03 lab shairport-sync[504]: Type: "Server", content: "AirTunes/105.1"
May 29 10:01:03 lab shairport-sync[504]: Connection 7: RTSP thread terminated.
May 29 10:01:20 lab shairport-sync[504]: New RTSP connection from 192.168.0.60:58298 to self at 192.168.0.92:5000 on conversation thread 8.
May 29 10:01:20 lab shairport-sync[504]: RTSP thread 8 received an RTSP Packet of type "ANNOUNCE":
May 29 10:01:20 lab shairport-sync[504]: Type: "CSeq", content: "2"
May 29 10:01:20 lab shairport-sync[504]: Type: "User-Agent", content: "forked-daapd/26.0"
May 29 10:01:20 lab shairport-sync[504]: Type: "Client-Instance", content: "5AFCFBF5BBCE89AD"
May 29 10:01:20 lab shairport-sync[504]: Type: "DACP-ID", content: "5AFCFBF5BBCE89AD"
May 29 10:01:20 lab shairport-sync[504]: Type: "Content-Type", content: "application/sdp"
May 29 10:01:20 lab shairport-sync[504]: Type: "Apple-Challenge", content: "KXjVZVPPIWLWw5Ja841Rpw"
May 29 10:01:20 lab shairport-sync[504]: Type: "Content-Length", content: "570"
May 29 10:01:20 lab shairport-sync[504]: Connection 8: ANNOUNCE
May 29 10:01:20 lab shairport-sync[504]: RTSP Conversation thread 6 already playing when asked by thread 8.
May 29 10:01:20 lab shairport-sync[504]: ANNOUNCE failed to get the player
May 29 10:01:20 lab shairport-sync[504]: Already playing.
May 29 10:01:20 lab shairport-sync[504]: RTSP thread 8: RTSP Response:
May 29 10:01:20 lab shairport-sync[504]: Type: "Apple-Response", content: "26aZ0SBsS5PBTr1E75XX+cG6yScNT2pvI/G0CqGM5mmb6+/Pbx/iUvpcvPpKK0TBzVLn2L2+Ju72SGfVw4UAk13qx6k1K0JpIudUVAghOcjQmV6j680euJZTMGHH0f/mRBKV5cSmAgec6ll6DCJgo5azLrnJdeOeAFmh7jkt1KbpfDncftn1559jA3GrIDICpaJmIiMSgoaSrxhuZDyYAzUGfbW0f3SzOPRt1mkiZEOZkzUsxYVz9muW6Xg1SShv25vGHCCsoL9FYtK1pc5arryj6R5851qz8KaMrjkq38pr9Xx9kZtR0dw/2KoXyy4pH0G41eylAzJiYiZ0+xd5kw"
May 29 10:01:20 lab shairport-sync[504]: Type: "CSeq", content: "2"
May 29 10:01:20 lab shairport-sync[504]: Type: "Server", content: "AirTunes/105.1"
May 29 10:01:20 lab shairport-sync[504]: Connection 8: RTSP thread terminated.
So on one of the devices I cannot play again, It's loading the connection to speaker in mobile app (Remote), then fails... Again Already playing. ...
Thanks. Do you have the full "history" where thread 6 was initiated and where an attempt was make to tear it down please?
Fantastic, thanks. Let me have a look. This is great.
So, it looks like a TEARDOWN request for thread 6 was received by not fully acted upon. Let me think a bit further about this. This is really good stuff.
Nice, I'm happy the issue is found... today I'll dedicate some time for studying the source as well... I should probably get back to reading more C code. Then I'll try to understand how the fix is done.
Thanks. I'm almost certain that the problem is one of overlapping concurrent execution of critical sections of threads. It seems that forked-daapd can cause this (not it's fault), so I will play around with it can see if can induce it. I may push out an update that drops down to a greater level of detail for a TEARDOWN...
Hello again. I've just pushed an update which should give some more debug information during a TEARDOWN, so it might offer a little more precision on where the problem is arising. If you would update to it, I'd be most grateful. Then if the same scenario as you looked above happened, we could see how far the TEARDOWN got before it stalled... Meantime, I'd installing forked-daapd...
Great, I updated on all 5 players and will send a log when the problem happens again in a day or two...... thank you
New log:
https://gist.github.com/davidhq/1aecb28d409d2814bd862768fcb941a4
I noticed one of the players with the weakest wifi has the most issues... and the one on ethernet cable never had this problem occurring. Another one with a bit problematic wifi (when it works, it works, but sometimes the signal doesn't come through) also showed this problem.
Also you mentioned that forked-daapd can cause this, but as I mentioned also streaming from latest iOS as well... since I figured out this is not isolated to streaming from iOS, I'm only testing as I actually use it most of the time - via forked-daapd.
That's really good, thanks.
OK, here's from another instance where it just happened:
journalctl:
https://gist.github.com/davidhq/58d70fe3e5861bc292977de52da8ff0c
/var/log/syslog:
https://gist.github.com/davidhq/7cb29f906ad6ce73a7dc3ed668833b37
It's the same hang-up in all these cases. I might be coming back with an update to get a bit more forensic at that point.
Hello again. I am afraid I'm having difficulty figuring out what's happening (still), so I have pushed yet another update with more debug messages in the hope that I can zero-in on where the problem is. It's a "strong signal", since it seems to be rather readily reproducible, but I can't think what it is. Perhaps, whenever you get the opportunity, you would [yet again] update and see if you can capture another incident. BTW, if you could also turn on the "delta" timing, it might turn out to be useful -- do this by setting the configuration file's diagnostics
section's log_show_time_since_last_message
setting to "yes"
.
Here is what you'd get if a TEARDOWN went fully to plan:
10.915953597|RTSP thread 2 received an RTSP Packet of type "TEARDOWN":
0.001060988| Type: "CSeq", content: "10"
0.000735991| Type: "Session", content: "1"
0.000774991| Type: "User-Agent", content: "iTunes/12.7.5 (Macintosh; OS X 10.13.4) (dt:1)"
0.000723992| Type: "Client-Instance", content: "900B806C3A8FAA81"
0.000710992| Type: "DACP-ID", content: "900B806C3A8FAA81"
0.000479994| Type: "Active-Remote", content: "3634232794"
0.000212998|Connection 2: TEARDOWN
0.000488994|TEARDOWN: synchronously terminating the player thread of RTSP conversation thread 2 (2).
0.000746991|player_stop
0.000584993|player_thread_lock acquired
0.000671993|player_thread exists
0.000677992|player_thread signalled
0.000212997|buffer_get_frame exiting due to thread stop request.
0.000099999|Connection 2: player thread main loop exit.
0.000081999|Requesting output device to stop.
0.000069000|Cancelling timing, control and audio threads
0.002349972|Timing Receiver Cleanup.
0.003647958|Timing Receiver Cleanup Successful.
0.003739957|Control Receiver Cleanup.
0.000257997|Control Receiver Cleanup Successful.
0.002294973|Audio Receiver Cleanup.
0.001574982|Audio Receiver Cleanup Successful.
0.001836979|Joining terminated threads.
0.002498971|Timing thread terminated.
0.001032988|Control thread terminated.
0.000121999|Audio thread terminated.
0.000073999|Freeing audio buffers and decoders.
0.003806956|Connection 2: player thread terminated.
0.001969977|pend
0.001537982|TEARDOWN: successful termination of playing thread of RTSP conversation thread 2.
0.001614982|RTSP thread 2: RTSP Response:
0.000846990| Type: "CSeq", content: "10"
0.000199998| Type: "Server", content: "AirTunes/105.1"
0.000765991| Type: "Connection", content: "close"
2.016539702|Connection 2: RTSP thread terminated.
My slight guess is that it won't get past player_stop
, but it would be nice to confirm.
Ok, no problem. I'm in the mood for helping with this, so I can do as many updates as needed.. and as you say, the problem happens reliably each day so there is steady progress.
May 31 19:50:05 lab shairport-sync[498]: Packet reception interval stats: mean, standard deviation and max for the last 2,500 packets in microseconds: 7981.8, 1506.8, 26162.0.
May 31 19:50:25 lab shairport-sync[498]: Packet reception interval stats: mean, standard deviation and max for the last 2,500 packets in microseconds: 7981.3, 1834.1, 30734.0.
May 31 19:50:45 lab shairport-sync[498]: Packet reception interval stats: mean, standard deviation and max for the last 2,500 packets in microseconds: 7981.3, 1388.4, 22397.0.
May 31 19:51:03 lab shairport-sync[498]: RTSP thread 4 received an RTSP Packet of type "TEARDOWN":
May 31 19:51:03 lab shairport-sync[498]: Type: "CSeq", content: "67"
May 31 19:51:03 lab shairport-sync[498]: Type: "User-Agent", content: "forked-daapd/26.0"
May 31 19:51:03 lab shairport-sync[498]: Type: "Client-Instance", content: "57F0E53515CB41F9"
May 31 19:51:03 lab shairport-sync[498]: Type: "DACP-ID", content: "57F0E53515CB41F9"
May 31 19:51:03 lab shairport-sync[498]: Type: "Session", content: "1"
May 31 19:51:03 lab shairport-sync[498]: Connection 4: TEARDOWN
May 31 19:51:03 lab shairport-sync[498]: TEARDOWN: synchronously terminating the player thread of RTSP conversation thread 4 (2).
May 31 19:51:03 lab shairport-sync[498]: player_stop
May 31 19:51:03 lab shairport-sync[498]: player_thread_lock acquired
May 31 19:51:03 lab shairport-sync[498]: player_thread exists
May 31 19:51:03 lab shairport-sync[498]: player_thread signalled
May 31 20:17:01 lab CRON[2324]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
May 31 20:47:23 lab shairport-sync[498]: New RTSP connection from 192.168.0.10:59820 to self at 192.168.0.93:5000 on conversation thread 5.
May 31 20:47:23 lab shairport-sync[498]: Successfully created RTSP receiver thread 5.
May 31 20:47:23 lab shairport-sync[498]: CSeq: 1.
May 31 20:47:23 lab shairport-sync[498]: User-Agent: forked-daapd/26.0.
May 31 20:47:23 lab shairport-sync[498]: Client-Instance: 57F0E53515CB41F9.
May 31 20:47:23 lab shairport-sync[498]: DACP-ID: 57F0E53515CB41F9.
May 31 20:47:23 lab shairport-sync[498]: RTSP thread 5 received an RTSP Packet of type "OPTIONS":
May 31 20:47:23 lab shairport-sync[498]: Type: "CSeq", content: "1"
May 31 20:47:23 lab shairport-sync[498]: Type: "User-Agent", content: "forked-daapd/26.0"
May 31 20:47:23 lab shairport-sync[498]: Type: "Client-Instance", content: "57F0E53515CB41F9"
May 31 20:47:23 lab shairport-sync[498]: Type: "DACP-ID", content: "57F0E53515CB41F9"
May 31 20:47:23 lab shairport-sync[498]: Connection 5: OPTIONS
May 31 20:47:23 lab shairport-sync[498]: RTSP thread 5: RTSP Response:
May 31 20:47:23 lab shairport-sync[498]: Type: "CSeq", content: "1"
May 31 20:47:23 lab shairport-sync[498]: Type: "Server", content: "AirTunes/105.1"
May 31 20:47:23 lab shairport-sync[498]: Type: "Public", content: "ANNOUNCE, SETUP, RECORD, PAUSE, FLUSH, TEARDOWN, OPTIONS, GET_PARAMETER, SET_PARAMETER"
May 31 20:47:23 lab shairport-sync[498]: CSeq: 2.
May 31 20:47:23 lab shairport-sync[498]: User-Agent: forked-daapd/26.0.
May 31 20:47:23 lab shairport-sync[498]: Client-Instance: 57F0E53515CB41F9.
May 31 20:47:23 lab shairport-sync[498]: DACP-ID: 57F0E53515CB41F9.
May 31 20:47:23 lab shairport-sync[498]: Content-Type: application/sdp.
May 31 20:47:23 lab shairport-sync[498]: Apple-Challenge: vPuBnEiuIi/F09+PMKwGNQ.
May 31 20:47:23 lab shairport-sync[498]: Content-Length: 570.
May 31 20:47:23 lab shairport-sync[498]: RTSP thread 5 received an RTSP Packet of type "ANNOUNCE":
May 31 20:47:23 lab shairport-sync[498]: Type: "CSeq", content: "2"
May 31 20:47:23 lab shairport-sync[498]: Type: "User-Agent", content: "forked-daapd/26.0"
May 31 20:47:23 lab shairport-sync[498]: Type: "Client-Instance", content: "57F0E53515CB41F9"
May 31 20:47:23 lab shairport-sync[498]: Type: "DACP-ID", content: "57F0E53515CB41F9"
May 31 20:47:23 lab shairport-sync[498]: Type: "Content-Type", content: "application/sdp"
May 31 20:47:23 lab shairport-sync[498]: Type: "Apple-Challenge", content: "vPuBnEiuIi/F09+PMKwGNQ"
May 31 20:47:23 lab shairport-sync[498]: Type: "Content-Length", content: "570"
May 31 20:47:23 lab shairport-sync[498]: Connection 5: ANNOUNCE
May 31 20:47:23 lab shairport-sync[498]: RTSP Conversation thread 4 already playing when asked by thread 5.
May 31 20:47:23 lab shairport-sync[498]: ANNOUNCE failed to get the player
May 31 20:47:23 lab shairport-sync[498]: Already playing.
May 31 20:47:23 lab shairport-sync[498]: RTSP thread 5: RTSP Response:
May 31 20:47:23 lab shairport-sync[498]: Type: "Apple-Response", content: "2MHjszAIxTCCpb/oPou3wK3eftTO09IRhRZfeww6yvtpOZXPZseWDCqoCUNsPximsfLoi5z8BJZGyfYHGZ8oEUvqgxSZQeZusEdOEFX9uXODXc2M1OjQR9/p3GmoSkQjyJGjbKZciGkP/4JtwfAEJt1+9Ga8FIoMGToEkYCFZUKq5xSs4Zzcxj+I3UGXSXue2natVnp2F/bWxyh6boBZjbeoxz+JinuckVBmqW2aSyhZpm4Tw2QuvsdaxQxRQL7HANpds3aUS1ad4JDm+go9y7f0Xx139MJbMjrY60X75GAm2lxs77TlGSfXtcHwPHdEyPg4ib6Ig+0q6Ui4o4PXyA"
May 31 20:47:23 lab shairport-sync[498]: Type: "CSeq", content: "2"
May 31 20:47:23 lab shairport-sync[498]: Type: "Server", content: "AirTunes/105.1"
May 31 20:47:23 lab shairport-sync[498]: RTSP conversation thread 5 -- connection closed.
May 31 20:47:23 lab shairport-sync[498]: Synchronously terminate playing thread of RTSP conversation thread 5.
May 31 20:47:23 lab shairport-sync[498]: player_stop
May 31 20:47:23 lab shairport-sync[498]: player_thread_lock acquired
May 31 20:47:23 lab shairport-sync[498]: player thread of RTSP conversation 5 is already deleted.
May 31 20:47:23 lab shairport-sync[498]: Successful termination of playing thread of RTSP conversation thread 5.
May 31 20:47:23 lab shairport-sync[498]: Request termination of RTSP conversation thread 5.
May 31 20:47:23 lab shairport-sync[498]: Connection 5: RTSP thread terminated.
May 31 20:48:23 lab shairport-sync[498]: found RTSP connection thread 5 in a non-running state.
May 31 20:48:23 lab shairport-sync[498]: RTSP connection thread 5 deleted...
May 31 20:55:28 lab shairport-sync[498]: New RTSP connection from 192.168.0.10:38810 to self at 192.168.0.93:5000 on conversation thread 6.
May 31 20:55:28 lab shairport-sync[498]: Successfully created RTSP receiver thread 6.
May 31 20:55:28 lab shairport-sync[498]: CSeq: 1.
May 31 20:55:28 lab shairport-sync[498]: User-Agent: forked-daapd/26.0.
May 31 20:55:28 lab shairport-sync[498]: Client-Instance: 57F0E53515CB41F9.
May 31 20:55:28 lab shairport-sync[498]: DACP-ID: 57F0E53515CB41F9.
May 31 20:55:28 lab shairport-sync[498]: RTSP thread 6 received an RTSP Packet of type "OPTIONS":
May 31 20:55:28 lab shairport-sync[498]: Type: "CSeq", content: "1"
May 31 20:55:28 lab shairport-sync[498]: Type: "User-Agent", content: "forked-daapd/26.0"
May 31 20:55:28 lab shairport-sync[498]: Type: "Client-Instance", content: "57F0E53515CB41F9"
May 31 20:55:28 lab shairport-sync[498]: Type: "DACP-ID", content: "57F0E53515CB41F9"
May 31 20:55:28 lab shairport-sync[498]: Connection 6: OPTIONS
May 31 20:55:28 lab shairport-sync[498]: RTSP thread 6: RTSP Response:
May 31 20:55:28 lab shairport-sync[498]: Type: "CSeq", content: "1"
May 31 20:55:28 lab shairport-sync[498]: Type: "Server", content: "AirTunes/105.1"
May 31 20:55:28 lab shairport-sync[498]: Type: "Public", content: "ANNOUNCE, SETUP, RECORD, PAUSE, FLUSH, TEARDOWN, OPTIONS, GET_PARAMETER, SET_PARAMETER"
May 31 20:55:28 lab shairport-sync[498]: CSeq: 2.
May 31 20:55:28 lab shairport-sync[498]: User-Agent: forked-daapd/26.0.
May 31 20:55:28 lab shairport-sync[498]: Client-Instance: 57F0E53515CB41F9.
May 31 20:55:28 lab shairport-sync[498]: DACP-ID: 57F0E53515CB41F9.
May 31 20:55:28 lab shairport-sync[498]: Content-Type: application/sdp.
May 31 20:55:28 lab shairport-sync[498]: Apple-Challenge: kSF8+vlBKk3ZcSrFo09maA.
May 31 20:55:28 lab shairport-sync[498]: Content-Length: 570.
May 31 20:55:28 lab shairport-sync[498]: RTSP thread 6 received an RTSP Packet of type "ANNOUNCE":
May 31 20:55:28 lab shairport-sync[498]: Type: "CSeq", content: "2"
May 31 20:55:28 lab shairport-sync[498]: Type: "User-Agent", content: "forked-daapd/26.0"
May 31 20:55:28 lab shairport-sync[498]: Type: "Client-Instance", content: "57F0E53515CB41F9"
May 31 20:55:28 lab shairport-sync[498]: Type: "DACP-ID", content: "57F0E53515CB41F9"
May 31 20:55:28 lab shairport-sync[498]: Type: "Content-Type", content: "application/sdp"
May 31 20:55:28 lab shairport-sync[498]: Type: "Apple-Challenge", content: "kSF8+vlBKk3ZcSrFo09maA"
May 31 20:55:28 lab shairport-sync[498]: Type: "Content-Length", content: "570"
May 31 20:55:28 lab shairport-sync[498]: Connection 6: ANNOUNCE
May 31 20:55:28 lab shairport-sync[498]: RTSP Conversation thread 4 already playing when asked by thread 6.
May 31 20:55:28 lab shairport-sync[498]: ANNOUNCE failed to get the player
May 31 20:55:28 lab shairport-sync[498]: Already playing.
May 31 20:55:28 lab shairport-sync[498]: RTSP thread 6: RTSP Response:
May 31 20:55:28 lab shairport-sync[498]: Type: "Apple-Response", content: "atyW1PGIXNYBwl+F1/FKenUibQ4nm0JZusbfMob9uZRG3l773KfbxeqFfISv8fKa0NTwqVqQg/GKFC+a8jnJnSplFffutHyG7znvDYZmlr1vcFeRkeQqpm3fTHwCQVQSyktcGchX/nypMsLKDCQTuQhPG6vJGLswwMX4Kfc8OyvoAoW+GJ6zmqDPjP5H+t+Qev/sgl7a4+Co2SajDh86uQWyqD5IsNV2JOKzbzLN1ArHl/EEOlNQU5DnQTOUTDnz+1fI8YZW3jaEQB7PZwbAdnMrbCD3R2QXLqvTXvbw74mBfVQ+dlliI9RYcGiD2O0spiQdHnyo4I4essjjmOBFJg"
May 31 20:55:28 lab shairport-sync[498]: Type: "CSeq", content: "2"
May 31 20:55:28 lab shairport-sync[498]: Type: "Server", content: "AirTunes/105.1"
May 31 20:55:28 lab shairport-sync[498]: RTSP conversation thread 6 -- connection closed.
May 31 20:55:28 lab shairport-sync[498]: Synchronously terminate playing thread of RTSP conversation thread 6.
May 31 20:55:28 lab shairport-sync[498]: player_stop
May 31 20:55:28 lab shairport-sync[498]: player_thread_lock acquired
May 31 20:55:28 lab shairport-sync[498]: player thread of RTSP conversation 6 is already deleted.
May 31 20:55:28 lab shairport-sync[498]: Successful termination of playing thread of RTSP conversation thread 6.
May 31 20:55:28 lab shairport-sync[498]: Request termination of RTSP conversation thread 6.
May 31 20:55:28 lab shairport-sync[498]: Connection 6: RTSP thread terminated.
May 31 20:56:28 lab shairport-sync[498]: found RTSP connection thread 6 in a non-running state.
May 31 20:56:28 lab shairport-sync[498]: RTSP connection thread 6 deleted...
Thanks again for this -- the TEARDOWN process got further than I expected. Let me dig in to it for a while.
Okay, so maybe a little progress. I've added lots of diagnostics with the hope of finding out where the thread is blocked, but I've also made some changes.
My suspicion is that either the thread is blocking while trying to send resend requests for missing packets, or else it's blocking on a thread lock that some other thread already has. The resend requests theory is appealing because it would be more likely to occur on a poor network, which chimes with your experience.
To address the first problem, I've added a timeout on the sending of resend requests. One might expect that sending UDP packets shouldn't block, but there are occasional situations where it might occur, apparently. Unfortunately, it's unclear whether the timeout will work or not.
As far as the second problem is concerned, I've added code to time lock requests and to complain if they exceed them. One might expect occasional overruns due to the nature of scheduling on the OS, but not many.
I've also slightly rearranged the code for sending resend requests, moving it to a slightly less time-sensitive part of the code.
As before, I'd be very grateful if you would run this again, and let's see what happens.
So far I report that I think something additional is broken (maybe on the way to get better, but still), so now the sound glitches very obviously while playing, it happens every few minutes or so.
In logs I think this shows it:
Jun 1 21:29:52 ela shairport-sync[482]: Last-ditch (#7) resend request for packet 13161 in range 13141 to 13404. Looking back 243 packets.
Jun 1 21:29:52 ela shairport-sync[482]: Last-ditch (#7) resend request for packet 13162 in range 13141 to 13405. Looking back 243 packets.
Jun 1 21:29:52 ela shairport-sync[482]: Last-ditch (#6) resend request for packet 13198 in range 13185 to 13406. Looking back 208 packets.
Jun 1 21:29:52 ela shairport-sync[482]: Last-ditch (#7) resend request for packet 13217 in range 13196 to 13460. Looking back 243 packets.
Jun 1 21:29:52 ela shairport-sync[482]: Last-ditch (#7) resend request for packet 13218 in range 13197 to 13461. Looking back 243 packets.
Jun 1 21:29:52 ela shairport-sync[482]: Last-ditch (#7) resend request for packet 13219 in range 13200 to 13462. Looking back 243 packets.
Jun 1 21:29:57 ela shairport-sync[482]: Packet reception interval stats: mean, standard deviation and max for the last 2,500 packets in microseconds: 7668.3, 7330.9, 40730.0.
Jun 1 21:30:17 ela shairport-sync[482]: Packet reception interval stats: mean, standard deviation and max for the last 2,500 packets in microseconds: 7980.8, 2182.8, 42637.0.
Jun 1 21:30:22 ela shairport-sync[482]: Last-ditch (#3) resend request for packet 16991 in range 16959 to 17094. Looking back 103 packets.
(lots and lots of Last-ditch
lines )
Maybe a correction / undo of the changes and I try only with additional diagnostics or should I try to reproduce with code as it is now?
Something else: maybe it would be good to see where in history problems started by bisecting? I'm almost sure the version from 8-12 months ago didn't have such issues. For bisect, automatic simulation of the experience would be needed... on real hardware as now, just muted. So that feedback is faster. Let me know if I should try to make such script and try to see if I can simulate play/pause, speaker select/deselect until the problem happens...
Thanks. This is interesting because those last-ditch messages only appear when (a) there are network problems and (b) when the debug log verbosity is 3. You would have set the verbosity to 2, but it would have changed to 3 when a TEARDOWN was started and should go back to 2 when the TEARDOWN completes successfully. Evidently it didn't, so the verbosity has remained at 3. So, could you work back through the log to that TEARDOWN -- it would be quite interesting? The generation of such a slew of last-ditch messages might, on its own, account for the glitches.
Log: https://gist.github.com/davidhq/c415fd88b4ba9104305736a211846a98
I don't think there is such TEARDOWN here because the problem didn't occur.. I wrote to you after noticing stuttering, but the problem we're hunting down didn't occur in that 30min of testing... for some reason verbosity got stuck at 3 as you say. TEARDOWN at the bottom of the log succeeded and happens after all these last-ditch messages.
Will then come back with more tomorrow after the problem reoccurs (if it does).
So now I tried to disable this speaker and it took a lot of time to do it, now I can't reenable it (in Remote app connected to forked-daapd), but nothing plays or moves in the log.
I think this log TEARDOWN is useful but it's a different thing (disabling the speaker) instead of trying to play and acquire thread.
Jun 2 16:52:15 lab shairport-sync[492]: Last-ditch (#3) resend request for packet 38651 in range 38634 to 38760. Looking back 109 packets.
Jun 2 16:52:15 lab shairport-sync[492]: Last-ditch (#2) resend request for packet 38690 in range 38659 to 38762. Looking back 72 packets.
Jun 2 16:52:15 lab shairport-sync[492]: Last-ditch (#2) resend request for packet 38691 in range 38659 to 38763. Looking back 72 packets.
Jun 2 16:52:15 lab shairport-sync[492]: Last-ditch (#2) resend request for packet 38692 in range 38659 to 38764. Looking back 72 packets.
Jun 2 16:52:15 lab shairport-sync[492]: Last-ditch (#2) resend request for packet 38693 in range 38659 to 38765. Looking back 72 packets.
Jun 2 16:52:15 lab shairport-sync[492]: Last-ditch (#2) resend request for packet 38694 in range 38659 to 38766. Looking back 72 packets.
Jun 2 16:52:15 lab shairport-sync[492]: Last-ditch (#2) resend request for packet 38695 in range 38659 to 38767. Looking back 72 packets.
Jun 2 16:52:15 lab shairport-sync[492]: Last-ditch (#2) resend request for packet 38696 in range 38659 to 38768. Looking back 72 packets.
Jun 2 16:52:15 lab shairport-sync[492]: RTSP thread 5 received an RTSP Packet of type "TEARDOWN":
Jun 2 16:52:15 lab shairport-sync[492]: Type: "CSeq", content: "456"
Jun 2 16:52:15 lab shairport-sync[492]: Type: "User-Agent", content: "forked-daapd/26.0"
Jun 2 16:52:15 lab shairport-sync[492]: Type: "Client-Instance", content: "57F0E53515CB41F9"
Jun 2 16:52:15 lab shairport-sync[492]: Type: "DACP-ID", content: "57F0E53515CB41F9"
Jun 2 16:52:15 lab shairport-sync[492]: Type: "Session", content: "1"
Jun 2 16:52:15 lab shairport-sync[492]: Connection 5: TEARDOWN
Jun 2 16:52:15 lab shairport-sync[492]: TEARDOWN: synchronously terminating the player thread of RTSP conversation thread 5 (2).
Jun 2 16:52:15 lab shairport-sync[492]: player_stop
Jun 2 16:52:15 lab shairport-sync[492]: pthread_cond_timedwait returned error code 110.
Jun 2 16:52:15 lab shairport-sync[492]: buffer_get_frame is iterating
Jun 2 16:52:15 lab shairport-sync[492]: debug_mutex_lock at "player.c:832".
Jun 2 16:52:15 lab shairport-sync[492]: debug_mutex_unlock at "player.c:842".
Jun 2 16:52:15 lab shairport-sync[492]: debug_mutex_lock at "rtp.c:894".
Jun 2 16:52:15 lab shairport-sync[492]: debug_mutex_unlock at "rtp.c:901".
Jun 2 16:52:15 lab shairport-sync[492]: debug_mutex_unlock at "player.c:635".
Jun 2 16:52:15 lab shairport-sync[492]: player_thread_lock acquired
Jun 2 16:52:15 lab shairport-sync[492]: player_thread exists
Jun 2 16:52:15 lab shairport-sync[492]: player_thread signalled
Jun 2 16:52:15 lab shairport-sync[492]: buffer_get_frame is iterating
Jun 2 16:52:15 lab shairport-sync[492]: debug_mutex_lock at "player.c:832".
Jun 2 16:52:15 lab shairport-sync[492]: debug_mutex_unlock at "player.c:842".
Jun 2 16:52:15 lab shairport-sync[492]: player_put_packet discarded packet 38769 because the player thread was locked.
Jun 2 16:52:15 lab shairport-sync[492]: debug_mutex_lock at "rtp.c:894".
Jun 2 16:52:15 lab shairport-sync[492]: player_put_packet discarded packet 38770 because the player thread was locked.
Jun 2 16:52:15 lab shairport-sync[492]: debug_mutex_unlock at "rtp.c:901".
Jun 2 16:52:15 lab shairport-sync[492]: player_put_packet discarded packet 38771 because the player thread was locked.
Jun 2 16:52:15 lab shairport-sync[492]: buffer_get_frame exiting due to thread stop request.
Jun 2 16:52:15 lab shairport-sync[492]: debug_mutex_unlock at "player.c:1240".
Jun 2 16:52:15 lab shairport-sync[492]: Connection 5: player thread main loop exit.
Jun 2 16:52:15 lab shairport-sync[492]: player_put_packet discarded packet 38772 because the player thread was locked.
Jun 2 16:52:15 lab shairport-sync[492]: player_put_packet discarded packet 38773 because the player thread was locked.
Jun 2 16:52:15 lab shairport-sync[492]: player_put_packet discarded packet 38774 because the player thread was locked.
Jun 2 16:52:15 lab shairport-sync[492]: player_put_packet discarded packet 38802 because the player thread was locked.
Jun 2 16:52:15 lab shairport-sync[492]: player_put_packet discarded packet 38803 because the player thread was locked.
Jun 2 16:52:15 lab shairport-sync[492]: player_put_packet discarded packet 38804 because the player thread was locked.
Jun 2 16:52:15 lab shairport-sync[492]: player_put_packet discarded packet 38805 because the player thread was locked.
Jun 2 16:52:15 lab shairport-sync[492]: player_put_packet discarded packet 38806 because the player thread was locked.
Jun 2 16:52:15 lab shairport-sync[492]: player_put_packet discarded packet 38807 because the player thread was locked.
Jun 2 16:52:15 lab shairport-sync[492]: player_put_packet discarded packet 38808 because the player thread was locked.
Jun 2 16:52:15 lab shairport-sync[492]: player_put_packet discarded packet 38809 because the player thread was locked.
Jun 2 16:52:15 lab shairport-sync[492]: player_put_packet discarded packet 38810 because the player thread was locked.
Jun 2 16:52:15 lab shairport-sync[492]: player_put_packet discarded packet 38811 because the player thread was locked.
Jun 2 16:52:15 lab shairport-sync[492]: player_put_packet discarded packet 38812 because the player thread was locked.
Jun 2 16:52:15 lab shairport-sync[492]: player_put_packet discarded packet 38813 because the player thread was locked.
Jun 2 16:52:15 lab shairport-sync[492]: player_put_packet discarded packet 38814 because the player thread was locked.
Jun 2 16:52:15 lab shairport-sync[492]: player_put_packet discarded packet 38815 because the player thread was locked.
Jun 2 16:52:15 lab shairport-sync[492]: DACP monitor successfully stopped
Jun 2 16:52:15 lab shairport-sync[492]: Connection 5: stopping output device.
Jun 2 16:52:15 lab shairport-sync[492]: debug_mutex_lock at "audio_alsa.c:906".
Jun 2 16:52:15 lab shairport-sync[492]: player_put_packet discarded packet 38816 because the player thread was locked.
Jun 2 16:52:15 lab shairport-sync[492]: player_put_packet discarded packet 38817 because the player thread was locked.
Jun 2 16:52:15 lab shairport-sync[492]: player_put_packet discarded packet 38818 because the player thread was locked.
Jun 2 16:52:15 lab shairport-sync[492]: player_put_packet discarded packet 38819 because the player thread was locked.
Jun 2 16:52:15 lab shairport-sync[492]: player_put_packet discarded packet 38820 because the player thread was locked.
Jun 2 16:52:15 lab shairport-sync[492]: player_put_packet discarded packet 38821 because the player thread was locked.
Jun 2 16:52:15 lab shairport-sync[492]: player_put_packet discarded packet 38604 because the player thread was locked.
Jun 2 16:52:15 lab shairport-sync[492]: player_put_packet discarded packet 38855 because the player thread was locked.
Jun 2 16:52:15 lab shairport-sync[492]: debug_mutex_unlock at "audio_alsa.c:923".
Jun 2 16:52:15 lab shairport-sync[492]: player_put_packet discarded packet 38642 because the player thread was locked.
Jun 2 16:52:15 lab shairport-sync[492]: Cancelling timing, control and audio threads...
Jun 2 16:52:15 lab shairport-sync[492]: player_put_packet discarded packet 38856 because the player thread was locked.Audio Receiver Cleanup.
Jun 2 16:52:15 lab shairport-sync[492]: Audio Receiver Cleanup Successful.
Fantastic stuff, thanks! It'll take a little time to digest it.
By the way, was there any more in the log after that?
No, there wasn't...
Thanks — that’s really unexpected!
Still flying blind here (I still don't know the probable cause of the hang), so I've changed the code slightly to serialise the cancellation and joining of the three threads -- timing, control and audio -- to see which one seems to hang. Also I've added a shutdown()
before each close()
of the sockets these threads are using, in the hope that shutting the sockets down before closing them while data is still coming in, will happen a bit more smoothly. At your convenience, I'd be grateful if you'd give the update a try. Many thanks in advance.
Thanks again for these. I'll have a close look.
Hello again. This is a real mystery to me, and I am continuing to try to figure out what might be happening. Unfortunately, I am going to have to put it on the back burner for a while – I have two weeks of intensive work ahead of me. Please be assured that I am continuing to think about how to address it, and again, I'd like to thank you for your help so far. It does seem to be associated with networking issues, but it should at least be able to survive any problems.
Actually, I just found a bug which might account for the problem. Fixed in the latest push. I'd be glad if you'd try it – I'd have a modest hope that it's "the one".
I think it works now... at least according to testing yesterday evening and today... usually the problem would already have appeared! Will continue with testing, if it doesn't happen today, it's almost certainly solved....
Related: after you added a lot of debugging info (esp. last-ditch messages), the sound is stuttering while it didn't before. I will test some more with logging level 2, then try with 0. After the problem is confirmed solved, can the extreme debugging be turned off for level 2? If logging isn't the cause of stuttering, can it be that while the locking issue is solved, something works differently now? Out of two speakers where music stutters, at most one should have a bit flaky connection, other is very close to router, so this probably indicates the extreme logging is the cause?
Will continue testing everything now, let you know tomorrow, thank you again.
Still just found something:
I went out of the house, when I came back, tried to turn off two speakers, it was spinning in remote app, but after a minute, it succeeded .. now it seems to work well again, except one speaker, nothing is playing, remote app shows it's enabled, in log nothing happens, as once before.. not sure if the same problem or there are two and you solved on, this is still something else.. log: https://gist.github.com/davidhq/dc0e36952120fcfd7c0f64c195b1cc9c .. so at 10:32, the speaker was disabled, now cannot enable and nothing gets added to the log.
Same log on other speaker where it spinned for some time, now working: https://gist.github.com/davidhq/1cfcb57db81192651d403d9b04be4675
On the other topic: sure, it's understandable to put this off for some time until things become more clear and/or other work clears..
If there is something more to learn from last report and I can do something else, let me know.
That’s very encouraging news, thanks — and again, very many thanks for your help. I do think that the stuttering — assuming it wasn’t a noticeable thing before — may be caused by all the debug messages, and if so, it will be easy to turn them off. I’ll have to check, but I don’t think I have radically altered the “architecture” of the run-time code.
Something else that might be interesting for you is to turn on statistics
— it’s a setting in the configuration file, either in the general
section or, more recently, in the diagnostics
section at the end of the file. Among other things, it displays the number of resend requests in an interval, and the number of packets received “late”, i.e. out of sequence and more stuff like that. It could be revealing. My suspicion is that the network has some kind of fault in it, which was exposing this bug in Shairport Sync. But it may also be the cause of those new issues. It would be great to see the stats.
Those error messages are a bit odd — I haven’t seen them before, e.g:
Jun 6 09:34:50 lab dnsmasq-dhcp[454]: no address range available for DHCP request via wlan0
Actually this dnsmasq-dhcp
is something to consider, one of the raspberries (lab) has it enabled, because it supports switching between AP and wifi client.. for use in the car etc. I wrote a script to switch back and forth. Was still going to implement disabling dnsmasq and hostapd completely while not working as AP, but everything seemed to work even with this. Hostapd is not referenced, but yes, dnsmasq was trying to assign ip addresses, but didn't succeed because of different subnets -- so I believe in this case shouldn't influence anything. Also: this is new since about 2 days ago, shairport problems are happening longer.. I will turn this off (other network settings are completely clean, nothing special)... but first I downgraded shairport-sync to 2.8.6 just to see how it behaves as it is. Would also maybe be nice to see... after this, I get back to development
, turn on statistics and observe that as well... now I just want to confirm that current state of the network is capable of uninterrupted streaming on some version of shairport.
Apprx. 1 munite restart via systemctl
Happened between 1.1.1-23-gd65b8e8 (i'm not sure) and 3.1.7 versions