Closed gibman closed 5 years ago
Thanks for the interesting conundrum. It seems to kick off with the
Error writing an RTSP packet -- requested bytes not fully written.
message. I don't know what the cause of the problem is, but it's compounded by the fact that Shairport Sync will not abandon the now-defective session. Let me look into that.
Hi there. I've just pushed a development
branch update which should close a session in the above circumstances. That should give the opportunity for a new connection to be established. I'd be obliged if you'd give it a try.
thanks!..
Ill try it out asap and then let a few days go by before concluding anything. Are you interested in the logs regardless ?
Yes please; if it works, I'd expect it to log the fact that it dropped a session when that problem occurs. If it doesn't work, it wold be interesting to see what's happening.
Looking at the log a bit more closely, I'm not sure the fix will work. So, just now, I've pushed another update with some more diagnostics in it. Hopefully it will give some indications as to what's causing Shairport Sync to hang when this error happens.
no problem. I just uploaded the new binary (with diags) on all 5 pi's :=)
lets wait and see what happens...
Ok.. all 5 pi's exhibit same issue already.
here's some info from moode audio: A I R P L A Y S E T T I N G S
Version = 3.3d21
Friendly name = Kitchen
ALSA device = hw:0
Volume mixer = software
Resume MPD = No
Output bit depth = S16
Output sample rate = 44100
Session interruption = yes
Session timeout (ms) = 120
Audio buffer (secs) = 0.0
I hope the log is sufficient.
Thanks. I’m looking at the log on an iPhone and the first part of it, up to “Connection 184: new connection from 192.168.1.127:50127” is garbled. Is it me or is it the log file?
aghh.. it looks like I haven't enabled debug level logging.
hang on...
this is my how shairport-sync is started.
root 20639 1.0 0.7 8240 3232 ? S 19:19 0:00 sudo /usr/local/bin/shairport-sync -vv -a Kitchen -S soxr -w -B /var/local/www/commandw/spspre.sh -E /var/local/www/commandw/spspost.sh -- -d hw:0
root 20647 10.8 1.6 81380 7436 ? Sl 19:19 0:00 /usr/local/bin/shairport-sync -vv -a Kitchen -S soxr -w -B /var/local/www/commandw/spspre.sh -E /var/local/www/commandw/spspost.sh -- -d hw:0
so is the '-vv' parameter enough to get the wanted logging enabled ? this is what the logs attached on the previous reply is based upon.
waiting for your feedback.
Thanks. Yeah, that’s perfect. The problem was that there seemed to be a great deal of garbage characters in the start of the log. Maybe it’s the phone — I’ll get a chance to use a laptop later on.
yeah its garbled at my end also.
Ive included another log from my livingroom pi. seems to be the same issue regarding the garbled log.
I eyeballed the file manually over ssh/putty: more /var/log/shairport-sync.log
the more command just display blanks instead of garbled stuff.
the file was retrieved using winscp.
ok .. I deleted the log on the pi and restarted the process.
now I think we have a complete log to look at.
Thanks for your efforts. I can read that last one alright.
Okay, I’ll have to dig into this — I think you’ve given me enough information to try it here. I’ll keep you updated on progress. Meantime, I wonder what would happen if you disabled allowing a session to be interrupted?
I'm getting it to misbehave alright, but haven't nailed it down yet...
I've noticed a few small issues and fixed them on a development version I've just pushed. TBH, I don't think they have fixed the problem, but no harm to try them out -- I haven't gotten a crash since.
I was in touch with the [very helpful] developer of TuneBlade about a new ("POST") message I noticed coming from TB. He replied with this:
Apologies for the POST message. I added this recently for some receivers which required it. Other receivers I tested simply ignored this request, so I released it. Later, I got reports from users complaining about breaking compatibility with some devices. So, I've added a check to make sure I send the request only to compatible receivers. The updated version can be found here:
http://files.breakfreeaudio.com/downloads/TuneBlade_Installer_1_7_9_0.exe
Feel free to share this link with anyone. I was hoping to add more features before making a release, but I think I'm going to release it without any further delay.
TBH, I don't think the extra message is causing the problem, but again, it can't hurt...
cool.. thanks a lot.
I'll put on the new version of TB as well as a fresh compiled version of shairport-sync asap.
You will hear from me ...
Here's the latest log showcasing the same issue. Running on latest TB as well as the latest SP code.
Thanks. But it seems to have those garbage characters at the start again.
BTW, I've just pushed a small improvement to SPS, which just might be important. It imposes a timeout on SPS writing status information back to the source. This should remove one possible cause of hanging up.
Very nice.
I will try it out asap.
Keep you posted.
Regarding the garbled log.
Seems to appear after some time. Maybe if the log gets above a certain threshold?
Yes, I don't really know, TBH. The thing is, whatever happened, it seems to have happened up there in the garbled part.
I'm actually a little intrigued by the garbling of the log and I wonder if it could be connected in any way with the problem. A scenario that might account for all SPS devices having the same problem might be if they are all affected by some common flaw. So, is it possible that they all have the same brand of power supply, or are using the same brand of SD card and that it might be problematic? A bit of a long shot, TBH.
They all have different brands of 8 GB sd cards. They also have different types of USB chargers and cables.
A few weeks ago I installed a vanilla image of raspian stretch on one of the pi's and hooked it up to the telly.
I seem to remember it coming up with textual warnings about undervolting. More discussion here: https://www.raspberrypi.org/forums/viewtopic.php?t=210914
Remember, I have had the pi's working on raspian Jessie + an older version of shairport-sync. It could run for days. But once in a while the issue could occur.
The issue is occurring within an hour when running the latest shairport-sync.
regards gibman
fresh logs, issue just happened.
and logs for another device (bar)
edit: looking at the logs in L:374 we have the connection 2 teardown being initiated by tuneblade. but it looks like the thread never completes as we have :
Connection 2: TEARDOWN player_thread cancel... player_thread join... Connection 3: new connection from 192.168.1.127:2150 to self at 192.168.1.118:5000.
Looking elsewhere in the log for successful tear downs, and they look like this:
player_thread cancel... player_thread join... player_thread joined. pend
What seems to be missing in the connection 2 TEARDOWN : player_thread joined. pend
could it be as simple as a deadlock, preventing further connections ?
Or am I on a wild goosechase here?
Thanks. Your analysis is correct, I believe. The question is where the deadlock might be. I’m trying to replicate it here. From the information you gave earlier, it doesn’t seem as if there is some other cause that might be causing the problem. I’ll keep digging!
Something that would be useful is to enable delta timing in the logs. For some time, Shairport Sync has been able to show the time an event is logged and/or the time since the last log entry. The settings are in the diagnostics
past of the configuration file (you can see an example in /etc/shairport-sync.conf.sample
). If you could set log_show_time_since_last_message = "yes”
(don’t forget to uncomment it!) that’s would be very useful.
done :)
Ive also bumped verbosity to level 3
It seems like tuneblade is able to establish the first connection. this happens when I play some audio related on the host running TB. Once I pause the audio on the PC then TB kicks into standby mode. If I try and have TB connect again, simply by starting the same audio then we have the problem.
Now looking at the log I attached, it seems deadlocked at the last line. The log will not grow beyond this point even though TB tries and tries to connect.
I would think that under normal circumstances the log would be filled with these connection tries.
I even tried to restart TB on the windows PC, same stuff. Log is stuck. so it seems deadlocked for sure here.
did you try and install TB and reproduce ?
I guess one could look at a wireshark dump as well, if this helps ?
Thanks again, that's great -- the timings are useful. Indeed I have been using TB here, but I'll try the scenario you describe. No need for Wireshark just yet, thanks!
An interesting thing that I'm trying to think through is that there's a pretty fast sequence here:
0.000156999|Connection 1: RECORD
1.309847869|pbeg
0.000983993|RTSP thread 1: RTSP Response:
0.000211998| Type: "CSeq", content: "4"
0.000141999| Type: "Server", content: "AirTunes/105.1"
0.000141999| Type: "Audio-Latency", content: "11025"
0.004631964|RTSP thread 1 received an RTSP Packet of type "SET_PARAMETER":
0.000260998| Type: "CSeq", content: "5"
0.000143999| Type: "Session", content: "1"
0.000260998| Type: "Content-Type", content: "text/parameters"
0.000503996| Type: "Content-Length", content: "19"
0.000146999| Type: "User-Agent", content: "TuneBlade/1.0 (Windows)"
0.000143999| Type: "Client-Instance", content: "56B6CB929BB20486"
0.000140998| Type: "DACP-ID", content: "13EE26B2E036204E"
0.000137999| Type: "Active-Remote", content: "1983426473"
0.000827994|RTSP thread 1: RTSP Response:
0.000208998| Type: "CSeq", content: "5"
0.000141999| Type: "Server", content: "AirTunes/105.1"
0.003446974|RTSP thread 1 received an RTSP Packet of type "TEARDOWN":
0.000258998| Type: "CSeq", content: "6"
0.000151998| Type: "Session", content: "1"
0.000372998| Type: "User-Agent", content: "TuneBlade/1.0 (Windows)"
0.000167998| Type: "Client-Instance", content: "56B6CB929BB20486"
0.000144999| Type: "DACP-ID", content: "13EE26B2E036204E"
0.000139999| Type: "Active-Remote", content: "1983426473"
0.000158999|Connection 1: TEARDOWN
0.000162999|player_thread cancel...
0.000399996|player_thread join...
A RECORD
message comes in, basically the start of a session. It takes 1.3 seconds for the external command to complete, and then a flurry of messages -- probably setting the volume -- are followed by a TEARDOWN
, all in a few milliseconds. I'll have to think a bit about that...
When a Pi hangs up again, could you run top
or htop
to see if any core/cpu is stuck at 100% please?
I'll do asap.
one thing Ive noticed is that killing the process by: pkill -f shairport-sync
does not infact kill it. it still remains.
doing a sudo reboot also takes forever.
Might be because the process is hanging ?
btw .the TB issue is even more simple to repro. than the one I originally described.
1) start TB 2) wait until it says connection - standby 3) close TB 4) start TB
now it says connecting .. it does not come any further.
nothing to see regarding cpu usage. it only spikes during connection it seems.
Thanks for those posts. I have noticed how hard it is to kill it. Once I know its Process ID I can use:
kill -9 <pid>
and it normally works.
I can't reproduce the problem on my rig, so let me check your settings, please.
yes I have..
start at login autoconnect to receivers when discovered force reconnection on connection error
also in the airplay receivers section I have the checkbox 'autoconnect' enabled for each pi.
I tried to disable those above settings and then manually within TB hit the playback/stop button in turn. Effectively this disconnects and reconnects on demand. Ive tried now 20times or so manically turning the connection on off and it does not produce the error.
I even tried and restart TB with these new settings, and the issue does not seem to appear. Also tried to forcefully kill TB process and then restarting TB, still no sign of the issue.
food for thought huh ?
Is it time for wireshark ? I guess this needs to be installed on the TB host, as the traffic is not multicast ? I cant capture it on some random laptop here.
I'll let the pi's + TB set for a while with these options off and then see how it goes. Not having those options would have the sideeffect of me having to babysit the TB control panel and then manually connecting the PI's that have caused some hickup.
Thanks again for all your efforts. At my end, I've found another error and pushed a fix to it. Basically, if SPS was flushing a sequence of audio packets and found a spurious "unready" packet in the sequence while doing so (possibly caused by a missing packet), it would get into an infinite loop that it would not cancel out of. You'd just never know if that condition is occurring in your situation, so I'm not too hopeful. But, it's good housekeeping to fix it. Whenever you got a chance to update to it, it would be great.
Thanks. This is my fault for not updating the version number, but, are you certain the log is from the latest version? If it is, it’s interesting!
Actually, I can see that it is the latest version, thanks.
Here's hopefully a more interesting log with log level set to 3.
This one looks suspicious: Nov 19 15:43:54 kitchen shairport-sync[20934]: 0.000133998|Join audio thread.
It seems stuck here, unable to release the audio thread.
The log above contains a lot of initial successful connect-disconnect sessions. Maybe about 20 good ones before one bad appeared.
/gibman
Great, thanks, I'll have a look. I'm getting ready to push something with a bit more diagnostics shortly. No big developments at this end, though.
Very interesting log thanks. BTW, the new code yesterday for handling "unready" frames is being used!
hi.
I'm running shairport-sync 3.2.1 from the moode audio project.
I have 5 pi's running and they all exhibit the same misbehaviour.
I also have a windows PC running tuneblade 1.7.8.0
after a few hours it seems tuneblade is unable to establish a connection with around 80% of the pi's. tuneblade automatically retries the connection over and over again as can be seen in it's UI.
It works again for a few hours if I restart the shairport-sync process.
Maybe it's previous broken tuneblade connection that is still linkering on. tuneblade does crash from time to time and it could leave shairport-sync in a bad state maybe ? this is only me guessing here :)
logs attached from one of the faulting pi's. shairport-sync.log