Closed joerg-krause closed 9 years ago
Thanks for all your work on this. Here is what I've found so far:
I see! So my comment https://github.com/mikebrady/shairport-sync/issues/89#issuecomment-116212700 means nothing, as in previous versions the values for tolerance/drift
and the max dac delay are hard-coded.
On a Raspberry Pi, it looks like a USB DAC consumes frames slower than it should. This is a very strange result, but if output goes to the built-in DAC, the effect seems to disappear. I'll do some more checking overnight and tomorrow. [Edit –got the direction of the trend wrong. The USB DAC seems to be slower than it should be, so the correction is to delete frames occasionally.]
This is exact the same behaviour on my device. Although I can play high-quality audio files (192 kHz. 24-bit) with aplay, the DAC seems to be to slow to consume the frames from shairport.
Well, it's hard to interpret. But I wouldn't have started on this track of trying to figure out exactly where the delays, drifts and inaccuracies are coming from if you hadn't started!
I'm still working on trying to make sense of this apparent data. I've pushed a few updates that generate timing and drift data – the latest just now, producing source drift in microseconds, corrections applied in microseconds, size of buffer in frames and source DAC drift in microseconds. [I think!]
If the observations hold up, then it does indeed seem that the DAC is slow to consume data from Shairport Sync. I wonder if it's slow to consume data from aplay? I wonder if it's sensitive to the nominal rate? So, lots of things to investigate!
Yes, a lot of things to investigate!
I'll look now at how the ALSA - DAC interface is working precisely to see if there is a bottleneck somewhere.
With latest commit I do not see that the hardware buffer size reported by ALSA increases to it's limit while streaming. But I'm testing with a MacBook now, instead of an iPhone.
Edit: Okay, the buffer_size is slowly increasing. And I must confess, that I changed some Linux kernel parameters (eg, changed CONFIG_HZ to 100 Hz). I will check if this has any influence.
Edit 2: Forget the statement of the last edit about the buffer_size. It levels out between 7000 and 7500. (I watched the wrong parameter).
I compared the latest commit on my ARM and my x64 device (both wifi) and logged both outputs
https://www.dropbox.com/s/lrlm7sqqfvukd34/pc.txt?dl=0
https://www.dropbox.com/s/9du36xujm9tb1la/wifi.txt?dl=0
Both are set to a drift of 882 and resync to 0.
The x64 shows a continuously increasing clock_drift
.The ARMs clock_drift
is also increasing over time, but much more fluctuating. Really interesting is that the x64 device has no corrections, whereas the ARM device continuously delete frames.
[Edit]: In all graphs the x-axes should be better labeled num of timing packets instead of seconds.
The source drift is for both devices almost identical:
Pretty interesting! For example for the clock drift, I guess for the x64 needs much less processing time for the timing packets then the ARM device which has slower hardware interfaces, slower CPU,
However, a lot of corrections are done on the ARM device. The max buffer_size is 16383, so the buffer is not totally filled.
Some additional informations to the previous debug log.
The x64 does corrections (insertions) after some minutes. These are the values when the corrections starts:
55120 0 5779 71993 263
54016 317 5830 72231 263
This is due to the clock drift, I guess.
The ARM got an underrun error:
49391 -1324716 6760 200877 246
ALSA lib pcm.c:7905:(snd_pcm_recover) underrun occurred
Error -32 in delay(): Broken pipe. Delay reported is 0 frames.
49391 -1325986 4294967295 201116 185
Aliasing of buffer index -- reset.
50772 -1325986 0 201355 185
Player: packets out of sequence: expected: 8052, got: 8564, sync error: 0 frames.
Lost sync with source for 4 consecutive packets -- flushing and resyncing. Error: -6509.
50772 -1325986 6428 0 263
[Edit]: A delay of 4294967295 is actually an eror code (-1).
Great work. Let me consider them for a while.
Yes, of course!
Another addition log for the x64 device after an underrun error:
389286 92857 5762 502936 263
389286 92857 5828 503095 118
ALSA lib pcm.c:7905:(snd_pcm_recover) underrun occurred
Error -32 in delay(): Broken pipe. Delay reported is 0 frames.
[snipped ~20 of these error messages]
389286 91746 14347 503413 262
389032 88730 14432 503652 263
390068 85827 14424 503890 263
390272 82426 14311 504129 263
390428 79501 14242 504367 263
390428 76666 13797 504606 263
390428 73990 13941 504844 263
389774 70907 13598 505083 263
390086 67959 13498 505321 263
391266 64965 13465 505560 263
391449 61678 13376 505798 263
391628 58707 13312 506037 263
391813 55918 12896 506275 263
391813 52970 12741 506514 263
391813 49977 12686 506752 263
392014 46643 12617 506991 263
392540 43537 12547 507229 263
392713 40045 12441 507468 263
392713 36848 12284 507707 263
392830 34195 11918 507945 263
392830 31337 11785 508184 263
392830 28253 11703 508422 263
393617 25396 11686 508661 263
393617 22494 11585 508899 263
393999 19138 11503 509138 263
394176 16394 11080 509376 263
394176 13401 10970 509615 263
394176 10521 10855 509853 263
394176 7437 10984 510092 254
394886 4058 10726 510330 263
395072 544 10625 510569 263
395072 -2743 10500 510807 263
395072 -5442 10325 511046 263
395602 -8480 9971 511284 263
395602 -11655 10020 511443 263
394501 -15147 9749 511682 263
396177 -18344 9728 511920 263
396177 -21678 9435 512159 263
396523 -24263 9566 512397 263
396523 -27460 9151 512636 249
396848 -30544 9025 512874 263
396848 -33424 8896 513113 263
396848 -36462 8962 513351 263
397433 -39795 8769 513590 263
397273 -42743 8667 513828 263
397800 -45850 8600 514067 263
397800 -48820 8358 514305 262
397800 -52290 8201 514544 263
398325 -55306 7912 514783 263
397055 -58117 7811 515021 263
397496 -61201 7724 515260 263
397496 -63219 7688 515498 262
398748 -63219 7762 515737 263
399249 -63786 7834 515975 263
399421 -63786 7516 516214 263
399421 -63786 7546 516452 263
398819 -63786 7574 516691 263
399970 -63786 7666 516929 263
400146 -63786 7712 517168 263
400291 -63786 7761 517406 263
400511 -63786 7471 517645 263
[...]
The session correction value is now sticked to -63786.
First, I have the sign of the source drift wrong because I was misinterpreting the data. Here is my current rationale:
I've just pushed an update with these changes made. Also, I've changed the buffer size the alsa back end uses when connected to a hardware mixer back to 6615. The statistics are beginning to make sense. I'm concerned about the ragged look of the clock drift on your ARM device. Could you check it again with the fixes? More later.
Here are a few graphs of a Mac Pro playing to a Raspberry Pi with an IQAudIO PiDAC. One is using iTunes and the other is using Airplay. For the first time, these graphs make sense to me:
The corrections are applied by Shairport Sync and agree pretty closely (to within about 2.5 ppm) with the calculated drifts. It would be interesting to figure out what that 2.5 ppm or so actually is...
I'll do some more tests tomorrow.
Many thanks for sharing your results! How did you did the trend line?
This is how it looks on my ARM device:
Thanks for this. I used the Numbers app on a Mac – it's one of its options. Your graph (I'm presuming it's from the latest build) is interesting: it should be that Clock Drift + Source Drift + Corrections ~ 0. The second thing is that the clock drift is extremely rough. I can't think of a mechanism for this to happen. One quick question – it it possible your device is big-endian?
No, it's little endian.
If I take a longer range (650 timing packets), the clock drift does not look so rough anymore. But I wonder where all the corrections are coming from...
The X axis on your latest graph – is it timing packets? If so, say it's 300 packets, approximately 900 seconds. Thus the correction is roughly 350 milliseconds in 900 seconds – 388 ppm, very close to the numbers you were quoting before, and a long way off the -16000/1200 = -13.3 ppm on my graph above. While the drift is rough, I still can't see how it can be causing the problem, so I think there must be a bug in Shairport Sync. I'm going to have to think of some way to "zero in" on it. Does it always take a dive off at around 22 samples?
Yes, it is the number of timing packets. The clock drift does almost look the same as in your graph if I plot it after 650 timing packets (~1950 seconds).
I'm going to have to think of some way to "zero in" on it.
Many thanks! I'm available for all kind of tests :-)
Does it always take a dive off at around 22 samples?
Yes, it's about 22-23 samples per second.
I've compared the rt-nt
delta for the ARM and the x64 device while multi-room streaming from iTunes on a MacBook. Both rt
(reference timestamp) and nt
(current packet timestamp) are values set from the source.
In the plot the upper graph shows the ARM device, the middle graph shows the ARM device. The lower graph shows the delta between the reference timestamp used by the ARM and the x64 device rt_delta = rt(ARM) - rt(x64)
. It shows that the two devices use different reference timestamps from time to time. Why? Do they receive different sync packets???
Note that nt_delta = nt(ARM) - nt(x64)
is always 0 (not plotted here). So both devices are receiving all audio packets.
TBH I wouldn't expect them to have the same timestamps.
About the ARM – can you tell me anything about the DAC? Does it have a part number that you can detect? Does it have a separate crystal or resonator from the CPU?
TBH I wouldn't expect them to have the same timestamps.
Can you explain why?
The DAC is a Wolfson WM8524 connected to a Freescale i.MX28 CPU via I2S. The DACs MCLK is clocked by the CPU.
Here are two more graphs, both from an iMac. The X axis is seconds, the Y axis is microseconds. The first to to a TP-Link 710N running OpenWrt into a Topping TP30, a pretty good amplifier with a DAC input. The drift figures are pretty small, as you might expect:
The second is to a Raspberry Pi model B into a cheap "Manhattan" brand "3D Sound" USB DAC. It seems to be based on a C-Media CM 119 IC, which, interestingly, has its own 12 MHz crystal – in other words, it is not necessarily synchronised to the USB clock or anything like that. So what we might be seeing is high levels of drift due to cheaper, lower-precision parts? Hence my question earlier, BTW.
Nothing wrong with either of those parts :)
About the timestamps, I never thought about it too much, but I imagine that communication with each extra airplay device would be handled by separate threads, each going about its business by itself. I'd have been a bit surprised if they were the same, frankly.
Full of silly questions, sorry, but here's another: does your CPU go into a low-power mode or anything of that nature?
Full of silly questions, sorry, but here's another: does your CPU go into a low-power mode or anything of that nature?
No, any low-power features are disabled in the Kernel config:
# CONFIG_PM is not set
Yeah!!!
I have it! The divider of the SAIF clock is without software interaction not precise enough for synchronisation. A guy from the Freescale community have had the same problem. Fortunalty he shared his solution just today :-) https://community.freescale.com/thread/357691
I will post a graph after some tests later.
@mikebrady Many thanks for all your help and work! It helped me really a lot to understand the implementation and the synchronization part. It is indeed a lot of good work you've done!
I hope the work done by @mikebrady will help others to quickly identify audio problems because of bad clock synchronization!
[Edit: Added missing link to the solution.]
One quick question: What drift value did you use for the last test https://github.com/mikebrady/shairport-sync/issues/89#issuecomment-118001415?
It was left as the default – 88 frames. Looking forward to seeing some graphs :)
By the way, thanks are due to you, @joerg-krause, for providing the setting, motivation and a good deal of help in understanding the synchronisation system more completely.
I applied the patches and rerun the test two times. It looks much better now compared to my last test. With one exception:
After around 145 timing packets a lot of corrections are done. Why? This was the same in both tests.
That looks somewhat better alright. Can you plot the sum of Clock Drift + Source Drift + Corrections? That should tend to zero, but allowing for some slop due to the drift figure. I'm just wondering if the error begins to cross the threshold at around 142.
I removed DAC delay in favour of Net (= Clock Drift + Source Drift + Corrections):
Just for clarification: all parameters are set to default (eg drift is 88 frames).
I'm just wondering if the error begins to cross the threshold at around 142
Which error?
Ok, I enabled debug output for the Linux audio driver to see when the frequency is adjusted. In the current test I can see that around the frequency is adjusted (faster) and the number fo corrections rapidly is growing. After 52 seconds the frequency is adjusted to run slower (it is the same value as before the adjustment).
In the current running test the number of corrections before the adjustment was done was 0!
The "error" I meant was just the latency error calculation. But that would not account for the duration of the change in the rate of corrections – there seems to be some kind of hysteresis effect working there, but I don't know where it could be coming from. Trying to understand the patches for that clock problem on the Freescale i.MX28 CPU, is it possible that the dividing ratios are being changed from time to time? Also, that clock drift figure is still rough – it might be a further clue...
Ah – our posts crossed. So my guess might be right?
Yes, it is. Every 420.1 seconds the frequency is adjusted for 52 seconds to maintain audio synchronization within +/-20ms of sample rate.
I will run a test now with setting the drift to 882 (= 20ms).
That looks like it then. The first 420 seconds is around 144 samples. The rapid change lasts to around sample 161, thus 15 samples or 45 seconds. It's plausible.
This is the graph with MCLK frequency adjustment. Drift is set to 882:
You can see clearly that every 420 seconds the frequency (the clock is slower then the desired clock) is adjusted for 52 seconds (clock runs faster then the desired clock).
What do you think about this?
[Edit]: The tests lasts about 3000 seconds.
That graph looks exactly as I would expect. The clock drift still looks rough. What does it sound like, I wonder? I think it should sound pretty okay, though you might hear bursts of crackle every 420 seconds...
Sounds much better now! Far less crackles.
This is a graph a made when I measured the roundtrip time. Maybe that's why the clock drift looks so rough?
Yep. The filter should be discarding the outliers, but there is still plenty of variation below 10,000 (10 milliseconds?). Do you have any ideas about the round-trip time variations yourself?
Yes, 10 ms.
Do you have any ideas about the round-trip time variations yourself?
Not sure about this. Maybe some wifi driver issues? Do you think it's worth to look at this?
Off the top of my head, yes, I do – something is going on there. I guess I should do a few tests myself here to compare it with, but my vague memory is that the times were in the low single milliseconds, often much lower.
Okay, I'll have a look at this tomorrow. Many thanks so far!
Here's an example on a TP-Link 710N using its built-in WiFi here at home. Seems to be just over 2 ms on average, and pretty even.
Thanks for the test! I run several tests om my own, but did not succeed in getting an average ping time under 5ms. I will discuss this on the linux-wireless mailing list, as it is not shairport specific.
Nevertheless, I will post any results here.
Thanks. We might be able to close this issue soon, I'm thinking.
I close this thread until any new breakthrough discoveries are made.
Again, @mikebrady many thanks for all the work and investigations!
Thanks – it was very interesting.
When streaming from iTunes to my custom device I always hear small cracks when shairport-sync does basic stuffing. I changed several parameters but nothing helped. The cracks are also audible when I change the volume. You write in the README that
The cracks are very good to hear when playing a sinus tone.
How can I influence the number of frames inserted/deleted? Or do you have another idea what might be wrong here?