mikebrady / shairport-sync

AirPlay and AirPlay 2 audio player
Other
7.27k stars 573 forks source link

Stuffing always audible #89

Closed joerg-krause closed 9 years ago

joerg-krause commented 9 years ago

When streaming from iTunes to my custom device I always hear small cracks when shairport-sync does basic stuffing. I changed several parameters but nothing helped. The cracks are also audible when I change the volume. You write in the README that

The number of frames inserted or deleted is so small as to be almost inaudible.

The cracks are very good to hear when playing a sinus tone.

How can I influence the number of frames inserted/deleted? Or do you have another idea what might be wrong here?

mikebrady commented 9 years ago

Thanks for all your work on this. Here is what I've found so far:

joerg-krause commented 9 years ago

I see! So my comment https://github.com/mikebrady/shairport-sync/issues/89#issuecomment-116212700 means nothing, as in previous versions the values for tolerance/drift and the max dac delay are hard-coded.

On a Raspberry Pi, it looks like a USB DAC consumes frames slower than it should. This is a very strange result, but if output goes to the built-in DAC, the effect seems to disappear. I'll do some more checking overnight and tomorrow. [Edit –got the direction of the trend wrong. The USB DAC seems to be slower than it should be, so the correction is to delete frames occasionally.]

This is exact the same behaviour on my device. Although I can play high-quality audio files (192 kHz. 24-bit) with aplay, the DAC seems to be to slow to consume the frames from shairport.

mikebrady commented 9 years ago

Well, it's hard to interpret. But I wouldn't have started on this track of trying to figure out exactly where the delays, drifts and inaccuracies are coming from if you hadn't started!

I'm still working on trying to make sense of this apparent data. I've pushed a few updates that generate timing and drift data – the latest just now, producing source drift in microseconds, corrections applied in microseconds, size of buffer in frames and source DAC drift in microseconds. [I think!]

If the observations hold up, then it does indeed seem that the DAC is slow to consume data from Shairport Sync. I wonder if it's slow to consume data from aplay? I wonder if it's sensitive to the nominal rate? So, lots of things to investigate!

joerg-krause commented 9 years ago

Yes, a lot of things to investigate!

I'll look now at how the ALSA - DAC interface is working precisely to see if there is a bottleneck somewhere.

With latest commit I do not see that the hardware buffer size reported by ALSA increases to it's limit while streaming. But I'm testing with a MacBook now, instead of an iPhone.

Edit: Okay, the buffer_size is slowly increasing. And I must confess, that I changed some Linux kernel parameters (eg, changed CONFIG_HZ to 100 Hz). I will check if this has any influence.

Edit 2: Forget the statement of the last edit about the buffer_size. It levels out between 7000 and 7500. (I watched the wrong parameter).

joerg-krause commented 9 years ago

I compared the latest commit on my ARM and my x64 device (both wifi) and logged both outputs


Data set

x64

https://www.dropbox.com/s/lrlm7sqqfvukd34/pc.txt?dl=0

ARM

https://www.dropbox.com/s/9du36xujm9tb1la/wifi.txt?dl=0

Both are set to a drift of 882 and resync to 0.

The x64 shows a continuously increasing clock_drift.The ARMs clock_drift is also increasing over time, but much more fluctuating. Really interesting is that the x64 device has no corrections, whereas the ARM device continuously delete frames.


[Edit]: In all graphs the x-axes should be better labeled num of timing packets instead of seconds.

Clock drift

x64

x64_device-clock_drift

ARM

arm_device-clock_drift


Session corrections

x64

x64_device-session_corrections

ARM

arm_device-session_corrections


Source drift

The source drift is for both devices almost identical:

x64

x64_device-source_drift

ARM

arm_device-source_drift


Current DAC delay

x64

x64_device-current_delay

ARM

arm_device-current_delay


Pretty interesting! For example for the clock drift, I guess for the x64 needs much less processing time for the timing packets then the ARM device which has slower hardware interfaces, slower CPU,

However, a lot of corrections are done on the ARM device. The max buffer_size is 16383, so the buffer is not totally filled.

joerg-krause commented 9 years ago

Some additional informations to the previous debug log.

The x64 does corrections (insertions) after some minutes. These are the values when the corrections starts:

55120   0   5779    71993   263
54016   317 5830    72231   263

This is due to the clock drift, I guess.

The ARM got an underrun error:

49391   -1324716    6760    200877  246
ALSA lib pcm.c:7905:(snd_pcm_recover) underrun occurred
Error -32 in delay(): Broken pipe. Delay reported is 0 frames.
49391   -1325986    4294967295  201116  185
Aliasing of buffer index -- reset.
50772   -1325986    0   201355  185
Player: packets out of sequence: expected: 8052, got: 8564, sync error: 0 frames.
Lost sync with source for 4 consecutive packets -- flushing and resyncing. Error: -6509.
50772   -1325986    6428    0   263

[Edit]: A delay of 4294967295 is actually an eror code (-1).

mikebrady commented 9 years ago

Great work. Let me consider them for a while.

joerg-krause commented 9 years ago

Yes, of course!

Another addition log for the x64 device after an underrun error:

389286  92857   5762    502936  263
389286  92857   5828    503095  118
ALSA lib pcm.c:7905:(snd_pcm_recover) underrun occurred
Error -32 in delay(): Broken pipe. Delay reported is 0 frames.
[snipped ~20 of these error messages]
389286  91746   14347   503413  262
389032  88730   14432   503652  263
390068  85827   14424   503890  263
390272  82426   14311   504129  263
390428  79501   14242   504367  263
390428  76666   13797   504606  263
390428  73990   13941   504844  263
389774  70907   13598   505083  263
390086  67959   13498   505321  263
391266  64965   13465   505560  263
391449  61678   13376   505798  263
391628  58707   13312   506037  263
391813  55918   12896   506275  263
391813  52970   12741   506514  263
391813  49977   12686   506752  263
392014  46643   12617   506991  263
392540  43537   12547   507229  263
392713  40045   12441   507468  263
392713  36848   12284   507707  263
392830  34195   11918   507945  263
392830  31337   11785   508184  263
392830  28253   11703   508422  263
393617  25396   11686   508661  263
393617  22494   11585   508899  263
393999  19138   11503   509138  263
394176  16394   11080   509376  263
394176  13401   10970   509615  263
394176  10521   10855   509853  263
394176  7437    10984   510092  254
394886  4058    10726   510330  263
395072  544 10625   510569  263
395072  -2743   10500   510807  263
395072  -5442   10325   511046  263
395602  -8480   9971    511284  263
395602  -11655  10020   511443  263
394501  -15147  9749    511682  263
396177  -18344  9728    511920  263
396177  -21678  9435    512159  263
396523  -24263  9566    512397  263
396523  -27460  9151    512636  249
396848  -30544  9025    512874  263
396848  -33424  8896    513113  263
396848  -36462  8962    513351  263
397433  -39795  8769    513590  263
397273  -42743  8667    513828  263
397800  -45850  8600    514067  263
397800  -48820  8358    514305  262
397800  -52290  8201    514544  263
398325  -55306  7912    514783  263
397055  -58117  7811    515021  263
397496  -61201  7724    515260  263
397496  -63219  7688    515498  262
398748  -63219  7762    515737  263
399249  -63786  7834    515975  263
399421  -63786  7516    516214  263
399421  -63786  7546    516452  263
398819  -63786  7574    516691  263
399970  -63786  7666    516929  263
400146  -63786  7712    517168  263
400291  -63786  7761    517406  263
400511  -63786  7471    517645  263
[...]

The session correction value is now sticked to -63786.

mikebrady commented 9 years ago

First, I have the sign of the source drift wrong because I was misinterpreting the data. Here is my current rationale:

I've just pushed an update with these changes made. Also, I've changed the buffer size the alsa back end uses when connected to a hardware mixer back to 6615. The statistics are beginning to make sense. I'm concerned about the ragged look of the clock drift on your ARM device. Could you check it again with the fixes? More later.

mikebrady commented 9 years ago

Here are a few graphs of a Mac Pro playing to a Raspberry Pi with an IQAudIO PiDAC. One is using iTunes and the other is using Airplay. For the first time, these graphs make sense to me:

macpro itunes to rpi and iqaudio pidac

macpro airplay to rpi and iqaudio pidac

The corrections are applied by Shairport Sync and agree pretty closely (to within about 2.5 ppm) with the calculated drifts. It would be interesting to figure out what that 2.5 ppm or so actually is...

I'll do some more tests tomorrow.

joerg-krause commented 9 years ago

Many thanks for sharing your results! How did you did the trend line?

This is how it looks on my ARM device: arm_device-debug

mikebrady commented 9 years ago

Thanks for this. I used the Numbers app on a Mac – it's one of its options. Your graph (I'm presuming it's from the latest build) is interesting: it should be that Clock Drift + Source Drift + Corrections ~ 0. The second thing is that the clock drift is extremely rough. I can't think of a mechanism for this to happen. One quick question – it it possible your device is big-endian?

joerg-krause commented 9 years ago

No, it's little endian.

joerg-krause commented 9 years ago

If I take a longer range (650 timing packets), the clock drift does not look so rough anymore. But I wonder where all the corrections are coming from...

mikebrady commented 9 years ago

The X axis on your latest graph – is it timing packets? If so, say it's 300 packets, approximately 900 seconds. Thus the correction is roughly 350 milliseconds in 900 seconds – 388 ppm, very close to the numbers you were quoting before, and a long way off the -16000/1200 = -13.3 ppm on my graph above. While the drift is rough, I still can't see how it can be causing the problem, so I think there must be a bug in Shairport Sync. I'm going to have to think of some way to "zero in" on it. Does it always take a dive off at around 22 samples?

joerg-krause commented 9 years ago

Yes, it is the number of timing packets. The clock drift does almost look the same as in your graph if I plot it after 650 timing packets (~1950 seconds).

I'm going to have to think of some way to "zero in" on it.

Many thanks! I'm available for all kind of tests :-)

Does it always take a dive off at around 22 samples?

Yes, it's about 22-23 samples per second.

joerg-krause commented 9 years ago

I've compared the rt-nt delta for the ARM and the x64 device while multi-room streaming from iTunes on a MacBook. Both rt (reference timestamp) and nt (current packet timestamp) are values set from the source.

In the plot the upper graph shows the ARM device, the middle graph shows the ARM device. The lower graph shows the delta between the reference timestamp used by the ARM and the x64 device rt_delta = rt(ARM) - rt(x64). It shows that the two devices use different reference timestamps from time to time. Why? Do they receive different sync packets???

Note that nt_delta = nt(ARM) - nt(x64) is always 0 (not plotted here). So both devices are receiving all audio packets.

plot_rt-nt

mikebrady commented 9 years ago

TBH I wouldn't expect them to have the same timestamps.

About the ARM – can you tell me anything about the DAC? Does it have a part number that you can detect? Does it have a separate crystal or resonator from the CPU?

joerg-krause commented 9 years ago

TBH I wouldn't expect them to have the same timestamps.

Can you explain why?

The DAC is a Wolfson WM8524 connected to a Freescale i.MX28 CPU via I2S. The DACs MCLK is clocked by the CPU.

mikebrady commented 9 years ago

Here are two more graphs, both from an iMac. The X axis is seconds, the Y axis is microseconds. The first to to a TP-Link 710N running OpenWrt into a Topping TP30, a pretty good amplifier with a DAC input. The drift figures are pretty small, as you might expect:

imac to topping tp30 via tplink openwrt numbers

The second is to a Raspberry Pi model B into a cheap "Manhattan" brand "3D Sound" USB DAC. It seems to be based on a C-Media CM 119 IC, which, interestingly, has its own 12 MHz crystal – in other words, it is not necessarily synchronised to the USB clock or anything like that. So what we might be seeing is high levels of drift due to cheaper, lower-precision parts? Hence my question earlier, BTW.

imac to rpi with fancy manhattan usb dac numbers

mikebrady commented 9 years ago

Nothing wrong with either of those parts :)

About the timestamps, I never thought about it too much, but I imagine that communication with each extra airplay device would be handled by separate threads, each going about its business by itself. I'd have been a bit surprised if they were the same, frankly.

mikebrady commented 9 years ago

Full of silly questions, sorry, but here's another: does your CPU go into a low-power mode or anything of that nature?

joerg-krause commented 9 years ago

Full of silly questions, sorry, but here's another: does your CPU go into a low-power mode or anything of that nature?

No, any low-power features are disabled in the Kernel config:

# CONFIG_PM is not set
joerg-krause commented 9 years ago

Yeah!!!

I have it! The divider of the SAIF clock is without software interaction not precise enough for synchronisation. A guy from the Freescale community have had the same problem. Fortunalty he shared his solution just today :-) https://community.freescale.com/thread/357691

I will post a graph after some tests later.

@mikebrady Many thanks for all your help and work! It helped me really a lot to understand the implementation and the synchronization part. It is indeed a lot of good work you've done!

I hope the work done by @mikebrady will help others to quickly identify audio problems because of bad clock synchronization!

[Edit: Added missing link to the solution.]

joerg-krause commented 9 years ago

One quick question: What drift value did you use for the last test https://github.com/mikebrady/shairport-sync/issues/89#issuecomment-118001415?

mikebrady commented 9 years ago

It was left as the default – 88 frames. Looking forward to seeing some graphs :)

mikebrady commented 9 years ago

By the way, thanks are due to you, @joerg-krause, for providing the setting, motivation and a good deal of help in understanding the synchronisation system more completely.

joerg-krause commented 9 years ago

I applied the patches and rerun the test two times. It looks much better now compared to my last test. With one exception:

clock-adjusted

After around 145 timing packets a lot of corrections are done. Why? This was the same in both tests.

mikebrady commented 9 years ago

That looks somewhat better alright. Can you plot the sum of Clock Drift + Source Drift + Corrections? That should tend to zero, but allowing for some slop due to the drift figure. I'm just wondering if the error begins to cross the threshold at around 142.

joerg-krause commented 9 years ago

I removed DAC delay in favour of Net (= Clock Drift + Source Drift + Corrections):

clock-adjusted-with-net

Just for clarification: all parameters are set to default (eg drift is 88 frames).

I'm just wondering if the error begins to cross the threshold at around 142

Which error?

joerg-krause commented 9 years ago

Ok, I enabled debug output for the Linux audio driver to see when the frequency is adjusted. In the current test I can see that around the frequency is adjusted (faster) and the number fo corrections rapidly is growing. After 52 seconds the frequency is adjusted to run slower (it is the same value as before the adjustment).

In the current running test the number of corrections before the adjustment was done was 0!

mikebrady commented 9 years ago

The "error" I meant was just the latency error calculation. But that would not account for the duration of the change in the rate of corrections – there seems to be some kind of hysteresis effect working there, but I don't know where it could be coming from. Trying to understand the patches for that clock problem on the Freescale i.MX28 CPU, is it possible that the dividing ratios are being changed from time to time? Also, that clock drift figure is still rough – it might be a further clue...

mikebrady commented 9 years ago

Ah – our posts crossed. So my guess might be right?

joerg-krause commented 9 years ago

Yes, it is. Every 420.1 seconds the frequency is adjusted for 52 seconds to maintain audio synchronization within +/-20ms of sample rate.

I will run a test now with setting the drift to 882 (= 20ms).

mikebrady commented 9 years ago

That looks like it then. The first 420 seconds is around 144 samples. The rapid change lasts to around sample 161, thus 15 samples or 45 seconds. It's plausible.

joerg-krause commented 9 years ago

This is the graph with MCLK frequency adjustment. Drift is set to 882:

frequency-adjustment

You can see clearly that every 420 seconds the frequency (the clock is slower then the desired clock) is adjusted for 52 seconds (clock runs faster then the desired clock).

What do you think about this?

[Edit]: The tests lasts about 3000 seconds.

mikebrady commented 9 years ago

That graph looks exactly as I would expect. The clock drift still looks rough. What does it sound like, I wonder? I think it should sound pretty okay, though you might hear bursts of crackle every 420 seconds...

joerg-krause commented 9 years ago

Sounds much better now! Far less crackles.

This is a graph a made when I measured the roundtrip time. Maybe that's why the clock drift looks so rough?

roundtrip_time

mikebrady commented 9 years ago

Yep. The filter should be discarding the outliers, but there is still plenty of variation below 10,000 (10 milliseconds?). Do you have any ideas about the round-trip time variations yourself?

joerg-krause commented 9 years ago

Yes, 10 ms.

Do you have any ideas about the round-trip time variations yourself?

Not sure about this. Maybe some wifi driver issues? Do you think it's worth to look at this?

mikebrady commented 9 years ago

Off the top of my head, yes, I do – something is going on there. I guess I should do a few tests myself here to compare it with, but my vague memory is that the times were in the low single milliseconds, often much lower.

joerg-krause commented 9 years ago

Okay, I'll have a look at this tomorrow. Many thanks so far!

mikebrady commented 9 years ago

Here's an example on a TP-Link 710N using its built-in WiFi here at home. Seems to be just over 2 ms on average, and pretty even.

turnaround time tp-link 710

joerg-krause commented 9 years ago

Thanks for the test! I run several tests om my own, but did not succeed in getting an average ping time under 5ms. I will discuss this on the linux-wireless mailing list, as it is not shairport specific.

Nevertheless, I will post any results here.

mikebrady commented 9 years ago

Thanks. We might be able to close this issue soon, I'm thinking.

joerg-krause commented 9 years ago

I close this thread until any new breakthrough discoveries are made.

Again, @mikebrady many thanks for all the work and investigations!

mikebrady commented 9 years ago

Thanks – it was very interesting.