roc-streaming / roc-toolkit

Real-time audio streaming over the network.
https://roc-streaming.org
Mozilla Public License 2.0
1.03k stars 205 forks source link

Understand why reported niq latency is above target #548

Open gavv opened 10 months ago

gavv commented 10 months ago

Commands:

roc-send -vv -s rtp+rs8m://lo:10001 -r rs8m://lo:10002 -c rtcp://lo:10003 -i file:stash/long.wav
roc-recv -vv -s rtp+rs8m://lo:10001 -r rs8m://lo:10002 -c rtcp://lo:10003

In roc-recv output, you will see that niq_latency is growing to 210ms and then is slowly floating around that value. Expected behavior is that it would float around 200ms (which is target latency; you can change it with --sess-latency).

However, if I dump all inputs and outputs of FreqEstimator right after its invocations, I see more expected plot:

image

(yellow is target latency, blue is actual latency).

Here is full dump: fe.txt

Columns in dump:

gavv commented 10 months ago

Plotting script:

#!env python3
import sys
import pylab
import numpy

target_latency_arr = []
latency_arr = []
coeff_arr = []
trim_coeff_arr = []

with open('fe.txt') as fp:
    for f in fp.readlines():
        fld = list(map(float, f.split(' ')))
        target_latency_arr.append(fld[0])
        latency_arr.append(fld[1])
        coeff_arr.append(fld[2])
        trim_coeff_arr.append(fld[3])

latency_arr = numpy.array(latency_arr)

print((latency_arr < target_latency_arr[0]).sum())
print((latency_arr > target_latency_arr[0]).sum())

pylab.plot(latency_arr, '.-')
pylab.plot(target_latency_arr, '.-')
pylab.show()
gavv commented 10 months ago

Note: output from this lines:

print((latency_arr < target_latency_arr[0]).sum())
print((latency_arr > target_latency_arr[0]).sum())

is:

37680
38779
gavv commented 10 months ago

Well, that's funny,

Currently we're reporting logs exactly every 5 seconds.

I tried to change it to one report per every 777 frames and now I see expected values both below and above target latency!

So it seems like those 5 seconds were somehow in phase with latency oscillations. What is insane that it is reproducing very reliably.

@baranovmv Thoughts?

gavv commented 10 months ago

OK, so I was able to fix the problem, but I still don't fully understand it.

Log reports were done each 5 seconds, and they were clocked by CPU clock.

Latency calculations were done each 5 ms, and they were clocked by pipeline (pulseaudio) clock.

The clock drift between sound card and CPU, in turn, causes slow drift of the moment when we're making report (i.e. sampling current latency).

Somehow this drift causes biased sampling results (higher than expected). I still don't understand how exactly.

I reworked LatencyMonitor to clock log reports by pipeline clock instead of CPU clock, and the problem was fixed. Now sampling results are oscillating around target value no matter of sampling period.

0ebf4c46fd5195d9f3d8ec8ed20ad48bb9759bd1

gavv commented 10 months ago

Probably it is important that log reports were clocked by timer on same CPU that clocked sender (it was sending a file via localhost).

So basically we were sampling latency with the sender's clock.

gavv commented 9 months ago

More thoughts about this case.

So, sender adds packets to queue and temporary grows its size, based on its clock. PulseAudio removes packets from queue and temporary shrinks its size, based on another click. And reports are always triggered very soon after sender growed queue.

Then, remember that queue size that we saw in logs was above target by 10ms, which is 5 packets.

It could be explained if packets came in bursts. So that if report is always made soon after receiving a burst of packets (because of the coincidence described above), then the reported size will be biased proportionally to the burst size.

gavv commented 9 months ago

There are no visible bursts in packet receiving time.

(X is packet number, Y is recv time)

image

gavv commented 9 months ago

There are bursts in frame read time, triggered by PulseAudio.

X is frame number, Y is read time.

image

i.e., PulseAudio blocks us for a while, then unblocks and allows write 20 times in a row, then blocks for a while again. Each write is 1ms, i.e. each bust we feed it with 20ms of audio. Time between bursts is also 20ms.