voice-engine / ec

Echo Canceller, part of Voice Engine project
GNU General Public License v3.0

Help with utils #18

Closed StuartIanNaylor closed 4 years ago

StuartIanNaylor commented 4 years ago

I have some questions about get_delay.py in utils. Running

python get_delay.py recording.wav playback.wav

gives:

[69.0, 575.0]
[1036.0, 781.0]
[2828.0, 563.0]
[3307.0, 563.0]
[3034.0, 563.0]
[4001.0, 781.0]
[-1631.0, 563.0]
[2806.0, 563.0]
[3567.0, 781.0]
[-1935.0, 563.0]
[-2501.0, 781.0]
[-1876.0, 562.0]
[-658.0, 563.0]
[-1684.0, 563.0]
[1229.0, 562.0]
[2687.0, 562.0]
[-2690.0, 781.0]
[-939.0, 781.0]
[-7654.0, 781.0]
[-4018.0, 562.0]
[4426.0, 781.0]
[16.0, 0.0]
[1413.0, 0.0]
[1133.0, 563.0]
[4102.0, 562.0]
[1110.0, 781.0]
[-7729.0, 781.0]
[619.0, 563.0]
[1187.0, 562.0]
[2019.0, 563.0]
[-4089.0, 781.0]
[1265.0, 563.0]
[-610.0, 781.0]
[747.0, 563.0]
[-5139.0, 562.0]
[560.0, 0.0]
[574.0, 563.0]
[-1916.0, 781.0]
[4992.0, 562.0]
[5902.0, 563.0]
[1195.0, 563.0]
[-133.0, 562.0]
[-138.0, 562.0]
[2910.0, 781.0]
[-1650.0, 1492.0]
[-4477.0, 1492.0]
[-3806.0, 1492.0]
[-984.0, 1491.0]
[460.0, 1492.0]
[2537.0, 1492.0]
[4594.0, 1710.0]
[-2305.0, 1491.0]
[7149.0, 1492.0]
[1744.0, 1492.0]
[-180.0, 1491.0]
[2079.0, 1492.0]
[-1217.0, 1491.0]
[1790.0, 0.0]
[0.0, 0.0]
[-2432.0, -2432.0]

@xiongyihui what does the output mean?

On a silent section I usually get [-8000.0, -8000.0]. I just played back a sample WAV in a relatively quiet room, as I guess the output is not what we want here: we are comparing what was played with what was recorded. So both values are relative, within ±16000 samples at 16000 Hz. I guess I should maybe test with a single steady frequency or something similar, but I am struggling to work out what the results actually mean, and even whether or how to take an average. I am on a Rock Pi S (A35) and am just wondering what the delay would be.

xiongyihui commented 4 years ago

It seems that your recorded audio is not well correlated with the playback. I prefer to use randomly generated white noise to measure the delay between recording and playback.
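To illustrate why white noise suits this measurement, here is a minimal sketch that delays a noise signal by a known amount and recovers that delay from the correlation peak. It uses plain time-domain cross-correlation, not the GCC-PHAT calculation in get_delay.py, and the function name, signal length and 37-sample delay are assumptions made up for the example.

```python
import random


def estimate_delay(recorded, played, max_lag):
    """Return the lag in samples at which `recorded` best matches `played`.

    Brute-force cross-correlation over lags -max_lag..max_lag.
    A positive result means `recorded` lags behind `played`.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, sample in enumerate(played):
            j = i + lag
            if 0 <= j < len(recorded):
                score += sample * recorded[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Hypothetical test signal: 2000 samples of uniform white noise.
rng = random.Random(0)
played = [rng.uniform(-1.0, 1.0) for _ in range(2000)]

# Simulated recording: 37 samples of silence, then an attenuated copy.
true_delay = 37
recorded = [0.0] * true_delay + [0.5 * x for x in played]

estimated = estimate_delay(recorded, played, max_lag=100)
print(estimated)
```

Because noise samples are uncorrelated with each other, the correlation has a single sharp peak at the true lag. Music or speech tends to give broader or repeated peaks, which is one plausible reason for scattered estimates like those in the first run.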

StuartIanNaylor commented 4 years ago

If I use alsabat I get

rock@rockpis:~$ alsabat -Phw:2,0 -Chw:2,0 -c2 --roundtriplatency
alsa-utils version 1.1.8

Start round trip latency
Entering playback thread (ALSA).
Set period size: 45  buffer size: 90
Get period size: 45  buffer size: 90
Playing generated audio sine wave
Entering capture thread (ALSA).
Set period size: 45  buffer size: 90
Get period size: 45  buffer size: 90
Recording ...
Test1, round trip latency 12ms
Test2, round trip latency 12ms
Test3, round trip latency 11ms
Test4, round trip latency 12ms
Overrun: Broken pipe(-32)
Playback completed.
Capture canceled.

Start round trip latency
Entering playback thread (ALSA).
Set period size: 89  buffer size: 178
Get period size: 89  buffer size: 178
Playing generated audio sine wave
Entering capture thread (ALSA).
Set period size: 89  buffer size: 178
Get period size: 89  buffer size: 178
Recording ...
Test1, round trip latency 14ms
Test2, round trip latency 16ms
Test3, round trip latency 16ms
Underrun: Broken pipe(-32)
Overrun: Broken pipe(-32)
Playback completed.
Capture canceled.

Start round trip latency
Entering playback thread (ALSA).
Set period size: 133  buffer size: 266
Get period size: 133  buffer size: 266
Playing generated audio sine wave
Entering capture thread (ALSA).
Set period size: 133  buffer size: 266
Get period size: 133  buffer size: 266
Recording ...
Test1, round trip latency 19ms
Test2, round trip latency 19ms
Test3, round trip latency 21ms
Test4, round trip latency 19ms
Test5, round trip latency 19ms
Final round trip latency: 19ms
Playback completed.
Capture completed.

Return value is 0

Would you say this is the same as on the Pi? With alsabat a Pi 3 gives about the same figures, but alsabat does not test through ec.

I will try white noise from sox instead: play -n synth whitenoise
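If sox is not available, a similar test signal can be written with the Python standard library alone. This is only a sketch: the function name, the default amplitude and the 5 second length are assumptions, and the 16000 Hz mono 16-bit format is chosen to match the arecord settings used in this thread.

```python
import random
import struct
import wave


def write_white_noise(path, seconds=5.0, rate=16000, amplitude=0.3, seed=None):
    """Write mono 16-bit white noise to `path` and return the sample count.

    Samples are independent and uniformly distributed, which gives a
    flat (white) spectrum on average.
    """
    rng = random.Random(seed)
    count = int(seconds * rate)
    peak = int(amplitude * 32767)
    samples = (rng.randint(-peak, peak) for _ in range(count))
    data = struct.pack("<%dh" % count, *samples)  # little-endian int16
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav.setframerate(rate)
        wav.writeframes(data)
    return count
```

Calling write_white_noise("noise.wav", seconds=10) would produce a file that can be played through ec in place of the sox command.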

rock@rockpis:~/ec/util$ python get_delay.py recording.wav playback.wav
get_delay.py:35: RuntimeWarning: invalid value encountered in divide
  cc = np.fft.irfft(R / np.abs(R), n=(interp * n))
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[462.0, 465.0]
[462.0, 465.0]
[462.0, 465.0]
[462.0, 465.0]
[462.0, 465.0]
[462.0, 465.0]
[462.0, 465.0]
[462.0, 465.0]
[462.0, 465.0]
[462.0, 465.0]
[474.0, 468.0]
[465.0, 464.0]
[465.0, 464.0]
[465.0, 468.0]
[465.0, 468.0]
[474.0, 464.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[462.0, 465.0]
[462.0, 463.0]
[462.0, 463.0]
[462.0, 463.0]
[462.0, 463.0]
[462.0, 463.0]
[462.0, 463.0]
[462.0, 463.0]
[462.0, 463.0]
[462.0, 463.0]
[462.0, 463.0]
[462.0, 463.0]
[462.0, 463.0]
[462.0, 463.0]
[462.0, 463.0]
[462.0, 463.0]
[462.0, 463.0]
[462.0, 463.0]
[462.0, 465.0]

PS: EC is working great compared with the PulseAudio WebRTC AEC, which seems to suck on ARM.

So it seems to be averaging about 462. Is one WAV measured from a theoretical center and the other offset the opposite way, so that the delay is the total of the two, converted to ms using the sampling rate?

(462 + 465) / 16000 * 1000 ≈ 58 ms

Apologies, as I am not sure whether the delay is in frames, samples or ms. (Doh, it is frames, as it says, which are 10 ms.) LOL, just ignore me, but what should I put for the delay? :)

StuartIanNaylor commented 4 years ago

Yeah ignore my brainfarts :)

rock@rockpis:~/ec/util$ python get_delay.py recording.wav playback.wav
get_delay.py:35: RuntimeWarning: invalid value encountered in divide
  cc = np.fft.irfft(R / np.abs(R), n=(interp * n))
[-8000.0, -8000.0]
[-8000.0, -8000.0]
[806.0, 807.0]
[806.0, 807.0]
[806.0, 807.0]
[806.0, 807.0]
[806.0, 807.0]
[806.0, 807.0]
[806.0, 807.0]
[806.0, 807.0]
[806.0, 807.0]
[806.0, 807.0]
[806.0, 807.0]
[806.0, 807.0]
[806.0, 807.0]
[806.0, 807.0]
[806.0, 807.0]
[806.0, 807.0]
[806.0, 807.0]
[806.0, 807.0]
[806.0, 807.0]
[806.0, 807.0]
[806.0, 807.0]
[806.0, 807.0]
[806.0, 807.0]
[806.0, 807.0]
[810.0, 809.0]
[806.0, 809.0]
[806.0, 809.0]
[806.0, 809.0]
[806.0, 809.0]
[806.0, 812.0]
[806.0, 809.0]
[806.0, 809.0]
[806.0, 809.0]
[806.0, 809.0]
[813.0, 812.0]
[806.0, 809.0]
[806.0, 807.0]
[806.0, 809.0]
[806.0, 809.0]
[806.0, 809.0]
[806.0, 809.0]
[806.0, 809.0]
[806.0, 809.0]
[806.0, 809.0]
[806.0, 809.0]
[806.0, 809.0]
[806.0, 809.0]
[806.0, 809.0]
[806.0, 807.0]
[806.0, 807.0]
[806.0, 807.0]
[806.0, 807.0]
[806.0, 807.0]
[806.0, 807.0]
[806.0, 807.0]
[806.0, 807.0]
[806.0, 807.0]
[806.0, 807.0]
[806.0, 807.0]
[810.0, 809.0]

I start ec without a delay, with -s, and then play and record:

./ec -i plughw:2 -o plughw:2 -s
play -n synth whitenoise
arecord -r16000 -fS16_LE -c1 test.wav

So what should the delay be?

@xiongyihui apols ignore the previous post.

After spending some time staring blankly at the code: the two values are the stereo pair, in frames at the sample rate. But a delay of 806 does not work well, even though it does say frames. A frame is 10 ms, so 160 samples, and I am confused because for the Pi you recommend 200; I cannot see how you arrived at that figure, even though it works well.
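The unit arithmetic in this comment can be checked with a few lines of Python. This is only a sketch: the 16000 Hz rate and the 10 ms frame come from the thread, while the helper names are made up for illustration.

```python
RATE = 16000  # sample rate in Hz used throughout this thread


def samples_to_ms(samples, rate=RATE):
    """Convert a delay in samples to milliseconds."""
    return samples * 1000.0 / rate


def ms_to_samples(ms, rate=RATE):
    """Convert a duration in milliseconds to a whole number of samples."""
    return int(round(ms * rate / 1000.0))


print(ms_to_samples(10))   # samples in one 10 ms frame
print(samples_to_ms(806))  # the measured 806 samples, in ms
print(samples_to_ms(200))  # the 200 recommended for the Pi, if read as samples
```

Whether a given number is in samples or in ms changes its size by a factor of 16 at this rate, which is why the unit question matters.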

StuartIanNaylor commented 4 years ago

@xiongyihui Apols but eventually I did work it out.

The detect_delay value is in samples and the delay "frames" is in ms. I also have a hunch that maybe it should be taken at the middle of the current frame, so +5 ms.

StuartIanNaylor commented 1 year ago

@xiongyihui Apologies, I really did confuse things. The delay is just plain old samples: for the stereo pair you can simply pass the average of the two values, aka 806, as -d.

Sorry about that. Reading this again after a long time, I thought I should post it for others.
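For anyone who wants to turn a get_delay.py log into a single number, here is a minimal sketch of the averaging described above. The function name is hypothetical, and treating [-8000.0, -8000.0] lines as "no estimate" is an assumption based on the silent sections seen earlier in this thread, not something the script documents.

```python
import ast
import statistics


def suggest_delay(output_text, invalid=-8000.0):
    """Return one delay in samples from pasted get_delay.py output.

    Takes the median of each channel over all usable lines, then the
    average of the two medians, rounded to a whole sample.
    """
    pairs = []
    for line in output_text.splitlines():
        line = line.strip()
        if not (line.startswith("[") and line.endswith("]")):
            continue  # skip warnings and anything else that is not a pair
        left, right = ast.literal_eval(line)
        if left == invalid or right == invalid:
            continue  # assumed to mark a block with no usable estimate
        pairs.append((left, right))
    if not pairs:
        raise ValueError("no usable delay estimates found")
    left_median = statistics.median(p[0] for p in pairs)
    right_median = statistics.median(p[1] for p in pairs)
    return int(round((left_median + right_median) / 2.0))


sample = """[-8000.0, -8000.0]
[806.0, 807.0]
[806.0, 807.0]
[806.0, 809.0]
[810.0, 809.0]
"""
print(suggest_delay(sample))
```

The median is used per channel so that an occasional outlier line does not shift the result. This only makes sense when most lines agree, as in the white noise runs; with scattered estimates like those in the first run, no summary statistic will give a trustworthy delay.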