Is it possible to automatically adjust buffer time and skew value when using webrtc aec?

hillxrem commented 5 months ago

I am adjusting the WebRTC AEC values in the settings, but for some reason the buffer time and skew values that completely eliminate local echo seem to differ depending on the call destination. For example, when I call a mobile phone from tSIP, I can completely eliminate the echo with a buffer time of 220, but with that value the echo is not eliminated when calling a landline. Conversely, when I call a landline from tSIP, I can completely eliminate the echo with a buffer time of 260, but with that value the echo is not eliminated when calling the mobile phone.

I would like to know what these values (buffer time, skew) mean and how to adjust them. Also, would it be difficult to implement a function that compares the audio input and output and adjusts these values automatically?

tomek-o commented 5 months ago

To be on the same page:

- "local echo" means acoustic coupling between the speaker and the microphone used by the softphone (echo generated locally)
- local echo is the echo heard by the other party
- "far echo" is the echo generated by the other party and heard by the softphone user; tSIP cannot do anything about it, and AFAIK almost no one does (except maybe some echo suppressors, but this is a double-edged sword, making the call almost half-duplex); this may change (or has it changed already?) with the increasing complexity of audio processing, such as AI in audio codecs

The skew parameter is supposed to compensate for the clock difference between two audio devices (i.e. their different real sampling rates). I don't know how important this is in real life, but I included it because it was exposed by WebRTC and some other implementations.
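To make the clock-difference idea concrete, here is a minimal Python sketch (not tSIP code) that turns sample counts from the two devices into a drift figure. It assumes you can count how many samples were played and how many were recorded over the same wall-clock interval, and that skew is expressed as the difference between those counts; how a given WebRTC version actually defines its skew argument should be checked in its header. The name `clock_skew` is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Skew:
    sample_diff: int  # played minus recorded samples over the interval
    ppm: float        # relative clock difference, parts per million
    drift_ms: float   # how far the two streams drift apart over the interval


def clock_skew(played: int, recorded: int, rate_hz: int) -> Skew:
    """Compare sample counts from the playback and capture devices.

    Hypothetical helper: assumes both counts cover the same wall-clock
    interval and that both devices nominally run at rate_hz.
    """
    if recorded <= 0 or rate_hz <= 0:
        raise ValueError("recorded and rate_hz must be positive")
    diff = played - recorded
    ppm = diff * 1_000_000 / recorded
    # Accumulated misalignment between the two streams, in milliseconds.
    drift_ms = diff * 1000 / rate_hz
    return Skew(diff, ppm, drift_ms)


# One minute at a nominal 8 kHz: the playback clock runs slightly fast
# and the capture clock slightly slow.
s = clock_skew(played=480_024, recorded=479_976, rate_hz=8000)
print(s.sample_diff, round(s.ppm, 1), s.drift_ms)  # 48 100.0 6.0
```

With a 100 ppm difference the echo path delay moves by about 6 ms per minute, so a fixed delay setting would slowly go out of alignment during a long call; that drift is what clock-skew compensation is for.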

The second parameter, "ms in sound card buffer", is supposed to be an estimate (more precisely: a value a little lower than the real one) of the total delay of the softphone audio API, the OS and the sound card, summed over both directions. For the portaudio module these delays are partially configurable (100 ms in each direction by default); for winwave/winwave2 I expect about 60 ms in each direction (hence the 120 ms default delay for AEC).
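The rule above is simple arithmetic: add the per-direction delays and, if anything, err on the low side. A minimal sketch, assuming you know (or have measured) the playback and capture delays; `aec_delay_ms` and its `margin_ms` parameter are hypothetical names, not tSIP settings.

```python
def aec_delay_ms(playback_ms: float, capture_ms: float, margin_ms: float = 0.0) -> int:
    """Estimate the "ms in sound card buffer" value for the AEC.

    Sums the delay of both directions (audio API + OS + sound card) and
    subtracts an optional safety margin, because the value should be a
    little lower than the real delay rather than higher.
    """
    total = playback_ms + capture_ms - margin_ms
    return max(0, round(total))


print(aec_delay_ms(60, 60))        # winwave / winwave2 defaults -> 120
print(aec_delay_ms(100, 100))      # portaudio default buffers -> 200
print(aec_delay_ms(100, 100, 10))  # same, leaving a 10 ms margin -> 190
```

The portaudio figure is only what the 100 ms per direction defaults would suggest; the real value also depends on the driver and sound card.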

The values you are using (220 to 260 ms) seem oddly high to me. The value also should not depend on which number is calling or being called; that's why I'm wondering whether you really mean local echo.

From my testing, the biggest problem for local echo cancellation is the audio processing / audio enhancements done by some sound cards. There might be options to disable them somewhere in the control panel.

For testing AEC I would recommend enabling stereo recording, calling either sip:3333@sip2sip.info or sip:music@iptel.org, and then opening the recording file in Audacity.
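On the question of automatic adjustment: the measurement step could look roughly like the sketch below, which finds the lag at which the captured signal best matches the played one. This is not something tSIP does; it is a minimal illustration assuming a 16-bit stereo WAV in which the left channel holds the audio played to the speaker and the right channel the microphone signal (check the actual channel assignment in Audacity and swap if needed). The function names are hypothetical. The measured lag covers both directions plus the acoustic path, so it is only roughly the quantity "ms in sound card buffer" tries to approximate.

```python
import array
import sys
import wave


def read_stereo_wav(path):
    """Return (rate, left, right) from a 16-bit PCM stereo WAV file."""
    with wave.open(path, "rb") as w:
        if w.getnchannels() != 2 or w.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo WAV")
        rate = w.getframerate()
        samples = array.array("h", w.readframes(w.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    # Frames are interleaved: L, R, L, R, ...
    return rate, samples[0::2].tolist(), samples[1::2].tolist()


def estimate_delay(ref, mic, max_lag):
    """Return the lag (in samples) at which mic best matches ref.

    Brute-force cross-correlation: for each candidate lag, sum
    ref[i] * mic[i + lag]. Only non-negative lags are tried, i.e. the
    echo is assumed to arrive after the reference signal.
    """
    n = min(len(ref), len(mic))
    best_lag, best_score = 0, float("-inf")
    for lag in range(min(max_lag, n - 1) + 1):
        score = sum(ref[i] * mic[i + lag] for i in range(n - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def echo_delay_ms(path, max_lag_ms=300, window_s=2.0):
    """Estimate the echo delay in ms from the first window_s seconds."""
    rate, played, captured = read_stereo_wav(path)
    n = int(rate * window_s)
    lag = estimate_delay(played[:n], captured[:n], rate * max_lag_ms // 1000)
    return lag * 1000 / rate
```

Usage would be something like `echo_delay_ms("recording.wav")` (hypothetical file name). The brute-force loop is slow in pure Python, which is why only a short window is analysed; for real use an FFT-based correlation (e.g. with NumPy) is the practical choice. A clear peak only appears if there actually is echo in the microphone channel.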

hillxrem commented 5 months ago

Thank you for explaining the meaning of the two values and of "local echo". "Local echo" means what is shown in the figure below, right?

In my office, I am not allowed to connect my laptop to the internet, so I tested like this: first, I make a call from tSIP; next, I go to the soundproof room next door and pick up the called phone (mobile phone or landline phone); finally, I say "Hello" out loud and check whether I hear an echo.

After my last post, I did some more tests. The results are as follows, and very strange. The audio module used is winwave.

- USB handset A: https://www.amazon.com/ALTEAM-Portable-Softphone-Lightweight-Microphone/dp/B07PRQRWXZ
- USB handset B: https://www.amazon.com/Digitus-DA-70772-USB-Telephone-Handset/dp/B000FIH4FQ
- USB speakerphone (built-in AEC): https://www.amazon.com/Bluetooth-Speakerphone-Microphone-Reduction-Algorithm/dp/B08DNTXYCT/

As you say, the value needed to cancel local echo should not vary depending on the call destination. I think something is wrong, but I have no idea what is going on.

PS: Zoiper cancelled the echo with all handsets, but I'd like to use a SIP client which can be fully scripted.

tomek-o commented 5 months ago

> Thank you for explaining the meaning of the two values and of "local echo". The meaning of the "local echo" is as shown in the figure below, right?

You are correct.

> In my office, I am not allowed to connect my laptop to the internet, so I tested like this: first, I make a call from tSIP; next, I go to the soundproof room next door and pick up the called phone (mobile phone or landline phone); finally, I say "Hello" out loud and check whether I hear an echo.

This sounds like a lot of work. I might suggest another setup:

- record some speech (using Audacity) as a mono WAV file, optionally looping it a few times in Audacity
- prepare another tSIP instance, selecting the WAV file as audio input and headphones as output
- either register the second tSIP to the same PABX, or use direct IP calling to call the first (tested) softphone (on the called softphone set the binding to a port, e.g. 0.0.0.0:5080; on the calling softphone call sip:IP_ADDRESS:5080)
- for repeatability I would suggest limiting the codec set to a single codec, either G.711a/u or G.722; WebRTC AEC in the version I'm using works only with 8 and 16 kHz sampling rates
- if there is echo, you will hear it in the headphones; optionally you can record it
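If Audacity is not at hand, the looping step can also be scripted. A minimal sketch using Python's standard `wave` module, assuming an uncompressed PCM mono WAV as input; `loop_wav` is a hypothetical helper, not part of tSIP.

```python
import wave


def loop_wav(src, dst, times=3):
    """Write dst containing the audio of mono WAV src repeated `times` times.

    Copies the PCM frames unchanged, so the sample rate and sample width
    of the source file are kept.
    """
    if times < 1:
        raise ValueError("times must be >= 1")
    with wave.open(src, "rb") as r:
        if r.getnchannels() != 1:
            raise ValueError("expected a mono WAV file")
        params = r.getparams()
        frames = r.readframes(r.getnframes())
    with wave.open(dst, "wb") as w:
        w.setparams(params)  # frame count in the header is fixed up on close
        w.writeframes(frames * times)
```

For example, `loop_wav("voice.wav", "voice_x5.wav", times=5)` (hypothetical file names), then select `voice_x5.wav` as the audio input of the second tSIP instance.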

> After my last post, I have done some tests. The results are as follows and so strange. Used module is winwave. USB handset A: https://www.amazon.com/ALTEAM-Portable-Softphone-Lightweight-Microphone/dp/B07PRQRWXZ USB handset B: https://www.amazon.com/Digitus-DA-70772-USB-Telephone-Handset/dp/B000FIH4FQ

I believe this type of handset should not generate any significant echo; its speaker should not be loud enough. I myself use an EX-03 (https://tomeko.net/software/SIPclient/EX03.php) with a handset like your model B (at least the exact same plastic shell), and with AEC disabled there is no trace of echo.

Here are a few guesses / things worth checking:

- the microphone level could be extremely high (for my desk phone I had to set the microphone volume to almost minimum; I believe this is a bug in Windows 10)
- there might be some very aggressive automatic gain control, which can sometimes be disabled; on my sound card it is assigned to the speakers, but I believe the microphone is also affected
- there could be something software-based that mixes played audio back into the input (either a "Stereo Mix" audio input or some setup involving virtual audio devices)
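For the first guess (microphone level), a quick check on a test recording is to look at the peak level and how many samples hit full scale. A minimal sketch, assuming signed 16-bit PCM samples from the microphone channel of such a recording (read with the `wave` module, for example); `level_report` is a hypothetical helper, and what counts as "too hot" is a judgment call, not a tSIP threshold.

```python
import math


def level_report(samples, full_scale=32768, clip_threshold=32767):
    """Return (peak dBFS, fraction of samples at or beyond the clip level).

    A peak near 0 dBFS together with a noticeable clipped fraction while
    speaking normally suggests the microphone level, or some gain stage
    in the driver, is set too high.
    """
    if not samples:
        raise ValueError("no samples")
    peak = max(abs(s) for s in samples)
    peak_dbfs = 20 * math.log10(peak / full_scale) if peak else float("-inf")
    clipped = sum(1 for s in samples if abs(s) >= clip_threshold)
    return peak_dbfs, clipped / len(samples)
```

This only covers the level guess; automatic gain control or a software loopback would not necessarily show up as clipping.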

hillxrem commented 5 months ago

Thank you for your detailed advice. I have tried many things but have not been able to solve the problem. My PC may be processing the audio in some special way. I have run out of the time I can spend on this issue, so I will put it on hold, but I will try to tackle it again when I get the chance.

tomek-o / tSIP, issue #52