hillxrem closed this 5 months ago
To be on the same page:
The skew parameter is supposed to compensate for the clock difference between two audio devices (different real sampling rates). I don't know how important this is in real life, but I've included it as it was exposed by WebRTC and some other implementations.
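To get a feel for the size of the effect, here is a minimal sketch of how far two device clocks drift apart. The function name and the example rates are made up for illustration; this is not how tSIP or WebRTC computes skew.

```python
def drift_ms_per_minute(capture_hz: float, playback_hz: float) -> float:
    """Milliseconds of audio per minute by which capture runs ahead of playback.

    Both arguments are the real (measured) sampling rates of the two devices;
    the values used below are hypothetical.
    """
    # Relative mismatch between the two clocks, e.g. 0.0001 for 0.01 %.
    relative = (capture_hz - playback_hz) / playback_hz
    # One minute is 60 000 ms, so scale the mismatch to ms per minute.
    return relative * 60_000.0


# Two nominally 8000 Hz devices, one running 0.01 % fast (assumed figure).
print(round(drift_ms_per_minute(8000.8, 8000.0), 3))
```

With these assumed numbers the clocks drift apart by about 6 ms per minute, so a fixed delay setting slowly goes stale over a long call unless the drift is compensated.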
The second parameter, "ms in sound card buffer", is supposed to be an estimate (more precisely: it should be a little lower than the real value) of the total delay of the softphone audio API, the OS and the sound card, summed over both directions. For the portaudio module these delays are partially configurable (100 ms in each direction by default); for winwave/winwave2 I'm expecting about 60 ms in each direction (hence 120 ms as the default delay for AEC).
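As a rough sketch of that arithmetic: the per-direction figures below are the ones quoted above, while the function name and the `margin_ms` knob are hypothetical and only illustrate "a little lower than the real value".

```python
# Per-direction delay guesses in ms, taken from the figures quoted above.
PER_DIRECTION_MS = {"portaudio": 100, "winwave": 60, "winwave2": 60}


def aec_delay_setting(module: str, margin_ms: int = 0) -> int:
    """Sum both directions, then subtract an optional safety margin.

    `margin_ms` is a hypothetical knob, not an actual tSIP setting.
    """
    total = 2 * PER_DIRECTION_MS[module]  # playback + capture
    return max(0, total - margin_ms)


print(aec_delay_setting("winwave"))                  # 120
print(aec_delay_setting("portaudio", margin_ms=20))  # 180
```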
The values you are using (220 ... 260 ms) seem oddly high to me. The right value should also not depend on which number is calling or being called, which is why I'm wondering whether you really mean local echo.
From my testing the biggest problem for local echo cancellation is audio processing / audio enhancements done by some audio cards. Somewhere in the control panel there might be options to disable it.
For testing AEC I would recommend enabling recording in stereo, calling either sip:3333@sip2sip.info or sip:music@iptel.org, and then opening the recording file with Audacity.
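If you would rather inspect the recording with a script than by eye, a minimal sketch using only Python's standard library could split the two channels. It assumes the recording is a 16-bit PCM stereo WAV; which direction ends up in which channel is an assumption you would need to check against your own recording.

```python
import array
import sys
import wave


def split_stereo(path_or_file):
    """Return (left, right, sample_rate) from a 16-bit PCM stereo WAV."""
    with wave.open(path_or_file, "rb") as wav:
        if wav.getnchannels() != 2 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo PCM")
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    # Stereo frames are interleaved: L, R, L, R, ...
    return list(samples[0::2]), list(samples[1::2]), rate


def peak(channel):
    """Largest absolute sample value, a crude loudness indicator."""
    return max((abs(s) for s in channel), default=0)
```

For example, `left, right, rate = split_stereo("recording.wav")` (hypothetical file name) followed by `peak(left)` and `peak(right)` shows at a glance whether both directions were actually captured.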
Thank you for explaining the meaning of the two values and of "local echo". The meaning of the "local echo" is as shown in the figure below, right?
In my office, it is not allowed to connect my laptop to the internet, so I tested like this: First, I make a call from tSIP. Next, I go to the soundproof room next door and pick up the called phone (mobile phone or landline phone). Finally, I say "Hello" out loud and check whether I get an echo back.
After my last post, I have done some tests. The results are as follows, and they are very strange. The module used is winwave.
USB handset A: https://www.amazon.com/ALTEAM-Portable-Softphone-Lightweight-Microphone/dp/B07PRQRWXZ
USB handset B: https://www.amazon.com/Digitus-DA-70772-USB-Telephone-Handset/dp/B000FIH4FQ
USB speakerphone (AEC built in): https://www.amazon.com/Bluetooth-Speakerphone-Microphone-Reduction-Algorithm/dp/B08DNTXYCT/
As you say, the set value for canceling local echo should not vary depending on the destination of the call. I think something is wrong, but I have no idea what is going on.
PS: Zoiper cancelled all handset echoes, but I'd like to use a SIP client which can be fully scripted.
Thank you for explaining the meaning of the two values and of "local echo". The meaning of the "local echo" is as shown in the figure below, right?
You are correct.
In my office, it is not allowed to connect my laptop to the internet, so I tested like this: First, I make a call from tSIP. Next, I go to the soundproof room next door and pick up the called phone (mobile phone or landline phone). Finally, I say "Hello" out loud and check whether I get an echo back.
This sounds like a lot of work. I would suggest another setup:
After my last post, I have done some tests. The results are as follows, and they are very strange. The module used is winwave.
USB handset A: https://www.amazon.com/ALTEAM-Portable-Softphone-Lightweight-Microphone/dp/B07PRQRWXZ
USB handset B: https://www.amazon.com/Digitus-DA-70772-USB-Telephone-Handset/dp/B000FIH4FQ
I do believe that this type of handset should not generate any significant echo: the speaker should not be loud enough. I myself use an EX-03 (https://tomeko.net/software/SIPclient/EX03.php) with a handset (at least the exact same plastic shell) like your model B, and with AEC disabled there are no traces of echo.
Here are a few guesses / things worth checking:
Thank you for your detailed advice. I have tried many things but have not been able to solve the problem. It may be that my PC is processing the audio in a special way. I have run out of time I can spend on this issue, so I will put it on hold, but will try to tackle it again when I get a chance.
I am adjusting the WebRTC AEC values in the settings, but the values of buffer time and skew that completely eliminate local echo seem to differ depending on the destination for some reason. For example, when I call a mobile phone from tSIP, I can completely eliminate the echo with buffer time 220, but not the echo of a landline. Conversely, when I call a landline from tSIP, I can completely eliminate the echo with buffer time 260, but not the echo of the mobile phone.
I would like to know the meaning of these values (buffer time, skew) and how to adjust them. Also, would it be difficult to implement a function that compares the audio input and output and automatically adjusts these values?
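For reference, one common way to compare the two signals is cross-correlation: slide the captured signal against the played-back one and pick the lag where they match best. The sketch below is a brute-force illustration with made-up names and synthetic data, not something tSIP or WebRTC is known to implement in this form.

```python
import random


def estimate_delay_ms(reference, captured, sample_rate, max_delay_ms=500):
    """Return the lag (in ms) at which `captured` best matches `reference`.

    Brute-force cross-correlation: fine for short clips, slow for long ones.
    """
    max_lag = min(int(sample_rate * max_delay_ms / 1000), len(captured) - 1)
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Compare reference[i] with captured[i + lag] over the overlap.
        n = min(len(reference), len(captured) - lag)
        score = sum(reference[i] * captured[i + lag] for i in range(n))
        if score > best_score:
            best_lag, best_score = lag, score
    return 1000.0 * best_lag / sample_rate


# Synthetic check: noise played out, then picked up 20 ms later at 30 % level.
rng = random.Random(0)
played = [rng.uniform(-1.0, 1.0) for _ in range(400)]
picked_up = [0.0] * 160 + [0.3 * s for s in played]  # 160 samples at 8 kHz
print(estimate_delay_ms(played, picked_up, 8000))
```

On this synthetic signal the estimate comes out at 20 ms. Real recordings are harder: the echo is distorted by the speaker, the microphone and any audio enhancements, so the correlation peak is much less distinct.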