SIPREC Re-Invite causes time to time problems with mixed audio file - multiple SSRC causes missing parties voice

sipwise / rtpengine

The Sipwise media proxy for Kamailio

GNU General Public License v3.0


SIPREC Re-Invite causes time to time problems with mixed audio file - multiple SSRC causes missing parties voice #1018

Closed IdoMagor closed 4 years ago

IdoMagor commented 4 years ago

Hello,

I'm currently using RTPEngine to record a SIPREC session. When the session is initiated and the called client answers, a SIP re-INVITE is sent to the SIPREC server endpoint, which also runs RTPEngine. As you might guess, it is responsible for taking both parties (caller and callee) and saving their call in a mixed .wav file. The re-INVITE itself doesn't contain anything special or updated.

I'm starting to have problems with the mixed file: the called client's side is disturbed. Both clients' RTP arrives at my SIPREC machine as individual RTP streams, but the mixing process somehow leaves the called client's voice out of the mixed file.

When I looked at the syslog file, I saw a normal session being set up and then deleted after the call finished, with no error logs.

Are there any logs regarding errors, or even ffmpeg errors about mixing the voice of both parties?

Thanks! :)

IdoMagor commented 4 years ago

I've found that when there are more than 4 RTP SSRCs in the call, the SSRCs "race" each other for a place in the mix. Inside mix.c there's the #define NUM_INPUTS 4. By any chance, does that constant make the rtpengine-recording daemon mix only the first 4 SSRCs (RTP streams)?

rfuchs commented 4 years ago

Yes, that's exactly what it does. The ffmpeg amix filter needs to know ahead of time how many inputs there will be.

IdoMagor commented 4 years ago

@rfuchs So basically, if a call has multiple SIP re-INVITEs that create new dialogs (resulting in new SSRCs), those new SSRCs would be missing from the mix?

If so, is there something that could be done to prevent this? If I just set NUM_INPUTS to 100, would that degrade the quality of the resulting mixed recording?

rfuchs commented 4 years ago

That probably wouldn't work well because amix decreases the volume of each input according to how many there are in order to avoid clipping in the output, so with 100 inputs you wouldn't be left with much to listen to. There would also be a performance hit since rtpengine has to synthesise silence for each audio input that doesn't currently produce audio. So for now I'm afraid that this use case simply isn't supported.
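To put rough numbers on the volume drop described above, here is a minimal sketch. It assumes the mixer scales every input by 1/N, where N is the configured number of inputs; that scaling rule and the function name `scaled_sample` are assumptions for illustration, not taken from the rtpengine or FFmpeg sources.

```c
#include <stdint.h>

/* Hypothetical helper: the value a 16-bit PCM sample would have after
 * mixing if each input were simply scaled by 1/num_inputs (assumed rule). */
int16_t scaled_sample(int16_t sample, int num_inputs)
{
    return (int16_t)(sample / num_inputs);
}
```

Under that assumption a sample of 16000 becomes 4000 with 4 inputs, 1600 with 10 inputs, and only 160 with 100 inputs, which matches the "not much left to listen to" remark.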

IdoMagor commented 4 years ago

@rfuchs Does RTPEngine have a planned task to make the number of inputs dynamic, according to the number of open RTP sessions?

rfuchs commented 4 years ago

No current plans to do so as this is a limitation of the amix filter we use to mix the audio. Supporting more inputs dynamically would require moving to a different mechanism, which isn't entirely trivial.

IdoMagor commented 4 years ago

@rfuchs Well, for now I do want to raise the number of inputs a little, but as I can already see, or more correctly hear, the voice gets a lot quieter with NUM_INPUTS set to 10. Is there a way to add more gain/volume to the inputs?

Or maybe a quick change to the code so that only the last 2 RTP SSRCs are written?

rfuchs commented 4 years ago

Again this is a limitation of the amix filter. You'd have to add an extra gain/volume filter to compensate for the decreased volume.
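As a rough illustration of the "extra gain/volume filter" idea, the sketch below builds an FFmpeg filter description string that chains `amix` into `volume`. The `amix` option `inputs` and the `volume` filter are real FFmpeg filters, but the helper `build_mix_filter` is hypothetical, and this is not necessarily how mix.c sets up its filter graph.

```c
#include <stdio.h>

/* Hypothetical helper: writes "amix=inputs=N,volume=G" into buf.
 * Returns 0 on success, -1 if the buffer is too small. */
int build_mix_filter(char *buf, size_t len, int num_inputs, double gain)
{
    int n = snprintf(buf, len, "amix=inputs=%d,volume=%.2f", num_inputs, gain);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Note that a fixed compensating gain can clip when several parties speak at once, so the gain value would need to be chosen conservatively.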

Removing unused SSRCs from the mixing path to make room for new ones would be a better solution. In fact I've been meaning to implement something like that for a while now...
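One possible shape for "removing unused SSRCs to make room for new ones" is a fixed table of mixer slots where the least recently active slot is reused once the table is full. The sketch below is only that bookkeeping; the struct, field names, and `slot_for_ssrc` are hypothetical, it is not rtpengine's implementation, and it ignores what the filter graph itself would need when an input changes owner.

```c
#include <stdint.h>

#define NUM_INPUTS 4

/* Hypothetical per-input bookkeeping. */
struct mix_slot {
    uint32_t ssrc;
    uint64_t last_seen; /* time of the last packet, any monotonic unit */
    int in_use;
};

/* Returns the slot index for ssrc. A known SSRC keeps its slot; a new one
 * takes a free slot, or evicts the least recently active slot when full. */
int slot_for_ssrc(struct mix_slot slots[NUM_INPUTS], uint32_t ssrc, uint64_t now)
{
    int free_idx = -1;
    int oldest = 0;

    for (int i = 0; i < NUM_INPUTS; i++) {
        if (slots[i].in_use && slots[i].ssrc == ssrc) {
            slots[i].last_seen = now;
            return i;
        }
        if (!slots[i].in_use) {
            if (free_idx < 0)
                free_idx = i;
        } else if (!slots[oldest].in_use ||
                   slots[i].last_seen < slots[oldest].last_seen) {
            oldest = i;
        }
    }

    int idx = free_idx >= 0 ? free_idx : oldest;
    slots[idx].ssrc = ssrc;
    slots[idx].last_seen = now;
    slots[idx].in_use = 1;
    return idx;
}
```

With this policy, a re-INVITE that introduces a fifth SSRC would take over the input of whichever stream has been silent the longest, instead of being dropped from the mix.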

IdoMagor commented 4 years ago

Hello again,

I wanted to ask 2 questions regarding this whole scenario.

1. Inside mix.c (recording-daemon), there's the function mix_silence_fill_idx_upto, which is called by the mix_add function. Could you elaborate on what it does?
2. Regarding the silence that rtpengine adds for each SSRC: if we removed that functionality, would the output generated by FFmpeg be less quiet (with NUM_INPUTS=10, for example)?

Thx! :)

rfuchs commented 4 years ago

1. The amix filter needs to see audio coming from each configured input before it produces any output. So e.g. with 5 configured inputs, but only 2 SSRCs actually sending audio, rtpengine needs to synthesise silence for the remaining 3 inputs, otherwise amix won't produce any output as it keeps waiting for audio on those 3 inputs. That's what this function does. (It also handles timestamp gaps from actually present SSRCs.)
2. No, because if you do that, amix won't produce any output. At least I haven't found a way to make it not wait for audio on an "unconnected" input. If you can find a way, then I would be very happy to eliminate this function.
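The silence-filling idea described above can be sketched as follows: for an input that is behind the position the mixer needs, write zero-valued samples to cover the gap. This is an illustrative stand-in, not the actual mix_silence_fill_idx_upto; the function name, parameters, and the use of 16-bit PCM are all assumptions.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fills buf with silence covering the gap between an
 * input's current sample position and the position the mixer needs.
 * Returns the number of samples written (at most buf_samples). */
size_t fill_silence_upto(int16_t *buf, size_t buf_samples,
                         uint64_t input_pos, uint64_t target_pos)
{
    if (target_pos <= input_pos)
        return 0; /* input is already caught up, nothing to fill */

    uint64_t gap = target_pos - input_pos;
    size_t n = gap < buf_samples ? (size_t)gap : buf_samples;

    memset(buf, 0, n * sizeof(*buf)); /* zero samples are digital silence */
    return n;
}
```

The same routine would cover both cases mentioned in the reply: an input with no SSRC attached at all, and a real SSRC whose timestamps jump forward.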

IdoMagor commented 4 years ago

@rfuchs Just wanted to say thanks for your support :)