meetecho / janus-gateway

Janus WebRTC Server
https://janus.conf.meetecho.com
GNU General Public License v3.0
8.17k stars 2.47k forks source link

Add insert of concealed packets #3308

Closed spscream closed 5 months ago

spscream commented 9 months ago

this changes were created to address #3297 for our environment. I'm not sure if it is fully correct, but it mostly fixes our problems.

januscla commented 9 months ago

Thanks for your contribution, @spscream! Please make sure you sign our CLA, as it's a required step before we can merge this.

lminiero commented 9 months ago

@spscream have you tried what happens if, instead of ingesting lost_packets concealed packets in that for loop, you add only one? My guess is that the end result should be the same, since it will try to recreate stuff from the last thing it received, and should have the added benefit of not inflating the queue as your patch would do now.

atoppi commented 9 months ago

Thanks @spscream for the effort!

My objection to the implementation is related to the timing of PLC packets generation.

FEC needs packet N+1 to decode packet N (so it will naturally introduce/leverage a playout delay), meanwhile PLC is only calculated from already decoded samples. According to this assumption, the generation of a PLC packet should be triggered in a condition where the bridge encoder "needs something from a participant but there's nothing available", and not after the arrival of a random packet not adjacent to the previous one.

Reusing FEC approach will inevitably lead to issues like unwanted delay, discarding of good packets, uncontrollable queue growth etc.

I still have to think about a better approach in detail, however one idea is to trigger a PLC whenever a participant's queue-in is empty (e.g. detected participant underflow). That means that a loss burst has exhausted both the jitter buffer and the participant's queue (decoded samples), hence the mixer has no to other chance but introduce concealment for the participant.

spscream commented 9 months ago

@spscream have you tried what happens if, instead of ingesting lost_packets concealed packets in that for loop, you add only one? My guess is that the end result should be the same, since it will try to recreate stuff from the last thing it received, and should have the added benefit of not inflating the queue as your patch would do now.

yes, I tried it and sound cracks

spscream commented 9 months ago

@spscream have you tried what happens if, instead of ingesting lost_packets concealed packets in that for loop, you add only one? My guess is that the end result should be the same, since it will try to recreate stuff from the last thing it received, and should have the added benefit of not inflating the queue as your patch would do now.

yes, I tried it and sound cracks

I can insert n packets of maximum size instead of number of loss packets, but plc should cover time period of lost bursts accordingly to docs. frame_size parameter of opus_decode has the following description:

Number of samples per channel of available space in pcm. If this is less than the maximum packet duration (120ms; 5760 for 48kHz), this function will not be capable of decoding some packets. In the case of PLC (data==NULL) or FEC (decode_fec=1), then frame_size needs to be exactly the duration of audio that is missing, otherwise the decoder will not be in the optimal state to decode the next incoming packet. For the PLC and FEC cases, frame_size must be a multiple of 2.5 ms.

spscream commented 9 months ago

I still have to think about a better approach in detail, however one idea is to trigger a PLC whenever a participant's queue-in is empty (e.g. detected participant underflow). That means that a loss burst has exhausted both the jitter buffer and the participant's queue (decoded samples), hence the mixer has no to other chance but introduce concealment for the participant.

This buffer underrun can explain why we still have this cracks/pops for our production users. But I afraid I can't rewrite it to such aproach myself or at least not this year.

lminiero commented 9 months ago

I agree with @atoppi that it's the road we should explore, even though it does come with its own challenges: for one, the mixer is the one who'd read from that queue, and it uses a different thread, meaning I'd prefer it not to be the one that should perform that PLC decode. As such, we'll have to better think of how to do that in the participant thread instead. This is something we can definitely discuss together after the vacations, no need to hurry and no need for you to actually implement it yourself either, if we manage to replicate the problem and come up with a potential fix.

spscream commented 9 months ago

@atoppi jitter_buffer_get has JITTER_BUFFER_MISSING and JITTER_BUFFER_INSERTION ret codes, so we can use it as a signal to insert plc, you can take it to considerations.

By the way I also discovered possible incorrect usage of JITTER_BUFFER_SET_MARGIN in current implementation - its value should be the same unit as the frame_size (e.g. JITTER_BUFFER_MIN_PACKETS * frame_size), since its used as timestamp under the hood, not as count of packets.

lminiero commented 9 months ago

its value should be the same unit as the frame_size

I don't think so. Besides, we're changing what actually ends in the buffer: not the payload anymore, but a pointer to a duplicate of the janus_plugin_rtp we got from the core.

spscream commented 9 months ago

its value should be the same unit as the frame_size

I don't think so. Besides, we're changing what actually ends in the buffer: not the payload anymore, but a pointer to a duplicate of the janus_plugin_rtp we got from the core.

I get the following situation with buffering if JITTER_BUFFER_SET_MARGIN is 2, the jitter buffer doesn't mantain its minimal buffer size:

[WARN] [18844549281349] jitter buffer avail: 0
[WARN] [18844549281349] jitter_buffer_get result 1
[WARN] [18844549281771] jitter buffer avail: 0
[WARN] [18844549281771] jitter_buffer_get result 1
[WARN] [18844549281829] jitter buffer avail: 2
[WARN] [18844549281829] jitter_buffer_get result 0
[WARN] [18844549300041] jitter buffer avail: 0
[WARN] [18844549300041] jitter_buffer_get result 1
[WARN] [18844549300112] jitter buffer avail: 2
[WARN] [18844549300112] jitter_buffer_get result 0
[WARN] [18844549302419] jitter buffer avail: 0
[WARN] [18844549302419] jitter_buffer_get result 1
[WARN] [18844549321048] jitter buffer avail: 2
[WARN] [18844549321048] jitter_buffer_get result 0
[WARN] [18844549321144] jitter buffer avail: 0
[WARN] [18844549321144] jitter_buffer_get result 1
[WARN] [18844549323703] jitter buffer avail: 0
[WARN] [18844549323703] jitter_buffer_get result 1

if I multiply JITTER_BUFFER_MIN_PACKETS by frame_size for JITTER_BUFFER_SET_MARGIN:

[WARN] [18844964419785] jitter_buffer_get result 0
[WARN] [18844964438318] jitter buffer avail: 3
[WARN] [18844964438318] jitter_buffer_get result 0
[WARN] [18844964459607] jitter buffer avail: 3
[WARN] [18844964459607] jitter_buffer_get result 0
[WARN] [18844964478355] jitter buffer avail: 3
[WARN] [18844964478355] jitter_buffer_get result 0
[WARN] [18844964499529] jitter buffer avail: 3
[WARN] [18844964499529] jitter_buffer_get result 0
[WARN] [18844964518172] jitter buffer avail: 3
[WARN] [18844964518172] jitter_buffer_get result 0
[WARN] [18844964539348] jitter buffer avail: 3
[WARN] [18844964539348] jitter_buffer_get result 0
[WARN] [18844964557793] jitter buffer avail: 3
[WARN] [18844964557793] jitter_buffer_get result 0
[WARN] [18844964578942] jitter buffer avail: 3
[WARN] [18844964578942] jitter_buffer_get result 0
[WARN] [18844964600042] jitter buffer avail: 3
[WARN] [18844964600042] jitter_buffer_get result 0
[WARN] [18844964618636] jitter buffer avail: 3
[WARN] [18844964618636] jitter_buffer_get result 0
[WARN] [18844964639613] jitter buffer avail: 3
[WARN] [18844964639613] jitter_buffer_get result 0
atoppi commented 8 months ago

By the way I also discovered possible incorrect usage of JITTER_BUFFER_SET_MARGIN in current implementation - its value should be the same unit as the frame_size (e.g. JITTER_BUFFER_MIN_PACKETS * frame_size), since its used as timestamp under the hood, not as count of packets.

@spscream yeah you're probably right, the comments in the speexdsp code are probably outdated/misleading. Internally the macro sets a buffer_margin variable that is used as a RTP timestamp.

spscream commented 8 months ago

I've tried another approach here and it works in our case even with huge loses: https://github.com/meetecho/janus-gateway/compare/master...spscream:janus-gateway:fix/am/audiobridge-cracks-fix-rnnoise#diff-476cfd26c102e4b3311b3b8e8f848bfeba283c1712a5f3f117254e0593a986f4R8959

@atoppi could you please look on it? If it is ok, i could refactor it and make another pr for you. Idea is the following:

I also tried to insert fec, but it seams it also leads to cracks

atoppi commented 8 months ago

@spscream the diff is huge and includes unrelated changes, can you please provide a more readable patch?

spscream commented 8 months ago

@atoppi https://github.com/meetecho/janus-gateway/commit/f5361e74520ae80be09504781237c16bfeae5340#diff-476cfd26c102e4b3311b3b8e8f848bfeba283c1712a5f3f117254e0593a986f4R8745 sure, here is commit with added changes

spscream commented 7 months ago

@atoppi hi! any thoughts?

spscream commented 7 months ago

If it is ok, I could make a new pr with my latest changes if it will be more convenient

atoppi commented 7 months ago

@spscream If you're able to propose a cleaner version of the patch it would greatly help thanks. I'll try to review this in the next few days.