Closed vpoddubchak closed 4 years ago
The graphs clearly show that we might be messing up sequence numbers for VP9. I guess you don't see the same behavior when using VP8, right?
Another question, are you enabling simulcast or some other functionality?
With VP8 all is fine. No simulcast. In the basic example I just changed the resolution and bitrate (720p, 1.5 Mbit/s).
@jcague what is the workaround for this?
The workaround is to use VP8 :) We are focused on VP8 and that's what we use in our deployments. While VP9 is interesting and we have things in place to make it work, I wouldn't advise using it in production with Licode at this time. That said, all reports are welcome and we will use them to know what can be failing the next time we prioritize work and decide to spend some time tweaking it.
We started seeing the same problem with VP8 simulcast. Reproduced on pure Licode deployed in Docker, with the simulcast=true option. We changed only the default and max bandwidth (768 kbps and 3000 kbps). The problem happens from time to time: video freezes for about 10 seconds on the receiver side, while other subscribers see good video at the same moment:
It looks very similar to the problem with VP9. Let me know if you need more information.
@vpoddubchak what version of Chrome are you using? Does it happen to you also in old versions? I know Google recently (v80) changed some internal things in Simulcast, like the number of temporal layers, and that might be causing issues in the way we handle quality layer switches.
I'm using Chrome v80. We did not see this problem in previous versions. To verify, I ran the same test in Chrome v78 and Chrome v80:
Results for Chrome 78:
Results for Chrome 80:
As you can see, Chrome 78 reacts much better: no drops to 0 bitrate.
Is it possible to fix this on the Licode side?
Yes, I think that change is affecting Licode, but we might be doing things wrong in Licode, because it shouldn't make the videos freeze. We will work on this in the following days/weeks.
I have reproduced it on Chrome 78 on Ubuntu 18. A possible reason it is harder to reproduce there is the previous bug where the video bitrate is constantly ~150 kbps, so packet loss did not affect it much. I also investigated it a bit; this is what I know so far:
- `video_fraction_lost` is greater than `kHighVideoFractionLostThreshold` (= 20 * 256 / 100), so the level is set to `ConnectionQualityLevel::HIGH_LOSSES`
- `next_temporal_layer=0` and `below_min_layer = true`, which leads to switching on slide-show mode

Logs:
[erizo-310d03ec-08a1-d397-8a51-93398d3f4886] 2020-02-28 10:46:14,303 - ERROR [0x7f2545df1700] bandwidth.ConnectionQualityCheck - ================== video_fraction_lost = 84
[erizo-310d03ec-08a1-d397-8a51-93398d3f4886] 2020-02-28 10:46:14,303 - DEBUG [0x7f25455f0700] rtp.QualityManager - ==================onConnectionQualityUpdate 0
2020-02-28 10:46:14,303 - DEBUG [0x7f25455f0700] rtp.QualityManager - message: Calculate best layer, estimated_bitrate: 391110, current layer 0/2, min_requested_spatial 0
2020-02-28 10:46:14,303 - DEBUG [0x7f25455f0700] rtp.QualityManager - Bitrate for layer 0/0 81381
2020-02-28 10:46:14,303 - DEBUG [0x7f25455f0700] rtp.QualityManager - Bitrate for layer 0/1 134946
2020-02-28 10:46:14,303 - DEBUG [0x7f25455f0700] rtp.QualityManager - Bitrate for layer 0/2 134946
2020-02-28 10:46:14,303 - DEBUG [0x7f25455f0700] rtp.QualityManager - message: below_min_layer 1, freeze_fallback_active_: 0
2020-02-28 10:46:14,303 - DEBUG [0x7f25455f0700] rtp.QualityManager - message: Setting slideshow fallback, below_min_layer 1, spatial_layer 0,next_spatial_layer 0 freeze_fallback_active_: 1, min_requested_spatial_layer: 0,slideshow_below_spatial_layer_ -1
[erizo-310d03ec-08a1-d397-8a51-93398d3f4886] 2020-02-28 10:46:14,303 - DEBUG [0x7f25455f0700] rtp.QualityManager - message: Layer Switch, current_layer: 0/2, new_layer: 0/0
2020-02-28 10:46:14,303 - DEBUG [0x7f25455f0700] rtp.QualityManager - message: Is padding enabled, padding_enabled_: 0
Questions:
That is a mechanism to respond better to high packet losses, so it's not a bug. You can even disable it, but we don't recommend that. Instead, you can use the event we generate on the client side to let the user know they have connectivity issues.
As I said before, this is a feature in Licode to send traffic better through lossy networks, so I'm closing this issue.
Description
When we use the VP9 codec, we constantly see video freezes during the session. It also happens on pure Licode with minor changes (codec and bitrate).
Steps to reproduce the issue:
Describe the results you received: One of the participants sees video freezes on the remote video.
From webrtc-internals I see that the bitrate drops to 0 on the receiver side (as you can see, packet loss is not always the reason); then it starts sending PLIs and NACKs. Each time, it recovers after the 4th PLI.
At the same time, the bandwidth on the publisher side is stable. The count of received PLIs increases by 2; it looks like the SFU waits for a pair of PLIs and then sends one to the publisher. Also, there are no NACKs at all: