Closed aankur closed 2 months ago
Given that the only commits between v3.2.24 and v3.2.25 are RTX related (https://github.com/pion/webrtc/compare/v3.2.24...v3.2.25), I'm guessing it has something to do with RTX. This would make sense, as when RTX is negotiated, libwebrtc will attempt to probe with RTX packets instead of padding packets.
I think what might be going on is that if RTX gets negotiated by default (I'm not sure whether this is the case now or not), and somehow RTX packets are not consumed, then RTX packets will stop being fed through the interceptor and libwebrtc won't get TWCC feedback for RTX probes. In https://github.com/pion/webrtc/commit/5da72784c8d9425fe61bbf66de7b658eb94d6717 repairStreamChannel is an unbuffered channel, meaning that if it is not read from in readRTX then the goroutine in receiveForRTX will be blocked. I'm not that familiar with the RTX code though, so perhaps @adriancable or @cnderrauber could chime in and correct anything I got wrong here.
If the general shape of my theory is correct, that would definitely explain the behavior you are seeing with libwebrtc's bandwidth estimation.
@kcaffrey - what you write sounds correct, although I'm not sure the issue is RTX related. Note I have also seen similar things to @aankur (slow BWE on the Chrome side) but I saw this a while back before I added the RTX code. From memory, I have seen it sometimes be slow and sometimes not with no obvious cause.
@aankur - just to exclude this, can you try it without RTX, i.e. instead of registering the default codecs adding just H264 for example without adding RTX, then see if the issue still occurs. You may need to run the experiment e.g. 10 times to rule out the coincidence effect.
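To double-check that a given run really negotiated no RTX, the SDP answer can be scanned for an rtx rtpmap entry. This is a minimal sketch using only the Go standard library; the helper name hasRTX and the SDP fragments are made up for illustration and are not taken from the reports in this thread.

```go
package main

import (
	"fmt"
	"strings"
)

// hasRTX reports whether any a=rtpmap line in the SDP names the rtx codec.
// (Hypothetical helper for illustration.)
func hasRTX(sdp string) bool {
	for _, line := range strings.Split(sdp, "\n") {
		line = strings.TrimSpace(line) // also drops the \r of CRLF line endings
		if !strings.HasPrefix(line, "a=rtpmap:") {
			continue
		}
		// Format: a=rtpmap:<payload type> <encoding name>/<clock rate>
		fields := strings.Fields(strings.TrimPrefix(line, "a=rtpmap:"))
		if len(fields) < 2 {
			continue
		}
		name := strings.SplitN(fields[1], "/", 2)[0]
		if strings.EqualFold(name, "rtx") {
			return true
		}
	}
	return false
}

func main() {
	// Hypothetical SDP fragments, for illustration only.
	withRTX := "m=video 9 UDP/TLS/RTP/SAVPF 102 103\r\n" +
		"a=rtpmap:102 H264/90000\r\n" +
		"a=rtpmap:103 rtx/90000\r\n" +
		"a=fmtp:103 apt=102\r\n"
	withoutRTX := "m=video 9 UDP/TLS/RTP/SAVPF 102\r\n" +
		"a=rtpmap:102 H264/90000\r\n"

	fmt.Println("with RTX:", hasRTX(withRTX))
	fmt.Println("without RTX:", hasRTX(withoutRTX))
}
```

Running the check on the answer actually produced in each experiment helps rule out the case where RTX was negotiated even though it was meant to be disabled.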
One other cause I've seen (which you may have run into @adriancable) is that if you are using simulcast, the first packet is dropped by Pion (see https://github.com/pion/webrtc/pull/2777). If libwebrtc sees the first packet it sends is lost, it enters a mode where the bandwidth estimate tends to not grow fast (for reasons I'm not 100% sure about). In our own internal testing, we saw that much of the time, libwebrtc happened to send an audio packet first (meaning the issue was not hit), but if it happened to send a simulcast video packet first, then the problem would occur.
So to further rule out possible causes, you can try making sure simulcast is not enabled (if it was on the client).
@kcaffrey - also to add, it isn't really possible 'not to consume' RTX packets. In the implementation that landed, track.Read() returns both RTX and non-RTX packets (since most consumers don't need to know the difference), with the attributes being set to indicate if it's RTX (if the consumer does need to know).
@aankur - are you able to get a WebRTC packet trace and post it here for us to look at? If you go to chrome://webrtc-internals there's now an option to 'create diagnostic packet recordings'. After turning this on you'll need to restart the stream (so we can see everything from the start) and then locate the right log file and attach it here.
@kcaffrey / @aankur - there's one other thing to bear in mind if we are looking at differences with RTX and without. A recent change in libwebrtc (which may or may not have landed in Chrome M126) starts sending BWE probes before the tracks start, on SSRC 0, if RTX is enabled. Previously Pion didn't handle these and so it looked to the sender like these packets were lost, which would make the initial BWE hit the floor until it caught up via subsequent probes. This would have a similar effect to what @aankur reported, but it was fixed in #2816. (I did test this change, and it seems to do the right thing, but it's possible there's some edge case it's not handling right.) Pointing this out because if @aankur finds that their issue happens only with RTX enabled, that does not necessarily mean the issue is with Pion's RTX handling, even if the issue is on the Pion side.
I'll be able to see whether this is what's happening from the Chrome WebRTC packet trace.
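For anyone who wants to look for the SSRC 0 probes themselves, the SSRC can be read straight from the fixed RTP header. This is a minimal sketch, assuming you already have the raw RTP bytes of a packet (for example from a capture); the function name rtpSSRC and the sample packet are hypothetical. The only fact relied on is the RTP fixed header layout from RFC 3550: 12 bytes, with the SSRC in bytes 8 to 11, big-endian.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// rtpSSRC extracts the SSRC from a raw RTP packet.
// (Hypothetical helper for illustration.)
func rtpSSRC(pkt []byte) (uint32, error) {
	if len(pkt) < 12 {
		return 0, errors.New("packet shorter than the fixed RTP header")
	}
	// The top two bits of the first byte carry the RTP version, which is 2.
	if version := pkt[0] >> 6; version != 2 {
		return 0, fmt.Errorf("unexpected RTP version %d", version)
	}
	return binary.BigEndian.Uint32(pkt[8:12]), nil
}

func main() {
	// Hypothetical header: version 2, payload type 96, sequence number 1,
	// timestamp 0, SSRC 0.
	probe := []byte{0x80, 96, 0x00, 0x01, 0, 0, 0, 0, 0, 0, 0, 0}

	ssrc, err := rtpSSRC(probe)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("ssrc=%d isZero=%v\n", ssrc, ssrc == 0)
}
```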
Hi, I am not using simulcast, and I have run the tests as requested. I also debugged through the code to see whether SSRC 0 was being handled; the version of Chrome I am using does not send those probes yet.
With v4.0.0-beta.24 and no RTX, bandwidth ramps up faster than with RTX but slower than with v3.2.22. The ramp has the same steps but a narrower slope, i.e. it first jumps quickly to 1.5, takes some time, jumps to 2, takes some time, then jumps to 2.5. It does not see-saw.
Without RTX SDP Answer
Without RTX Bandwidth Ramp-UP
Without RTX Packet dump event_log_nortx.log20240725_0007_68_4.log
With RTX and TWCC Packet Dump event_log_with_rtxtwcc.log20240725_0011_68_5.log
@aankur - in your original post, for pion v3, you mentioned v3.2.49, then said it happens in v3.2.25 but not v3.2.24, then above you mention v3.2.22. That's a lot of versions! Can you just tell us two, the last version that works and the first version which doesn't?
It does look like the 'with RTX' case is the one that's problematic. In which case the difference between v3.2.24 and v3.2.25 is explained by the fact that v3.2.24 does not support RTX. It isn't clear to me this is a Pion issue vs. a difference in how Chrome does BWE in the RTX vs non-RTX cases (they are quite different, in the non-RTX case you get padding probes on the main track, in the RTX case you don't get padding probes, instead you get periodic retransmissions on the RTX track which also serve as BWE probes). It is quite possible that the different numbers (and nature) of the probes you get from Chrome in the two cases make it take longer or quicker for Chrome to do the BWE.
Can you clarify the topology here? You are sending video from the Chrome side and receiving from the Pion side, correct?
In this case, I would look at what happens between Chrome and Chrome in the RTX and non-RTX cases. So take Pion out of the equation and just see if it's a 'natural' RTX vs non-RTX effect.
Hi, sorry for the confusion. Do you want me to run the tests on v3.2.25? I ran all tests on v3.2.49. I then ran tests on all released v3.2 versions prior to that and narrowed it down to v3.2.25, the first release that displayed the issue. I extensively verified that v3.2.24 did not have this issue. Most recently I was testing MediaMTX, which uses v3.2.22, and its target bitrates were excellent. v3.2.22 and v3.2.24 behave similarly. All the reports that you see were generated with v3.2.49, including the parent issue.
I am sending data from Chrome to Pion (video only, not simulcast).
This particular behaviour exists on all Pion versions from v3.2.25 onwards. All these tests were done with https://github.com/pion/webrtc/blob/master/examples/broadcast/main.go as a publisher.
@aankur - please try sending video from Chrome to Chrome in the RTX vs non-RTX case, and then look at the targetBitrate graph for the sender Chrome in both cases. My guess is that what you are seeing is the difference in how Chrome (actually libwebrtc) does bandwidth estimation in the RTX vs non-RTX case, and that it isn't a Pion issue.
v3.2.24 and v3.2.25 are different only because the former doesn't support RTX. There are no other changes (as I understand it), so this is consistent with this being a Chrome-side difference in how BWE is done in the RTX vs non-RTX case.
Sample HTML Demonstrating Chrome -> Chrome (Video Only)
With RTX
<!DOCTYPE html>
<html>
<head>
</head>
<body>
<div id="container">
<video id="localVideo" playsinline autoplay muted></video>
<video id="remoteVideo" playsinline autoplay></video>
<div class="box">
<button id="startButton">Start</button>
<button id="callButton">Call</button>
<button id="hangupButton">Hang Up</button>
</div>
</div>
<script>
'use strict';
const startButton = document.getElementById('startButton');
const callButton = document.getElementById('callButton');
const hangupButton = document.getElementById('hangupButton');
callButton.disabled = true;
hangupButton.disabled = true;
startButton.addEventListener('click', start);
callButton.addEventListener('click', call);
hangupButton.addEventListener('click', hangup);
let startTime;
const localVideo = document.getElementById('localVideo');
const remoteVideo = document.getElementById('remoteVideo');
localVideo.addEventListener('loadedmetadata', function() {
console.log(`Local video videoWidth: ${this.videoWidth}px, videoHeight: ${this.videoHeight}px`);
});
remoteVideo.addEventListener('loadedmetadata', function() {
console.log(`Remote video videoWidth: ${this.videoWidth}px, videoHeight: ${this.videoHeight}px`);
});
remoteVideo.addEventListener('resize', () => {
console.log(`Remote video size changed to ${remoteVideo.videoWidth}x${remoteVideo.videoHeight} - Time since pageload ${performance.now().toFixed(0)}ms`);
// We'll use the first onsize callback as an indication that video has started
// playing out.
if (startTime) {
const elapsedTime = window.performance.now() - startTime;
console.log('Setup time: ' + elapsedTime.toFixed(3) + 'ms');
startTime = null;
}
});
let localStream;
let pc1;
let pc2;
function getName(pc) { return (pc === pc1) ? 'pc1' : 'pc2'; }
function getOtherPc(pc) { return (pc === pc1) ? pc2 : pc1; }
async function start() {
console.log('Requesting local stream');
startButton.disabled = true;
try {
const stream = await navigator.mediaDevices.getUserMedia({video: {width: 1280, height: 720}});
console.log('Received local stream');
localVideo.srcObject = stream;
localStream = stream;
callButton.disabled = false;
} catch (e) {
alert(`getUserMedia() error: ${e.name}`);
}
}
async function call() {
callButton.disabled = true;
hangupButton.disabled = false;
console.log('Starting call');
startTime = window.performance.now();
const videoTracks = localStream.getVideoTracks();
if (videoTracks.length > 0) {
console.log(`Using video device: ${videoTracks[0].label}`);
}
const configuration = {};
console.log('RTCPeerConnection configuration:', configuration);
pc1 = new RTCPeerConnection(configuration);
console.log('Created local peer connection object pc1');
pc1.addEventListener('icecandidate', e => onIceCandidate(pc1, e));
pc2 = new RTCPeerConnection(configuration);
console.log('Created remote peer connection object pc2');
pc2.addEventListener('icecandidate', e => onIceCandidate(pc2, e));
pc1.addEventListener('iceconnectionstatechange', e => onIceStateChange(pc1, e));
pc2.addEventListener('iceconnectionstatechange', e => onIceStateChange(pc2, e));
pc2.addEventListener('track', gotRemoteStream);
localStream.getTracks().forEach(track => {
pc1.addTransceiver(track, {direction: 'sendonly'});
});
console.log('Added local stream to pc1');
try {
console.log('pc1 createOffer start');
const offer = await pc1.createOffer();
await onCreateOfferSuccess(offer);
} catch (e) {
onCreateSessionDescriptionError(e);
}
}
function onCreateSessionDescriptionError(error) {
console.log(`Failed to create session description: ${error.toString()}`);
}
async function onCreateOfferSuccess(desc) {
console.log(`Offer from pc1\n${desc.sdp}`);
console.log('pc1 setLocalDescription start');
try {
await pc1.setLocalDescription(desc);
onSetLocalSuccess(pc1);
} catch (e) {
// Pass the caught error so the handler can log it.
onSetSessionDescriptionError(e);
}
console.log('pc2 setRemoteDescription start');
try {
await pc2.setRemoteDescription(desc);
onSetRemoteSuccess(pc2);
} catch (e) {
onSetSessionDescriptionError(e);
}
console.log('pc2 createAnswer start');
// Since the 'remote' side has no media stream we need
// to pass in the right constraints in order for it to
// accept the incoming offer of audio and video.
try {
const answer = await pc2.createAnswer();
await onCreateAnswerSuccess(answer);
} catch (e) {
onCreateSessionDescriptionError(e);
}
}
function onSetLocalSuccess(pc) {
console.log(`${getName(pc)} setLocalDescription complete`);
}
function onSetRemoteSuccess(pc) {
console.log(`${getName(pc)} setRemoteDescription complete`);
}
function onSetSessionDescriptionError(error) {
console.log(`Failed to set session description: ${error.toString()}`);
}
function gotRemoteStream(e) {
remoteVideo.srcObject = new MediaStream([e.track]);
console.log('pc2 received remote stream');
}
async function onCreateAnswerSuccess(desc) {
console.log(`Answer from pc2:\n${desc.sdp}`);
console.log('pc2 setLocalDescription start');
try {
await pc2.setLocalDescription(desc);
onSetLocalSuccess(pc2);
} catch (e) {
onSetSessionDescriptionError(e);
}
console.log('pc1 setRemoteDescription start');
try {
await pc1.setRemoteDescription(desc);
onSetRemoteSuccess(pc1);
} catch (e) {
onSetSessionDescriptionError(e);
}
}
async function onIceCandidate(pc, event) {
try {
await (getOtherPc(pc).addIceCandidate(event.candidate));
onAddIceCandidateSuccess(pc);
} catch (e) {
onAddIceCandidateError(pc, e);
}
console.log(`${getName(pc)} ICE candidate:\n${event.candidate ? event.candidate.candidate : '(null)'}`);
}
function onAddIceCandidateSuccess(pc) {
console.log(`${getName(pc)} addIceCandidate success`);
}
function onAddIceCandidateError(pc, error) {
console.log(`${getName(pc)} failed to add ICE Candidate: ${error.toString()}`);
}
function onIceStateChange(pc, event) {
if (pc) {
console.log(`${getName(pc)} ICE state: ${pc.iceConnectionState}`);
console.log('ICE state change event: ', event);
}
}
function hangup() {
console.log('Ending call');
pc1.close();
pc2.close();
pc1 = null;
pc2 = null;
hangupButton.disabled = true;
callButton.disabled = false;
}
</script>
</body>
</html>
Your environment.
What did you do?
Used the Broadcast Example, created a new RegisterDefaultInterceptors without webrtc.ConfigureTWCCSender
replaced
What did you expect?
Bandwidth Ramp-up should be fast with webrtc.ConfigureTWCCSender
What happened?
It looks like this was introduced in v3.2.25. When using webrtc.ConfigureTWCCSender, the ramp-up was slow, followed a see-saw pattern, and got stuck at 1 Mbps. Without it, the ramp-up was still slow but went up to 2.5 Mbps. Please see the Target Bitrate as reported on the graph.
With ConfigureTWCCSender: slow, see-saw, and stuck at 1 Mbps
Without ConfigureTWCCSender: slow, but goes up to 2.5 Mbps
With ConfigureTWCCSender in v3.2.24
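To compare ramp-up runs without relying on eyeballing the graphs, the see-saw pattern can be roughly quantified by counting how often the target bitrate switches between rising and falling. This is a minimal sketch, assuming the targetBitrate samples have been exported as a list of numbers; the function name countReversals and both sample series are hypothetical and are not measurements from this issue.

```go
package main

import "fmt"

// countReversals counts how often a series switches between rising and
// falling. Flat segments are ignored, so a stepped ramp-up yields zero.
// (Hypothetical helper for illustration.)
func countReversals(samples []float64) int {
	reversals := 0
	prevDir := 0 // +1 rising, -1 falling, 0 not yet known
	for i := 1; i < len(samples); i++ {
		dir := 0
		switch {
		case samples[i] > samples[i-1]:
			dir = 1
		case samples[i] < samples[i-1]:
			dir = -1
		}
		if dir == 0 {
			continue
		}
		if prevDir != 0 && dir != prevDir {
			reversals++
		}
		prevDir = dir
	}
	return reversals
}

func main() {
	// Hypothetical target bitrate series in Mbps, for illustration only.
	stepped := []float64{0.3, 1.5, 1.5, 2.0, 2.0, 2.5}
	seesaw := []float64{0.3, 0.8, 0.5, 1.0, 0.7, 1.0, 0.8}

	fmt.Println("stepped reversals:", countReversals(stepped))
	fmt.Println("see-saw reversals:", countReversals(seesaw))
}
```

A stepped ramp that only ever rises or plateaus gives a count of zero, while a series that repeatedly overshoots and backs off gives a count that grows with the number of swings.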