meetecho / janus-gateway

Janus WebRTC Server
https://janus.conf.meetecho.com
GNU General Public License v3.0
7.99k stars 2.45k forks source link

Video stream not connection with spec-compliant simulcast on Chrome M74+ #1552

Closed bwerther closed 5 years ago

bwerther commented 5 years ago

Hi Lorenzo,

I'm not sure if this should be working yet, but your recent merging of the simulcast-rid branch into master suggests it should.

I'm using the latest master (as of today), and both Chrome beta.M74 and canary.M75.

I've modified the janus.js addTrack code to do spec-compliant simulcast initiation:

        if(addTracks && stream !== null && stream !== undefined) {
            Janus.log('Adding local transceivers and stream');

            var audioTransceiver = config.pc.addTransceiver("audio");
            var videoTransceiver = config.pc.addTransceiver("video",{
                sendEncodings:[
                    { rid: "h", active: true, maxBitrate: 900000 },
                    { rid: "m", active: true, maxBitrate: 300000, scaleResolutionDownBy: 2 },
                    { rid: "l", active: true, maxBitrate: 100000, scaleResolutionDownBy: 4 }
                ]
            });
            stream.getAudioTracks().forEach(function(track) {
                Janus.log('Adding local audio track:', track);
                audioTransceiver.sender.replaceTrack(track);
            });
            stream.getVideoTracks().forEach(function(track) {
                Janus.log('Adding local video track:', track);
                videoTransceiver.sender.replaceTrack(track);
            });
        }

Without the sendEncodings block, things work as they always have. With it, we see the new SDP with three RIDs and no SSRCs. The offer/answer exchange to the videoroom (or echotest) goes fine, and the ICE negotiation starts. Audio connects, but but video doesn't and the server monitor shows no inbound video traffic.

Any idea what might be going on?

Thanks, Ben

lminiero commented 5 years ago

My guess is you've put the code in the wrong place. I've used this snippet to replace this line for my tests:

if(track.kind === "audio") {
    config.pc.addTrack(track, stream);
} else {
    config.pc.addTransceiver(track, {
        direction: "sendrecv",
        streams: [stream],
        sendEncodings: [
            { rid: "h", active: true, maxBitrate: 900000 },
            { rid: "m", active: true, maxBitrate: 300000, scaleResolutionDownBy: 2 },
            { rid: "l", active: true, maxBitrate: 100000, scaleResolutionDownBy: 4 }
        ]
    });
}

and it did the trick. Still not working exactly as expected (I only ever see 2 or the 3 substreams), but the rid stuff does work. I will attend the IETF Hackathon in a few days, so I'll debug the simulcast stuff with other SFU developers and browsers implementors there.

lminiero commented 5 years ago

Or maybe the difference is in how you create the transceiver? type+encodings first, and track only after that.

bwerther commented 5 years ago

Thanks -- this was very helpful. I did the following and it started working (with all 3 simulcast streams). Looks like the key was calling addTransceiver with the track itself, rather than calling replaceTrack after creation.

Btw, the only issue now is that, when I switch the substream, it freezes for a 2-3 seconds and the video sometimes breaks up into a blocky mess (particularly when switching to the highest quality substream) before correcting itself.

        if(addTracks && stream !== null && stream !== undefined) {
            Janus.log('Adding local transceivers and stream');

            var audioTrack = stream.getAudioTracks()[0];
            if (audioTrack) {
                Janus.log('Adding local audio track:', audioTrack);
                config.pc.addTrack(audioTrack, stream);
            } else
                Janus.log('No local audio track found');

            var videoTrack = stream.getVideoTracks()[0];
            if (videoTrack) {
                Janus.log('Adding local video track:', videoTrack);
                var videoTransceiver = config.pc.addTransceiver(videoTrack, {
                    direction: "sendrecv",
                    streams: [stream],
                    sendEncodings: [
                        { rid: "h", active: true, maxBitrate: 900000 },
                        { rid: "m", active: true, maxBitrate: 300000, scaleResolutionDownBy: 2 },
                        { rid: "l", active: true, maxBitrate: 100000, scaleResolutionDownBy: 4 }
                    ]
                });
                Janus.log("Created local video transceiver:", videoTransceiver);
            } else
                Janus.log('No local video track found');
        }
lminiero commented 5 years ago

Sorry, no time to look into this until the end of the month. If it turns up during the hackathon we'll see what causes it.

bwerther commented 5 years ago

Thanks Lorenzo.

On Fri, Mar 22, 2019 at 09:49 Lorenzo Miniero notifications@github.com wrote:

Sorry, no time to look into this until the end of the month. If it turns up during the hackathon we'll see what causes it.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/meetecho/janus-gateway/issues/1552#issuecomment-475694885, or mute the thread https://github.com/notifications/unsubscribe-auth/AA0MVhUusDLCKLQBfGTU2RvKA0nllrX8ks5vZQmAgaJpZM4cAV67 .

lminiero commented 5 years ago

You can try the new simulcast2=true query string attribute in some of our demos (e.g., EchoTest, VideoRoom) to enable rid-based simulcasting in Chrome using janus.js: I just added it here, https://github.com/meetecho/janus-gateway/commit/0ba905deb50767d6332a6b4c8cb3163c0746b133

Just tested myself and I do get all layers, and apparently it works just as munging SDP does. That said, I only made a couple of tests locally and with a server that's relatively close. Again, more tests tomorrow at the hackathon may give us more info, but for now I consider the feature working as expected.

bwerther commented 5 years ago

Thanks Lorenzo. I suspect there might be some problem upstream with the FIR processing, but we can wait till you put it through its paces in the hackathon. Also perhaps delaying the switchover till the first iframe is seen on the new su stream?

On Fri, Mar 22, 2019 at 11:17 Lorenzo Miniero notifications@github.com wrote:

You can try the new simulcast2=true query string attribute in some of our demos (e.g., EchoTest, VideoRoom) to enable rid-based simulcasting in Chrome using janus.js: I just added it here, 0ba905d https://github.com/meetecho/janus-gateway/commit/0ba905deb50767d6332a6b4c8cb3163c0746b133

Just tested myself and I do get all layers, and apparently it works just as munging SDP does. That said, I only made a couple of tests locally and with a server that's relatively close. Again, more tests tomorrow at the hackathon may give us more info, but for now I consider the feature working as expected.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/meetecho/janus-gateway/issues/1552#issuecomment-475727342, or mute the thread https://github.com/notifications/unsubscribe-auth/AA0MVpJ_T4M_JTEvh6vA75UzyENXLuj3ks5vZR4jgaJpZM4cAV67 .

lminiero commented 5 years ago

We already wait for the keyframe.

bwerther commented 5 years ago

I found the problem. In janus_h264_is_keyframe it should return true for either the IDR (5) or SPS (7). It was checking for both of these in the case of a fragmentation unit, but only for the IDR in the case of the single unit. Adding the check for (fragment==7) gets rid of the substream-switching issues I was seeing. Here's the code:

if ((fragment == 5) || (fragment == 7)) {
        return TRUE;    
    } else if ((fragment == 28 || fragment == 29) && (nal == 5 || nal == 7) && start_bit == 128) {
        JANUS_LOG(LOG_HUGE, "Got an H264 keyframe from a fragmentation unit \n");
        return TRUE;
    } else if(fragment == 24) {
lminiero commented 5 years ago

Thanks, just fixed upstream!

lminiero commented 5 years ago

@bwerther can you make some tests removing the two ==5 checks and only leaving ==7? Not sure if we need the start_bit check either. Asking as during the hackathon and testing against Safari, we noticed artifacts when doing H.264 simulcasting, which may have to do with the way we detect the keyframe detection: considering SPS/PPS packets may be sent in separate packets than the ones with NAL=5 (before, precisely), there may be race conditions where we start our checks right after NAL=7 has passed, and so we do the switch on NAL=5 which would be missing the SPS/PPS info. Right now this is just speculation (we're making a few tests on that), but checking whether that works for you would help.

lminiero commented 5 years ago

Just FYI, I committed the fix here https://github.com/meetecho/janus-gateway/commit/86a00fe94879478197e66754503a4ba7757e9a06 as it seemed to do the trick in our tests. Please let me know if you encounter issues with that and we'll revisit.

bwerther commented 5 years ago

Thanks Lorenzo, I played with it a little, and I think I understand what is going on a little better, and I don't understand a couple of things going on.

If we can find a way to get that initial SPS right away, then just using NAL=7 for substream switches is probably fine. If we can't then the other option would be to cache the SPS info from all substreams, and pass it out of band so that all subscribers can just use NAL=5 keyframes.

On Sun, Mar 24, 2019 at 3:46 AM Lorenzo Miniero notifications@github.com wrote:

Just FYI, I committed the fix here 86a00fe https://github.com/meetecho/janus-gateway/commit/86a00fe94879478197e66754503a4ba7757e9a06 as it seemed to do the trick in our tests. Please let me know if you encounter issues with that and we'll revisit.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/meetecho/janus-gateway/issues/1552#issuecomment-475947246, or mute the thread https://github.com/notifications/unsubscribe-auth/AA0MVjCjEvGSurwdM9MHZcn3ipGQ8UKCks5vZ1dpgaJpZM4cAV67 .

lminiero commented 5 years ago

In WebRTC, a keyframe always includes a SPS AFAIK, so the fix we made should cover that, since we don't switch streams in simulcast until we get a keyframe for the new stream. Now we only use nal=7 (sps/pps) as an indication for a keyframe, and ignore 5.

The fact you only see normal quality at the beginning is normal, I think. The browser bandwidth estimator may take a bit to figure out it has enough bandwidth for all streams, and until that happens only lower layers (0, probably 1) are sent. This is assuming you configured the VideoRoom enabling the transport wide stuff, and that you're not capping the bitrate (so you either set 0 or a high bitrate value).