meetecho / janus-gateway

Janus WebRTC Server
https://janus.conf.meetecho.com
GNU General Public License v3.0

Frame freeze with Chrome #233

Closed benwtrent closed 9 years ago

benwtrent commented 9 years ago

I am using the video room logic and streaming high-resolution video at a medium framerate (15 fps). Periodically, if there is a ton of movement, the images on the stream freeze. Packets are still being sent and relayed, but it would seem that Chrome is not honoring the FIR requests. The symptoms suggest that a keyframe is not being sent (the picture stays on the last one received for about a minute, and then it starts working again).

This is on a local network, so network-wise we are just fine, and these are pretty powerful machines (nowhere near the limits of CPU power).

Google Chrome 37 on Linux, Chrome 42+ on Windows.

Any ideas? I am thinking about forcing keyframes but I thought the gateway and the plugin already took care of that. Note, I am not specifying a fir_freq when creating the room.

lminiero commented 9 years ago

The fir_freq is exactly the mechanism the plugin employs to force key frames: the gateway instead ignores them all, as from the core perspective media is completely opaque.

Without a fir_freq, no FIR/PLI would ever be sent unless someone new attaches to the publisher or a recorder is started over it (both cases that need a key frame to be available). IIRC FIR/PLI sent by viewers are ignored, in order to avoid situations where a publisher is "drowned" by tons of FIR/PLI in case several viewers are there: this might need to be revisited, possibly by envisaging some kind of threshold (e.g., don't forward FIR/PLI if one was sent too recently).
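The threshold idea mentioned above could look roughly like the following. This is only a minimal sketch, not code from the VideoRoom plugin: the `keyframe_throttle` struct, its fields, and `keyframe_throttle_allow()` are hypothetical names, and the caller is assumed to supply a monotonic timestamp in microseconds.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-publisher state (not a Janus structure). */
typedef struct keyframe_throttle {
    bool sent_once;           /* has any FIR/PLI been forwarded yet? */
    int64_t last_request_us;  /* when the last FIR/PLI was forwarded */
    int64_t min_interval_us;  /* minimum spacing between forwarded requests */
} keyframe_throttle;

/* Returns true if a viewer's FIR/PLI should be forwarded to the publisher,
 * false if one was already forwarded too recently. */
static bool keyframe_throttle_allow(keyframe_throttle *t, int64_t now_us) {
    if(t->sent_once && (now_us - t->last_request_us) < t->min_interval_us)
        return false;  /* drop: the publisher was asked recently enough */
    t->sent_once = true;
    t->last_request_us = now_us;
    return true;
}
```

With one such struct per publisher, any number of viewers can ask for a key frame, but the publisher sees at most one request per `min_interval_us`.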

benwtrent commented 9 years ago

Ugh, I just read through the code again and I see that if I don't set it, there is no default value. Did Janus provide a default frequency in earlier versions?

lminiero commented 9 years ago

It did in the past, you're right, I don't recall when or why we removed that.

ploxiln commented 9 years ago

Even without a keyframe being generated for a very long time, if the video was working initially, it shouldn't stop. Unless it lost some packets (and never got retransmissions).

It appears that when fir_freq was introduced as a configurable parameter, over a year ago, it defaulted to 0 in the code, but the example config set it to 10 (which was the value previously hard-coded in the FIR/PLI routine).

5e9e29e0 (meetecho        2014-03-12 17:44:06 +0100  381)                       videoroom->fir_freq = 0;
5e9e29e0 (meetecho        2014-03-12 17:44:06 +0100  382)                       if(firfreq != NULL && firfreq->value != NULL)
5e9e29e0 (meetecho        2014-03-12 17:44:06 +0100  383)                               videoroom->fir_freq = atol(firfreq->value);
commit 5e9e29e0bcf9c5070e1137ca96195d9846d931a7
Author: meetecho <github@meetecho.com>
Date:   Wed Mar 12 17:44:06 2014 +0100

    Several changes and improvements

    Made the install.sh script smarter in checking dependencies;
    Added a STUN test request at startup, when enabled;
    Added an option to specify the public IP of the machine, and fixed the information put in the c-lines accordingly;
    Added an option to specify an RTP/RTCP port range (note: this depends on the availability of the related function in libnice, which apparently is not on my Fedora 18);
    Made libopus and libogg dependencies optional (meaning the AudioBridge and/or VoiceMail plugins may not be built, if the libraries are not available);
    Attempts to improve the FIR/PLI management in the VideoMCU plugin;
    created utils.c file to contain helpers we may need in the future
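
For illustration, a parse step that falls back to the old value of 10 when the setting is missing or malformed might look like this. It is a sketch only: `parse_fir_freq()` and `DEFAULT_FIR_FREQ` are hypothetical names, not part of Janus, and whether a default should be restored at all is a separate decision.

```c
#include <errno.h>
#include <stdlib.h>

/* Assumption: 10 is the value the example config used, per the comment above. */
#define DEFAULT_FIR_FREQ 10

/* Hypothetical helper: parse the configured fir_freq string, falling back
 * to the default when it is absent, empty, malformed, or negative. */
static long parse_fir_freq(const char *value) {
    if(value == NULL || *value == '\0')
        return DEFAULT_FIR_FREQ;
    char *end = NULL;
    errno = 0;
    long parsed = strtol(value, &end, 10);
    if(errno != 0 || end == value || *end != '\0' || parsed < 0)
        return DEFAULT_FIR_FREQ;
    return parsed;  /* an explicit 0 still means "never send FIR" */
}
```

Unlike `atol`, `strtol` lets the caller tell a literal `0` apart from unparsable input, so only an explicit `0` disables the periodic FIR.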

lminiero commented 9 years ago

Yep, you're right in saying video should work anyway under normal circumstances. I guess in this case the problem might be either excessive packet loss (shouldn't be, since it's a local network) or packets that are too big, exceed the MTU, and hence get dropped before they reach viewers. @Computician when you say HiRes video at that framerate, are you talking about something generated by a browser or something you're piping through the application you were building?

benwtrent commented 9 years ago

I am running 1920x1080 at 15 fps. CPU utilization on either peer is not maxed out (close to 70% capacity on one, but that is fine).

It honestly seems to be Chrome wanting a keyframe and not getting it. This freeze only happens periodically when there is a burst of activity. Now the freeze lasts only about a second, since I have fir_freq set so the gateway requests a keyframe every 15 frames (to match my fps).

I think it would be beneficial for Janus to start responding intelligently to control packets sent from the peers.

ploxiln commented 9 years ago

That seems to me like pushing it. It's possible that the instantaneous bitrate during the keyframe might be higher than the network and software can smoothly handle.

Your original report is consistent with this. Without a FIR triggered by janus, the client generating the video will still generate a keyframe periodically, every couple of minutes. It's more likely to do so for a frame that is unusually different from the previous one (a "ton of movement"). Also, if there's lots of motion, it's likely that the frames after the keyframe are also relatively large. So it seems like, at 1080p, sometimes a keyframe overwhelms something (network queues due to UDP or chrome decoder/timing, I'm not sure), and then things don't recover until the next keyframe.

ploxiln commented 9 years ago

It's also possible that improving Janus' NACK-generating algorithm, so that it NACKs missed packets promptly rather than up to half a second later, could help your situation (the huge burst of UDP packets would be more reliably received by Janus from the sender, and available for retransmission, etc.).

In fact I've worked on such an algorithm, it's the last in this series of commits that I'm trying to slowly feed upstream ;)

https://github.com/meetecho/janus-gateway/compare/master...ploxiln:perch_patches_v2
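To make the idea of prompt NACKs concrete: the gap can be detected as soon as a packet arrives with a sequence number beyond the next expected one. The sketch below is a generic illustration, not the algorithm in the branch linked above and not Janus code; `missing_seq_numbers()` is a hypothetical helper that handles the 16-bit wraparound of RTP sequence numbers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: list the RTP sequence numbers skipped between the
 * highest one seen so far and a newly arrived packet. Writes at most
 * max_missing entries into missing[] and returns how many were written. */
static size_t missing_seq_numbers(uint16_t highest_seen, uint16_t arrived,
                                  uint16_t *missing, size_t max_missing) {
    /* Unsigned 16-bit subtraction handles wraparound (65535 -> 0). */
    uint16_t gap = (uint16_t)(arrived - highest_seen);
    /* gap 0: duplicate; gap 1: in order; gap >= 0x8000: old or reordered. */
    if(gap < 2 || gap >= 0x8000)
        return 0;
    size_t count = 0;
    for(uint16_t seq = (uint16_t)(highest_seen + 1);
            seq != arrived && count < max_missing; seq++)
        missing[count++] = seq;
    return count;
}
```

A receiver could call this on every incoming packet and NACK the returned numbers right away, instead of waiting for a periodic timer to notice the hole.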

benwtrent commented 9 years ago

I will have to take a gander at that patch. As it is right now, what do you guys recommend for getting this as smooth as possible? It HAS to be 1080p at 15 fps or more. CPU utilization is just fine, and on the network side I am not seeing anything that indicates too much traffic (local network only).

I could increase my fir_freq, or just listen for a keyframe request and then forward it to the peers (right now only two, which is most likely how it will stay, plus a third party listening).

ploxiln commented 9 years ago

I don't really have any helpful advice... except to suggest doing experiments to confirm causes or at least correlations. If you're using WiFi for any links, try making them all wired ethernet. Try 720p. Maybe look at NACK statistics in the janus admin.

lminiero commented 9 years ago

@ploxiln right sorry, I still haven't found some time to inspect that pull request, I'll try to do that today. @Computician if you happen to test it and see any improvement let me know!

lminiero commented 9 years ago

@Computician did you try again with a recent version? I added a way for the videoroom plugin to forward PLI/FIR requests coming from viewers to the related publishers, which should fix the issue you had.

benwtrent commented 9 years ago

I am testing that sucker out now along with the new websocket connection changes. I will update after today(possibly later today) after some heavy testing is done.

lminiero commented 9 years ago

Any update on this?

benwtrent commented 9 years ago

Yeah, we seem to be ok. The tests that were done seem to indicate that we are good on streaming HD.