francoisTemasys closed this issue 10 years ago.
Yes, that's consistent with the segfaults we experienced in similar tests. Specifically, a few days ago we managed to test an MCU room with about 20 participants, and when most of them started leaving, the same crash occurred while relaying RTCP. Apparently there's still something to improve in keeping all the involved components aware that a session is no longer available; either that, or some memory error is corrupting the heap.
Thanks for testing!
Is it normal that in the videoroom plugin the function void janus_videoroom_hangup_media(janus_plugin_session *handle) (line 543) is not protected by any mutex, and in particular the call g_hash_table_remove(participant->room->participants, GUINT_TO_POINTER(participant->user_id)); (line 566)? Why are the videorooms not protected by a mutex, when in the audiobridge plugin taking the lock is one of the first things done?
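To illustrate the kind of locking being asked about, here is a minimal sketch, assuming a per-room mutex that every thread touching the participant table has to take. The room structure, the fixed-size id array and the room_join/room_leave/room_count helpers are all hypothetical stand-ins (the real plugin keeps participants in a GHashTable keyed by user_id); the point is only that the hangup-side removal and any concurrent reader share one lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical, simplified room: the real videoroom plugin stores
 * participants in a GHashTable, here a small array stands in for it. */
#define MAX_PARTICIPANTS 32

typedef struct room {
    pthread_mutex_t mutex;                 /* guards ids[] and count */
    unsigned long ids[MAX_PARTICIPANTS];   /* 0 marks an empty slot */
    size_t count;
} room;

void room_init(room *r) {
    pthread_mutex_init(&r->mutex, NULL);
    for(size_t i = 0; i < MAX_PARTICIPANTS; i++)
        r->ids[i] = 0;
    r->count = 0;
}

/* Add a participant under the room lock; returns 0 on success, -1 otherwise. */
int room_join(room *r, unsigned long user_id) {
    int ret = -1;
    if(user_id == 0)
        return -1;
    pthread_mutex_lock(&r->mutex);
    for(size_t i = 0; i < MAX_PARTICIPANTS; i++) {
        if(r->ids[i] == 0) {
            r->ids[i] = user_id;
            r->count++;
            ret = 0;
            break;
        }
    }
    pthread_mutex_unlock(&r->mutex);
    return ret;
}

/* Hangup path: the removal happens while holding the same lock any reader
 * (e.g. a thread relaying RTCP to room members) would take first. */
int room_leave(room *r, unsigned long user_id) {
    int ret = -1;
    if(user_id == 0)
        return -1;
    pthread_mutex_lock(&r->mutex);
    for(size_t i = 0; i < MAX_PARTICIPANTS; i++) {
        if(r->ids[i] == user_id) {
            r->ids[i] = 0;
            r->count--;
            ret = 0;
            break;
        }
    }
    pthread_mutex_unlock(&r->mutex);
    return ret;
}

/* Readers take the lock too, so they never observe a half-done removal. */
size_t room_count(room *r) {
    pthread_mutex_lock(&r->mutex);
    size_t n = r->count;
    pthread_mutex_unlock(&r->mutex);
    return n;
}
```

Leaving twice is reported as an error rather than corrupting the count, which is the behaviour you want when hangup can be triggered from more than one path.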
To be more precise, this is the line producing the segfault:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff727fc700 (LWP 22912)]
0x00000000004189cf in janus_ice_relay_rtcp (handle=0x7fff7c001960, video=1, buf=0x7fff727ebbf0 "\201", <incomplete sequence \311>, len=56)
at /home/game/GitRepo/janus-gateway/ice.c:952
952 janus_ice_component *component = handle->rtcpmux ? stream->rtp_component : stream->rtcp_component;
I may have found the fix. When we receive an RTCP packet, if we are a subscriber we can at least check whether the publisher is still sending something (otherwise it may be leaving); if we are a publisher, we don't need to relay anything when we aren't sending audio or video (we are probably about to leave). Here is the change to janus_videoroom.c:
@@ -515,26 +519,30 @@ void janus_videoroom_incoming_rtcp(janus_plugin_session *handle, int video, char
if(l && l->feed) {
janus_videoroom_participant *p = l->feed;
if(p && p->session) {
- if(p->bitrate > 0)
- janus_rtcp_cap_remb(buf, len, p->bitrate);
- gateway->relay_rtcp(p->session->handle, video, buf, len);
+ if((!video && p->audio_active) || (video && p->video_active)) {
+ if(p->bitrate > 0)
+ janus_rtcp_cap_remb(buf, len, p->bitrate);
+ gateway->relay_rtcp(p->session->handle, video, buf, len);
+ }
}
}
} else if(session->participant_type == janus_videoroom_p_type_publisher) {
/* FIXME Badly: we're just bouncing the incoming RTCP back with modified REMB, we need to improve this... */
janus_videoroom_participant *participant = (janus_videoroom_participant *)session->participant;
- if(participant->bitrate > 0)
- janus_rtcp_cap_remb(buf, len, participant->bitrate);
- gateway->relay_rtcp(handle, video, buf, len);
- /* FIXME Badly: we're also blinding forwarding the publisher RTCP to all the listeners: this probably means confusing them... */
- if(participant->listeners != NULL) {
- GSList *ps = participant->listeners;
- while(ps) {
- janus_videoroom_listener *l = (janus_videoroom_listener *)ps->data;
- if(l->session && l->session->handle) {
- gateway->relay_rtcp(l->session->handle, video, buf, len);
+ if((!video && participant->audio_active) || (video && participant->video_active)) {
+ if(participant->bitrate > 0)
+ janus_rtcp_cap_remb(buf, len, participant->bitrate);
+ gateway->relay_rtcp(handle, video, buf, len);
+ /* FIXME Badly: we're also blinding forwarding the publisher RTCP to all the listeners: this probably means confusing them... */
+ if(participant->listeners != NULL) {
+ GSList *ps = participant->listeners;
+ while(ps) {
+ janus_videoroom_listener *l = (janus_videoroom_listener *)ps->data;
+ if(l->session && l->session->handle) {
+ gateway->relay_rtcp(l->session->handle, video, buf, len);
+ }
+ ps = ps->next;
}
- ps = ps->next;
}
}
}
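The condition the patch adds in both branches can be isolated as a small predicate. This is a sketch only: media_state and should_relay_rtcp are hypothetical names, and it assumes the audio_active/video_active flags are cleared once a participant stops sending. Note that it narrows the window for the crash rather than closing it, since a participant can still go away between the check and the relay call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical view of the two flags the patch checks; in the plugin they
 * are fields of janus_videoroom_participant. */
typedef struct media_state {
    int audio_active;
    int video_active;
} media_state;

/* Nonzero when RTCP for the given media type should be relayed: audio RTCP
 * (video == 0) only if audio is active, video RTCP only if video is active.
 * Same expression as the one added in the diff, plus a NULL guard. */
int should_relay_rtcp(const media_state *p, int video) {
    if(p == NULL)
        return 0;
    return (!video && p->audio_active) || (video && p->video_active);
}
```

For a participant that has stopped both audio and video, the predicate is false for either media type, so nothing is relayed on its behalf.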
Francois,
thanks for the patch! I thought we already checked for video_active in the code, but I guess this was only done for RTP and not yet for RTCP (which might indeed explain why all the dumps I have are triggered by relay_rtcp). I'll try to integrate your changes in the code to see if this fixes it for our stress tests. Did it already fix the crashes in your case?
That said, I guess I'll also have to find a core fix for this, as otherwise an ill-behaving plugin will still be able to crash the gateway. That's what I've been trying to do so far, and I hope to get closer to a definitive solution soon.
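One possible shape for a core-side guard is sketched below, assuming the core keeps a per-handle mutex and an "alive" flag that teardown clears before freeing the stream. Everything here (ice_handle, stream, handle_init, relay_rtcp, handle_hangup) is hypothetical and is not the actual fix from the commit mentioned later in this thread. It also only protects the stream: the handle struct itself must outlive every caller, which in practice needs reference counting.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical, stripped-down stream and ICE handle. */
typedef struct stream {
    unsigned long rtcp_sent;   /* stands in for actually sending a packet */
} stream;

typedef struct ice_handle {
    pthread_mutex_t mutex;     /* guards alive and video_stream */
    int alive;                 /* cleared once the PeerConnection is torn down */
    stream *video_stream;
} ice_handle;

int handle_init(ice_handle *h) {
    pthread_mutex_init(&h->mutex, NULL);
    h->video_stream = calloc(1, sizeof(stream));
    h->alive = (h->video_stream != NULL);
    return h->alive ? 0 : -1;
}

/* Core-side relay: never touch the stream once the handle is gone,
 * whatever the calling plugin believes. Returns 0 if relayed, -1 if not. */
int relay_rtcp(ice_handle *h, const char *buf, int len) {
    int ret = -1;
    if(h == NULL || buf == NULL || len <= 0)
        return -1;
    pthread_mutex_lock(&h->mutex);
    if(h->alive && h->video_stream != NULL) {
        h->video_stream->rtcp_sent++;
        ret = 0;
    }
    pthread_mutex_unlock(&h->mutex);
    return ret;
}

/* Teardown: mark the handle dead and free the stream under the same lock,
 * so a concurrent relay_rtcp either completes first or sees alive == 0. */
void handle_hangup(ice_handle *h) {
    pthread_mutex_lock(&h->mutex);
    h->alive = 0;
    free(h->video_stream);
    h->video_stream = NULL;
    pthread_mutex_unlock(&h->mutex);
}
```

With this arrangement a plugin that relays RTCP after hangup gets an error back instead of dereferencing a freed stream, which is the failure mode shown in the backtrace at ice.c:952.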
Yes, it fixed my crashes. I agree that this also needs to be fixed in the Janus core.
Solved in commit 35f2308ae10bb417e6e3de8a19d21f1766a3a086
I slightly modified the code of the sharescreen example to build a WebRTC webinar: no HTTPS is required, and it shows the camera and microphone.
Here is how to reproduce:
Here is the output:
Actually, this segfault doesn't happen every time, so I guess it's another race condition.