Closed: adnanel closed this 1 month ago
- We send "hangup" request, but don't close the PC on client side
Yours is a good analysis, but why aren't you closing the PC on the client side too when sending the "hangup"? Our SIP demo does, and I would have expected everyone to do the same.
That's an option, I guess. If you think this behaviour is fine we can close the peer connections client side sooner than before.
Currently we do cleanup once we receive the hangup event back from janus (the "janus" : "hangup", not the SIP plugin hangup). I agree that there is no need for this to be done sequentially, but I'm not a big fan of relying on client side for this flow to finish as expected.
Makes sense. Starting from the assumption that I'm not going to revert the PR/commit you mentioned (which had a much more serious impact on the status of sessions), I think the main issue here is related to timing and the order of things happening, which in this specific case lead to an internal cleanup in the plugin (janus_sip_hangup_media_internal) but not to a cleanup in the core (close_pc), which would still be needed here.
Rather than overcomplicate things, maybe there's an easier fix: in your last bullet point, always call both close_pc and janus_sip_hangup_media_internal. In fact, in cases where a PC was available, close_pc will schedule a call to hangup_media on the plugin, which in turn will call janus_sip_hangup_media_internal: if there was no PC, it won't, and so we have to do it ourselves (main reason why we made that patch you mentioned). Considering that janus_sip_hangup_media only calls janus_sip_hangup_media_internal protected by a mutex, calling it twice shouldn't be an issue: the first call (whether it's our own internal call, or the one scheduled by close_pc) will clean up things internally, and the second will do nothing since the state will have been changed by the call before.
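The "calling it twice shouldn't be an issue" argument can be sketched in isolation. The following is a minimal model, not the plugin's actual code: demo_session, its fields, and the demo_* functions are hypothetical stand-ins, and a pthread mutex is assumed in place of whatever locking the plugin really uses. It only illustrates the pattern of a state check inside a mutex-protected wrapper, which makes a second call a no-op.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the plugin session; not the real janus_sip_session */
typedef struct demo_session {
	pthread_mutex_t mutex;
	bool media_active;	/* true while media resources are still allocated */
	int cleanups;		/* counts how many times cleanup actually ran */
} demo_session;

static void demo_session_init(demo_session *s) {
	pthread_mutex_init(&s->mutex, NULL);
	s->media_active = true;
	s->cleanups = 0;
}

/* Plays the role of janus_sip_hangup_media_internal:
 * returns immediately if a previous call already reset the state */
static void demo_hangup_media_internal(demo_session *s) {
	if(!s->media_active)
		return;
	s->media_active = false;
	s->cleanups++;
}

/* Plays the role of janus_sip_hangup_media:
 * takes the mutex, then delegates to the internal function */
static void demo_hangup_media(demo_session *s) {
	pthread_mutex_lock(&s->mutex);
	demo_hangup_media_internal(s);
	pthread_mutex_unlock(&s->mutex);
}
```

In this model, calling demo_hangup_media twice on the same session leaves cleanups at 1: the first call flips media_active, and the second finds nothing left to do.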
Can you try changing this block here:
    if(g_atomic_int_get(&session->establishing) || g_atomic_int_get(&session->established)) {
        if(session->media.has_audio || session->media.has_video) {
            /* Get rid of the PeerConnection in the core */
            gateway->close_pc(session->handle);
        } else {
            /* No SDP was exchanged, just clean up locally */
            janus_sip_hangup_media_internal(session->handle);
        }
    }
to something like this instead
    if(g_atomic_int_get(&session->establishing) || g_atomic_int_get(&session->established)) {
        /* Get rid of the PeerConnection in the core */
        gateway->close_pc(session->handle);
        /* Also clean up locally, in case there was no PC */
        janus_sip_hangup_media_internal(session->handle);
    }
and let me know how that works for you? If I'm right, it should address your issue and at the same time not introduce any regression (due to the idempotent nature of janus_sip_hangup_media_internal), but it's a good idea to check whether I'm missing anything.
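To see why always calling both functions covers the "PC exists" and "no PC" cases alike, here is a small self-contained model of the control flow. Everything in it is an assumption made for illustration: model_handle and the model_* functions are hypothetical, and the core-side close runs the plugin callback inline, whereas the real close_pc schedules it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a handle; the real core and plugin structures differ */
typedef struct model_handle {
	bool has_pc;		/* whether the core has a PeerConnection for this handle */
	bool media_active;	/* plugin-side media state */
	int cleanups;		/* how many times the plugin cleanup actually ran */
} model_handle;

/* Plugin-side cleanup: does nothing when the state was already reset */
static void model_hangup_media_internal(model_handle *h) {
	if(!h->media_active)
		return;
	h->media_active = false;
	h->cleanups++;
}

/* Core-side close: only reaches the plugin when a PC exists.
 * Assumption: runs inline here, while the real close_pc schedules the call. */
static void model_close_pc(model_handle *h) {
	if(!h->has_pc)
		return;
	h->has_pc = false;
	model_hangup_media_internal(h);
}

/* Revised hangup path: always call both, whether or not SDP was exchanged */
static void model_on_sip_hangup(model_handle *h) {
	model_close_pc(h);
	model_hangup_media_internal(h);
}
```

With has_pc set, the cleanup runs once via model_close_pc and the direct call does nothing; with has_pc unset, model_close_pc returns early and the direct call performs the cleanup. Either way the plugin state is reset exactly once in this model.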
That seems to have fixed our problem. I ran a few tests myself and kept a single Janus instance running automated tests for the past ~8 hours, and no other problems were observed.
Do you want to commit that to master directly or should I create a PR?
No need, I'll push the commit myself to both master and 0.x shortly. Thanks for the feedback!
What version of Janus is this happening on? Newest master, e.g. 504daf5aef333d6f37e41c30b00be24cfb6c83bf
Have you tested a more recent version of Janus too? Yes, master branch is affected.
Was this working before? Yes, this was broken with the change in this commit: https://github.com/meetecho/janus-gateway/commit/0f32c3290fe93acddf3b34b1881613460641368b
Additional context: given a session with the Janus SIP plugin:
Result: PC remains open, after a while we receive DTLS alert which causes PC closure.