meetecho / janus-gateway

Janus WebRTC Server
https://janus.conf.meetecho.com
GNU General Public License v3.0

[1.x] PC not closing server side on normal hangup #3430

Closed: adnanel closed this issue 1 month ago

adnanel commented 1 month ago

What version of Janus is this happening on? Newest master, e.g. 504daf5aef333d6f37e41c30b00be24cfb6c83bf

Have you tested a more recent version of Janus too? Yes, master branch is affected.

Was this working before? Yes, it was broken by the change in this commit: https://github.com/meetecho/janus-gateway/commit/0f32c3290fe93acddf3b34b1881613460641368b

Additional context: Given a session with the Janus SIP plugin:

Result: the PC remains open; after a while we receive a DTLS alert, which causes the PC to close.

lminiero commented 1 month ago
  • We send a "hangup" request, but don't close the PC on the client side

Yours is a good analysis, but why aren't you closing the PC on the client side too when sending the "hangup"? Our SIP demo does, and I would have expected everyone to do the same.

adnanel commented 1 month ago

That's an option, I guess. If you think this behaviour is fine, we can close the peer connection on the client side sooner than we do now.

Currently we do the cleanup once we receive the hangup event back from Janus (the "janus": "hangup" message, not the SIP plugin's hangup event). I agree that this doesn't need to be done sequentially, but I'm not a big fan of relying on the client side for this flow to finish as expected.

lminiero commented 1 month ago

Makes sense. Starting from the assumption that I'm not going to revert the PR/commit you mentioned (which had a much more serious impact on the status of sessions), I think the main issue here is related to timing and the order in which things happen: in this specific case that leads to an internal cleanup in the plugin (janus_sip_hangup_media_internal) but not to the cleanup in the core (close_pc) that would be needed here.

Rather than overcomplicate things, maybe there's an easier fix: in your last bullet point, always call both close_pc and janus_sip_hangup_media_internal. In fact, in cases where a PC was available, close_pc will schedule a call to hangup_media on the plugin, which in turn will call janus_sip_hangup_media_internal; if there was no PC, it won't, so we have to do it ourselves (the main reason we made the patch you mentioned). Considering that janus_sip_hangup_media only calls janus_sip_hangup_media_internal protected by a mutex, calling it twice shouldn't be an issue: the first call (whether it's our own internal call or the one scheduled by close_pc) will clean things up internally, and the second will do nothing, since the state will already have been changed by the first.

Can you try changing this block here:

if(g_atomic_int_get(&session->establishing) || g_atomic_int_get(&session->established)) {
    if(session->media.has_audio || session->media.has_video) {
        /* Get rid of the PeerConnection in the core */
        gateway->close_pc(session->handle);
    } else {
        /* No SDP was exchanged, just clean up locally */
        janus_sip_hangup_media_internal(session->handle);
    }
}

to something like this instead

if(g_atomic_int_get(&session->establishing) || g_atomic_int_get(&session->established)) {
    /* Get rid of the PeerConnection in the core */
    gateway->close_pc(session->handle);
    /* Also clean up locally, in case there was no PC */
    janus_sip_hangup_media_internal(session->handle);
}

and let me know how that works for you? If I'm right, it should address your issue while not introducing any regression (due to the idempotent nature of janus_sip_hangup_media_internal), but it's a good idea to check, in case I'm missing something off the top of my head.
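
For reference, here is a minimal, self-contained sketch (not the actual plugin code; the names are made up for illustration) of the pattern being relied on above: a mutex-protected cleanup guarded by a state flag, so that whichever call arrives second finds nothing left to do.

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for the session state the real plugin tracks */
typedef struct {
    pthread_mutex_t mutex;
    bool established;
} demo_session;

/* Stand-in for the internal hangup routine: safe to call more than once */
static void demo_hangup_media_internal(demo_session *s) {
    pthread_mutex_lock(&s->mutex);
    if(!s->established) {
        /* Already cleaned up, e.g. by the hangup scheduled via close_pc */
        pthread_mutex_unlock(&s->mutex);
        printf("hangup: nothing to do\n");
        return;
    }
    /* ... tear down media resources here ... */
    s->established = false;
    printf("hangup: cleaned up\n");
    pthread_mutex_unlock(&s->mutex);
}

int main(void) {
    demo_session s = { PTHREAD_MUTEX_INITIALIZER, true };
    demo_hangup_media_internal(&s);  /* first call does the cleanup */
    demo_hangup_media_internal(&s);  /* second call is a no-op */
    return 0;
}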

adnanel commented 1 month ago

That seems to have fixed our problem. I did a few tests myself and kept a single Janus instance running automated tests for the past ~8 hours, and no other problems were observed.

Do you want to commit that to master directly or should I create a PR?

lminiero commented 1 month ago

No need, I'll push the commit myself to both master and 0.x shortly. Thanks for the feedback!