Closed adnanel closed 1 year ago
Thanks for the patch, but I think you're doing those things in the wrong place. You've added some logic to the destroy_session
callack, but it's actually the hangup_media_internal
one where those things should be done, since once the PeerConnection is gone (whether the session exists or not) then all relationships to room resources are gone. As a matter of fact, we have code for that already:
https://github.com/meetecho/janus-gateway/blob/master/src/plugins/janus_textroom.c#L3009-L3043
which is supposed to do exactly that, that is leaving rooms you're in and destroying the participants resources. If you think that's not working as expected for some reason, please check what needs to be fixed there instead.
I just tried to replicate your steps, and it seems to work just fine in my case. Obviously the user only leaves once the session timeout kicks in, so ~60s later by default, but the same thing would happen with your patch too, since you're intercepting destroy_session
which is triggered by a handle detach, so only once the session timeout gets rid of all related handles.
If what you're saying is correct (I'm already finishing for the day so can't confirm), the bug I'm trying to fix isn't the same issue I found on the demo, so I should clarify our original problem, then:
We are seeing participants in text rooms well over 60 seconds (instead, they are permanently stuck). The handle and session are both destroyed, but the other participants still see them and they are no longer able to join.
It only works if they properly perform "leave" before. Also the version from this branch seems to also fix our problem.
I will try to analyze it further and see what actually happens.
Then yes, the problem may be in the code I linked above, which is where the logic for "let's remove the user" is, so that's where I'd dig: I couldn't replicate the problem, as the session timeout cleaned the user correctly for me. It may have something to do with the "fake" message we inject to simulate a "leave" message from the user: my guess is that you do end up calling that code, and the user is removed, but the event is not propagated as it should because of something that goes wrong with that "fake" message.
Hi @lminiero
I had time to work on this again, turned out your leave command failed for us due to being an invalid JSON, which in turn is invalid due to our (string) room ID being too long (it's an UUID with a suffix).
I changed the PR to a much simpler version which only expands the request string buffer to 200 characters. Will test today and confirm whether this fixes everything.
Edit: Yep, seems to work now...
@adnanel apologies for the very late response, I've been abroad for the IETF meeting and I'm catching up with all the pending activities. So it was just a matter of the buffer for the "fake" message being too short? If that's the case, it's indeed a very easy fix and I'll merge!
Currently, on abrupt disconnections, participants get stuck in text rooms and aren't able to re-join immediately.
This makes auto reconnect mechanisms needlessly complicated.
In this PR I added automatic leaving of all rooms when the session is being destroyed.
The bug can be reproduced in the official janus textroom demo: