Closed Jonbeckas closed 5 months ago
If you can provide replication steps (including maybe a sipp script), I'll try to have a look at what the issue might be. As I wrote in reply to the PR, I don't think the patch is a proper fix, since it would introduce different problems, and so I'd like to investigate a different solution.
The replication steps are:
I am not really familiar with sipp, but I'll try to add a sipp script later.
If these invites are offerless, then I don't think the core or alert states have anything to do with it: there wouldn't be any SDP to trigger a new PeerConnection establishment. It's much more likely an inconsistent state within the SIP plugin itself. I'll try to replicate and let you know.
I think I have a better understanding now, and why you were trying to tinker with the alert flag. It's true, as I said, that there's no PeerConnection establishment involved, but that's apparently the very root of the issue, rather than the reason why it shouldn't happen.
Basically, an offerless INVITE means no SDP and so, again, no PC: at the same time, though, when you decline the call, we invoke the `close_pc()` function in the core from the plugin, to clean up any WebRTC resource that may have been allocated; this results in the `alert` flag being set to `true`, and the `hangup_media()` callback being called on the plugin, which resets the plugin flags (`establishing`, `established`). So the first time it happens, it works fine: the problem, though, is that there's no actual WebRTC cleanup happening (we never initialized a PC) and so `alert` stays `true`. At the second offerless INVITE, the same thing happens, but this time the call to `close_pc()` finds `alert` already `true`, which means `hangup_media()` is not called again on the plugin (we do that to avoid duplicates from the same event). As a result, the plugin `establishing` flag remains set, and further calls are automatically rejected, due to a broken state in the plugin itself.
I'm wondering now what the right approach would be to address this. The "easy" fix would be to handle this directly in the SIP plugin, but in practice other plugins could in some cases end up in the same situation (even though it also depends on how they handle signaling, and it would take the same two consecutive `close_pc` calls on two consecutive "no PC" sessions, so it's much less likely). I'm still not convinced your PR addresses it properly, since it could break some core states. I'll think about it some more and let you know when I come up with a potential fix.
@Jonbeckas can you try this diff?
diff --git a/src/ice.c b/src/ice.c
index da8ffd10..dc5ef226 100644
--- a/src/ice.c
+++ b/src/ice.c
@@ -1685,6 +1685,7 @@ static void janus_ice_webrtc_free(janus_ice_handle *handle) {
 		return;
 	janus_mutex_lock(&handle->mutex);
 	if(!handle->agent_created) {
+		janus_flags_clear(&handle->webrtc_flags, JANUS_ICE_HANDLE_WEBRTC_ALERT);
 		janus_flags_clear(&handle->webrtc_flags, JANUS_ICE_HANDLE_WEBRTC_NEW_DATACHAN_SDP);
 		janus_flags_clear(&handle->webrtc_flags, JANUS_ICE_HANDLE_WEBRTC_READY);
 		janus_flags_clear(&handle->webrtc_flags, JANUS_ICE_HANDLE_WEBRTC_CLEANING);
This immediately resets the `alert` flag back to `false` if, by the time we get to freeing the resources (independently of how we got there), there's actually nothing to clean up. In my local SIPp tests I can't replicate the issue anymore, but it would be better to test this in other SIP scenarios too. As soon as you can confirm it doesn't break anything for you, I'll push the fix upstream.
The patch seems to change the bug a bit for me. The scenario I described above does work now, but after I accept a call and later hang up, the next offerless call that is denied will leave the session in an establishing=1 state again.
Mh, thinking about it, that was to be expected, and would have happened even before this patch. When a regular call is closed, the same will happen (`close_pc` → `hangup_media`) but in this case `alert` will remain `true`: normally it's unset only when a new call starts, in fact. This means that after that successful call, a new offerless INVITE being declined will find `alert` set to `true` and not trigger the `hangup_media` call, thus causing the same problem as before.
In theory, the most obvious fix would be to ensure we reset the `alert` flag when we've cleaned up resources, but I'm wondering if that may cause issues in some cases. As I mentioned, we use that flag to also prevent multiple `hangup_media` occurrences (e.g., different things cause a PC to close), and having it reset right away instead of right before the next call may cause that to break. It may even cause a loop, if the plugin is wrongly wired (e.g., `close_pc` and `hangup_media` triggering each other). I'll think about this some more.
While I think of the implications, you can give the following patch a try, which always resets the `alert` flag when cleaning WebRTC resources:
diff --git a/src/ice.c b/src/ice.c
index da8ffd10..96b149d1 100644
--- a/src/ice.c
+++ b/src/ice.c
@@ -1685,6 +1685,7 @@ static void janus_ice_webrtc_free(janus_ice_handle *handle) {
 		return;
 	janus_mutex_lock(&handle->mutex);
 	if(!handle->agent_created) {
+		janus_flags_clear(&handle->webrtc_flags, JANUS_ICE_HANDLE_WEBRTC_ALERT);
 		janus_flags_clear(&handle->webrtc_flags, JANUS_ICE_HANDLE_WEBRTC_NEW_DATACHAN_SDP);
 		janus_flags_clear(&handle->webrtc_flags, JANUS_ICE_HANDLE_WEBRTC_READY);
 		janus_flags_clear(&handle->webrtc_flags, JANUS_ICE_HANDLE_WEBRTC_CLEANING);
@@ -1755,6 +1756,7 @@ static void janus_ice_webrtc_free(janus_ice_handle *handle) {
 		janus_ice_notify_hangup(handle, handle->hangup_reason);
 	}
 	handle->hangup_reason = NULL;
+	janus_flags_clear(&handle->webrtc_flags, JANUS_ICE_HANDLE_WEBRTC_ALERT);
 	janus_mutex_unlock(&handle->mutex);
 	JANUS_LOG(LOG_INFO, "[%"SCNu64"] WebRTC resources freed; %p %p\n", handle->handle_id, handle, handle->session);
 }
Please let me know if you notice any regression.
The patch works like a charm for me.
FYI, after careful consideration I've decided this will not be the patch I'll commit, due to the considerations I've made before. I'll instead ensure that `alert` is set to `true` as a default, since the anomaly was that a `hangup_media` was following a `close_pc` the very first time you sent an offerless INVITE, and that's wrong. This means I'll work on a fix in the SIP plugin itself.
I'll let you know when a patch is ready. I'll probably prepare a PR, so that more people can test the effect on other plugins as well.
@Jonbeckas please test the PR above, which attempts the fix in a different way. It should address both scenarios you had problems with. You may want to test more, though, just to ensure nothing else breaks. Notice I also fixed the error code we send back by default when declining: for some reason it was `486` instead of `603`.
The PR works for me.
What version of Janus is this happening on? 1.2.2; b98e3bb91bd728ce21f6fd56519a303f2775f755
Have you tested a more recent version of Janus too? Yes, on the master branch
Was this working before? Not sure, behaviour of the telephone system we use changed (3cx)
Additional context If a session has been hung up, the `JANUS_ICE_WEBRTC_ALERT` flag will be set in `janus_ice_webrtc_hangup` and removed on the next call in `janus_ice_setup_local`, which is called by `janus_plugin_handle_sdp`. For a denied incoming SIP call with no SDP body, the `JANUS_ICE_WEBRTC_ALERT` flag will not be removed, and during the following hangup `janus_ice_webrtc_hangup` will be aborted before the plugin is notified and before the establishing attribute is set to 0, so the session will get stuck and deny all incoming calls.