Open hateeyan opened 1 year ago
If the call is hung up while the recording media bug is still initializing, the problem is reproduced.
To Reproduce
switch_ivr_async.c
Make sure we have enough time to hang up the call before the media bug init finishes.
--- switch_ivr_async.c 2023-08-24 13:46:50.935017745 +0800
+++ switch_ivr_async.c.patched 2023-08-24 13:55:23.135628464 +0800
@@ -1509,6 +1509,8 @@ static switch_bool_t record_callback(swi
switch_core_media_gen_key_frame(session);
}
}
+ /* Make sure we have enough time to hang up the call before the media bug init finishes */
+ switch_yield(20000000);
}
break;
case SWITCH_ABC_TYPE_TAP_NATIVE_READ:
@hateeyan Thanks for the details. Can you please describe here a little bit more information about this PR and how it changes the behavior.
Hi, @andywolk
After the call is hung up, session_thread removes all media bugs (line 1728).
A media bug is appended to session->bugs only after its initialization (SWITCH_ABC_TYPE_INIT) has finished.
If the hangup happens while the bug is still initializing, the bug is not yet in the list, so it will never be closed (SWITCH_ABC_TYPE_CLOSE).
In record_callback, if the media bug is not closed, the thread condition is never signaled.
recording_thread locks the session and waits for that condition.
The result is a deadlock, which explains the backtrace.
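The ordering problem described above can be modeled in a few lines of C. This is only a sketch under assumptions: every name in it (`model_bug_t`, `model_remove_all`, `model_finish_init`, `model_wait_for_close`, `model_run`) is hypothetical and none of it is FreeSWITCH code. The wait also uses a timeout so that the sketch terminates, whereas the waiter described in this thread blocks forever.

```c
/* Hypothetical model of the race described in this thread; not FreeSWITCH code. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    bool registered; /* stands in for "the bug is in session->bugs" */
    bool closed;     /* stands in for "the CLOSE callback has run" */
} model_bug_t;

/* Hangup path: only bugs that are already registered get closed and signaled. */
static void model_remove_all(model_bug_t *bug)
{
    pthread_mutex_lock(&bug->mutex);
    if (bug->registered) {
        bug->registered = false;
        bug->closed = true;
        pthread_cond_signal(&bug->cond);
    }
    pthread_mutex_unlock(&bug->mutex);
}

/* End of INIT: the bug becomes visible to the hangup path only now. */
static void model_finish_init(model_bug_t *bug)
{
    pthread_mutex_lock(&bug->mutex);
    bug->registered = true;
    pthread_mutex_unlock(&bug->mutex);
}

/* Recording thread: waits for the close signal. Returns true if it arrived. */
static bool model_wait_for_close(model_bug_t *bug, long timeout_ms)
{
    struct timespec deadline;
    int rc = 0;
    bool closed;

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&bug->mutex);
    while (!bug->closed && rc != ETIMEDOUT) {
        rc = pthread_cond_timedwait(&bug->cond, &bug->mutex, &deadline);
    }
    closed = bug->closed;
    pthread_mutex_unlock(&bug->mutex);
    return closed;
}

/* Runs one ordering and reports whether the waiter was released. */
bool model_run(bool hangup_during_init)
{
    model_bug_t bug = { .registered = false, .closed = false };
    bool released;
    int rc;

    rc = pthread_mutex_init(&bug.mutex, NULL);
    assert(rc == 0);
    rc = pthread_cond_init(&bug.cond, NULL);
    assert(rc == 0);

    if (hangup_during_init) {
        model_remove_all(&bug);  /* nothing registered yet, so nothing is closed */
        model_finish_init(&bug); /* the bug is registered too late */
    } else {
        model_finish_init(&bug);
        model_remove_all(&bug);  /* the bug is found, closed and signaled */
    }

    released = model_wait_for_close(&bug, 50);
    pthread_cond_destroy(&bug.cond);
    pthread_mutex_destroy(&bug.mutex);
    return released;
}
```

In this model, `model_run(false)` releases the waiter, while `model_run(true)` leaves it waiting until the timeout, which corresponds to the zombie channel.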
If we hold bug_rwlock during initialization, the removal will wait for the initialization to complete.
That's the part I was more interested in. Can you please add a little bit more information about how this changes the behavior.
Because removing media bugs also takes the lock. If a media bug is still initializing, the removal blocks until bug_rwlock is unlocked.
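One way to picture the locking idea is the sketch below, written with POSIX read-write locks. It is an illustration under assumptions, not the actual patch: only the name `bug_rwlock` comes from this thread, while `model_session_t`, `model_remove_all` and `model_locked_init_then_hangup` are hypothetical. The init path takes the write lock before the hangup thread starts and releases it only after the bug is registered, so the removal always finds the bug.

```c
/* Hypothetical sketch of "hold the lock during init"; not the actual patch. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

typedef struct {
    pthread_rwlock_t bug_rwlock; /* name borrowed from the thread */
    bool registered;             /* stands in for "the bug is in session->bugs" */
    bool closed;                 /* stands in for "the CLOSE callback has run" */
} model_session_t;

/* Hangup path: takes the same lock, so it blocks while init is in progress. */
static void *model_remove_all(void *arg)
{
    model_session_t *s = arg;

    pthread_rwlock_wrlock(&s->bug_rwlock);
    if (s->registered) {
        s->registered = false;
        s->closed = true;
    }
    pthread_rwlock_unlock(&s->bug_rwlock);
    return NULL;
}

/* Returns true if the bug was closed even though hangup raced with init. */
bool model_locked_init_then_hangup(void)
{
    model_session_t s = { .registered = false, .closed = false };
    struct timespec delay = { 0, 20 * 1000 * 1000 }; /* 20 ms of simulated init work */
    pthread_t hangup;
    bool closed;
    int rc;

    rc = pthread_rwlock_init(&s.bug_rwlock, NULL);
    assert(rc == 0);

    /* Init path: take the lock before any init work starts. */
    pthread_rwlock_wrlock(&s.bug_rwlock);

    /* The hangup arrives while init is still running. */
    rc = pthread_create(&hangup, NULL, model_remove_all, &s);
    assert(rc == 0);

    nanosleep(&delay, NULL);

    /* Init finished: register the bug, then let the hangup path proceed. */
    s.registered = true;
    pthread_rwlock_unlock(&s.bug_rwlock);

    pthread_join(hangup, NULL);
    closed = s.closed;
    pthread_rwlock_destroy(&s.bug_rwlock);
    return closed;
}
```

Because the lock is held from before the hangup thread is created until after registration, the result does not depend on thread scheduling: the removal either blocks on the lock or arrives after the bug is already registered.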
Is it possible that switch_core_media_bug_add is called after switch_core_media_bug_remove_all_function?
I just tested this case and it doesn't cause problems.
switch_core_media_bug_remove_all_function will be called again in switch_core_session_destroy.
https://github.com/signalwire/freeswitch/blob/b74245d48a1f65a05e853f24e973f9b9ff35f8f5/src/switch_core_session.c#L1558
Before the first call to switch_core_media_bug_remove_all_function, the channel has already been set to CS_DESTROY, so new API calls (such as uuid_record) will fail.
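The argument above can be sketched as a simple state check. This is a hypothetical model, not the FreeSWITCH implementation: `model_channel_t`, `model_bug_add` and `model_destroy` are made-up names, and the assumption is only what the comment states, namely that the state is set to destroy before the bugs are removed and that a late add is rejected.

```c
/* Hypothetical model of the CS_DESTROY guard; not FreeSWITCH code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { MODEL_CS_EXECUTE, MODEL_CS_HANGUP, MODEL_CS_DESTROY } model_state_t;

typedef struct {
    model_state_t state;
    int bug_count; /* stands in for the length of session->bugs */
} model_channel_t;

/* Returns true if a bug was attached, false if the channel rejected it. */
bool model_bug_add(model_channel_t *ch)
{
    assert(ch != NULL);
    if (ch->state == MODEL_CS_DESTROY) {
        return false; /* a late uuid_record-style request fails here */
    }
    ch->bug_count++;
    return true;
}

/* Teardown: the state changes first, then the bugs are removed. */
void model_destroy(model_channel_t *ch)
{
    assert(ch != NULL);
    ch->state = MODEL_CS_DESTROY;
    ch->bug_count = 0;
}
```

In this model, an add before `model_destroy` succeeds and is cleaned up by it, while an add afterwards is refused, so no bug can be attached after the removal.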
Found a related issue #1673
It has worked fine in our production environment for almost 3 months with patch #2218.
We're almost a year out with this patch. Are there any status updates? I'm seeing this issue in the wild.
Describe the bug
Hanging up a call while recording is starting may cause zombie channels. The core dump shows that the channel is waiting for a condition signal in recording_thread.
To Reproduce
Hard to reproduce
Expected behavior
Channel cleared
Package version or git hash
Trace logs
backtrace from core file
If applicable, provide the full backtrace from the core file.