signalwire / freeswitch

FreeSWITCH is a Software Defined Telecom Stack enabling the digital transformation from proprietary telecom switches to a versatile software implementation that runs on any commodity hardware. From a Raspberry PI to a multi-core server, FreeSWITCH can unlock the telecommunications potential of any device.
https://freeswitch.com/#getting-started
Other
3.51k stars 1.4k forks source link

sofia zombie channels #2389

Open wmasilva opened 7 months ago

wmasilva commented 7 months ago

I'm using mod_hash as backend to limit simultaneous calls per user/account, from time to time i see the counter associated to an user not release and I've to increase the limit or restart freeswitch to reset the counter.

Initial i thought it could be an issue with mod_hash not deleting the entry, but i find out that the problem is the sofia channel not getting destroy, so the resource remains active.

If i manually kill the channel, using uuid_kill, i see the message "success" but the counter remains active and the channel is still present in database. I then try to uuid_dump or uuid_kill and i got the message "no such channel".

I did a gcore dump on freeswitch process and find out that the channel was lock in the function "sofia_receive_message".

Attach the BT for the thread associated to the zombie channel:


(gdb) bt
#0  __lll_lock_wait (futex=futex@entry=0x55ce38868bf8, private=0) at lowlevellock.c:52
#1  0x00007f40a3f688d1 in __GI___pthread_mutex_lock (mutex=0x55ce38868bf8) at ../nptl/pthread_mutex_lock.c:115
#2  0x00007f40a433b409 in fspr_thread_mutex_lock (mutex=<optimized out>) at locks/unix/thread_mutex.c:92
#3  0x00007f40a3ff1855 in switch_mutex_lock (lock=<optimized out>) at src/switch_apr.c:310
#4  0x00007f40a0fd689c in sofia_receive_message (session=0x55ce4703e028, msg=0x7f409637efa0) at mod_sofia.c:1531
#5  0x00007f40a40246b6 in switch_core_session_perform_receive_message (session=0x55ce4703e028, message=<optimized out>, file=0x7f40a4360ec3 "src/switch_core_media.c", func=<optimized out>, line=<optimized out>)
    at src/switch_core_session.c:930
#6  0x00007f40a406964e in switch_core_session_write_frame (session=0x55ce4703e028, frame=0x55ce38f24928, flags=0, stream_id=0) at src/switch_core_media.c:15950
#7  0x00007f40a40c0e0e in audio_bridge_thread (obj=0x55ce3feff1a0, thread=0x0) at src/switch_ivr_bridge.c:818
#8  0x00007f40a40c32b6 in switch_ivr_multi_threaded_bridge (session=session@entry=0x55ce4e9cc028, peer_session=0x55ce4703e028, input_callback=<optimized out>, session_data=session_data@entry=0x0, 
    peer_session_data=peer_session_data@entry=0x0) at src/switch_ivr_bridge.c:1789
#9  0x00007f409f4561a0 in audio_bridge_function (session=<optimized out>, data=<optimized out>) at mod_dptools.c:3673
#10 0x00007f40a4028855 in switch_core_session_exec (session=0x55ce4e9cc028, application_interface=0x55ce403850f8, 
    arg=0x55ce3ee8e028 "[leg_timeout=12,bridge_answer_timeout=12]user/1001@default.es|[leg_timeout=12,bridge_answer_timeout=12]user/1002@default.es|[leg_timeout=12,bridge_answer_timeout=12]user/1003@de"...)
    at src/switch_core_session.c:2962
#11 0x00007f40a4029090 in switch_core_session_execute_application_get_flags (session=0x55ce4e9cc028, app=0x55ce396cdf90 "bridge", arg=0x0, flags=<optimized out>) at src/switch_core_session.c:2825
#12 0x00007f40a402d4f2 in switch_core_standard_on_execute (session=0x55ce4e9cc028) at src/switch_core_state_machine.c:350
#13 switch_core_session_run (session=0x55ce4e9cc028) at src/switch_core_state_machine.c:647
#14 0x00007f40a4026240 in switch_core_session_thread (thread=0x55ce38868bf8, obj=0x80) at src/switch_core_session.c:1725
#15 0x00007f40a402184e in switch_core_session_thread_pool_worker (thread=0x55ce52a86280, obj=<optimized out>) at src/switch_core_session.c:1790
#16 0x00007f40a3f65ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#17 0x00007f40a3c81a2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

(gdb) bt full
#0  __lll_lock_wait (futex=futex@entry=0x55ce38868bf8, private=0) at lowlevellock.c:52
        __ret = <optimized out>
#1  0x00007f40a3f688d1 in __GI___pthread_mutex_lock (mutex=0x55ce38868bf8) at ../nptl/pthread_mutex_lock.c:115
        __futex = 0x55ce38868bf8
        id = <optimized out>
        type = <optimized out>
        __PRETTY_FUNCTION__ = "__pthread_mutex_lock"
        id = <optimized out>
#2  0x00007f40a433b409 in fspr_thread_mutex_lock (mutex=<optimized out>) at locks/unix/thread_mutex.c:92
        rv = <optimized out>
#3  0x00007f40a3ff1855 in switch_mutex_lock (lock=<optimized out>) at src/switch_apr.c:310
No locals.
#4  0x00007f40a0fd689c in sofia_receive_message (session=0x55ce4703e028, msg=0x7f409637efa0) at mod_sofia.c:1531
        channel = 0x55ce495f2110
        tech_pvt = 0x55ce38868848
        status = SWITCH_STATUS_SUCCESS
        sip_reply_contact = <optimized out>
        __PRETTY_FUNCTION__ = "sofia_receive_message"
        __func__ = "sofia_receive_message"
#5  0x00007f40a40246b6 in switch_core_session_perform_receive_message (session=0x55ce4703e028, message=<optimized out>, file=0x7f40a4360ec3 "src/switch_core_media.c", func=<optimized out>, line=<optimized out>) at src/switch_core_session.c:930
        ptr = <optimized out>
        status = <optimized out>
        __PRETTY_FUNCTION__ = "switch_core_session_perform_receive_message"
        __func__ = "switch_core_session_perform_receive_message"
#6  0x00007f40a406964e in switch_core_session_write_frame (session=0x55ce4703e028, frame=0x55ce38f24928, flags=0, stream_id=0) at src/switch_core_media.c:15950
        msg = {from = 0x0, message_id = SWITCH_MESSAGE_INDICATE_TRANSCODING_NECESSARY, numeric_arg = 0, string_arg = 0x0, string_arg_size = 0, pointer_arg = 0x0, pointer_arg_size = 0, numeric_reply = 0, string_reply = 0x0, string_reply_size = 0, pointer_reply = 0x0, pointer_reply_size = 0, flags = 0, _file = 0x7f40a4360ec3 "src/switch_core_media.c", _func = 0x7f40a4365ba0 <__func__.0> "switch_core_session_write_frame", _line = 15951, string_array_arg = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, delivery_time = 0}
        status = SWITCH_STATUS_FALSE
        enc_frame = 0x0
        write_frame = 0x55ce38f24928
        flag = 0
        need_codec = <optimized out>
        perfect = 0
        do_bugs = <optimized out>
        do_write = 0
        do_resample = <optimized out>
        ptime_mismatch = <optimized out>
        pass_cng = <optimized out>
        resample = 0
        did_write_resample = 0
        __PRETTY_FUNCTION__ = "switch_core_session_write_frame"
        __func__ = "switch_core_session_write_frame"
        done = <optimized out>
#7  0x00007f40a40c0e0e in audio_bridge_thread (obj=0x55ce3feff1a0, thread=0x0) at src/switch_ivr_bridge.c:818
        status = SWITCH_STATUS_SUCCESS
        event = 0x0
        data = 0x55ce3feff1a0
        stream_id = 0
        pre_b = <optimized out>
        ans_a = <optimized out>
        ans_b = <optimized out>
        originator = 1
        input_callback = 0x0
        msg = {from = 0x0, message_id = SWITCH_MESSAGE_REDIRECT_AUDIO, numeric_arg = 0, string_arg = 0x0, string_arg_size = 0, pointer_arg = 0x0, pointer_arg_size = 0, numeric_reply = 0, string_reply = 0x0, string_reply_size = 0, pointer_reply = 0x0, pointer_reply_size = 0, flags = 0, _file = 0x0, _func = 0x0, _line = 0, string_array_arg = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, delivery_time = 0}
        user_data = 0x0
        chan_a = 0x55ce544e6110
        chan_b = 0x55ce495f2110
        read_frame = 0x55ce38f24928
        session_a = 0x55ce4e9cc028
        session_b = 0x55ce4703e028
        read_frame_count = <optimized out>
        app_name = 0x0
        app_arg = 0x0
        inner_bridge = 0
        exec_check = 0
        silence_codec = {codec_interface = 0x0, implementation = 0x0, fmtp_in = 0x0, fmtp_out = 0x0, flags = 0, memory_pool = 0x0, private_info = 0x0, agreed_pt = 0 '\000', mutex = 0x0, next = 0x0, session = 0x0, cur_frame = 0x0}
        silence_frame = {codec = 0x0, source = 0x0, packet = 0x0, packetlen = 0, extra_data = 0x0, data = 0x0, datalen = 0, buflen = 0, samples = 0, rate = 0, channels = 0, payload = 0 '\000', timestamp = 0, seq = 0, ssrc = 0, m = SWITCH_FALSE, flags = 0, user_data = 0x0, pmap = 0x0, img = 0x0, geometry = {w = 0, h = 0, x = 0, y = 0, z = 0, M = 0, X = 0}}
        silence_data = {0 <repeats 4096 times>}
        silence_var = <optimized out>
        silence_val = 0
        bypass_media_after_bridge = 0
        bridge_answer_timeout = <optimized out>
        bridge_filter_dtmf = 0
        answer_timeout = <optimized out>
        sent_update = 1
        answer_limit = 0
        exec_app = 0x55ce3feff438 "lua"
        exec_data = 0x55ce3feff440 "bridge_pre_execute_aleg.lua"
        read_impl = {codec_type = SWITCH_CODEC_TYPE_AUDIO, ianacode = 8 '\b', iananame = 0x55ce39a14318 "PCMA", fmtp = 0x0, samples_per_second = 8000, actual_samples_per_second = 8000, bits_per_second = 64000, microseconds_per_packet = 20000, samples_per_packet = 160, decoded_bytes_per_packet = 320, encoded_bytes_per_packet = 160, number_of_channels = 1 '\001', codec_frames_per_packet = 160, init = 0x7f40a4112130 <switch_g711u_init>, encode = 0x7f40a4111b40 <switch_g711u_encode+128>, decode = 0x7f40a4111c00 <core_pcm_shutdown>, encode_video = 0x0, decode_video = 0x0, codec_control = 0x0, destroy = 0x7f40a4111cb0 <switch_proxy_decode>, codec_id = 144, impl_id = 155, modname = 0x55ce39a13bd8 "CORE_PCM_MODULE", next = 0x55ce39a141f0}
        txt_launch = 0
        th = {session_a = 0x0, session_b = 0x0, up = 0}
        banner_file = 0x0
        played_banner = 0
        banner_counter = 0
        pass_val = 1
        last_pass_val = 1
        vh = {session_a = 0x0, session_b = 0x0, up = 0}
        vid_launch = 0
        __func__ = "audio_bridge_thread"
#8  0x00007f40a40c32b6 in switch_ivr_multi_threaded_bridge (session=session@entry=0x55ce4e9cc028, peer_session=0x55ce4703e028, input_callback=<optimized out>, session_data=session_data@entry=0x0, peer_session_data=peer_session_data@entry=0x0) at src/switch_ivr_bridge.c:1789
        app = <optimized out>
        data = <optimized out>
        a_leg = 0x55ce3feff1a0
        __func__ = "switch_ivr_multi_threaded_bridge"
        b_leg = 0x55ce5a8c0a28
        caller_channel = 0x55ce544e6110
        peer_channel = 0x55ce495f2110
        stream_id = 0
        status = SWITCH_STATUS_SUCCESS
        state = <optimized out>
        event = 0x0
        br = 1
        inner_bridge = 0
        var = <optimized out>
        cause = <optimized out>
        msg = {from = 0x7f40a4375384 "src/switch_ivr_bridge.c", message_id = SWITCH_MESSAGE_INDICATE_BRIDGE, numeric_arg = 0, string_arg = 0x55ce3feff3c0 "b4eb530f-87cc-4fd0-b291-9f46517962fc", string_arg_size = 0, pointer_arg = 0x0, pointer_arg_size = 0, numeric_reply = 0, string_reply = 0x0, string_reply_size = 0, pointer_reply = 0x0, pointer_reply_size = 0, flags = 0, _file = 0x0, _func = 0x0, _line = 0, string_array_arg = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, delivery_time = 0}
#9  0x00007f409f4561a0 in audio_bridge_function (session=<optimized out>, data=<optimized out>) at mod_dptools.c:3673
        a_key = 0x0
        b_key = 0x0
        ok = <optimized out>
        func = <optimized out>
        peer_channel = 0x55ce495f2110
        caller_channel = <optimized out>
        peer_session = 0x55ce4703e028
        v_campon = <optimized out>
        v_campon_retries = <optimized out>
        v_campon_sleep = <optimized out>
        v_campon_timeout = <optimized out>
        v_campon_fallback_exten = <optimized out>
        cause = SWITCH_CAUSE_SUCCESS
        campon_retries = <optimized out>
        campon_timeout = <optimized out>
        campon_sleep = <optimized out>
        tmp = <optimized out>
        camping = <optimized out>
        fail = <optimized out>
        thread_started = <optimized out>
        stake = {session = 0x0, running = 0, do_xfer = 0, moh = 0x0}
        moh = <optimized out>
        thread = 0x0
        thd_attr = 0x0
        camp_data = <optimized out>
        status = SWITCH_STATUS_SUCCESS
        camp_loops = <optimized out>
        __func__ = "audio_bridge_function"
#10 0x00007f40a4028855 in switch_core_session_exec (session=0x55ce4e9cc028, application_interface=0x55ce403850f8, arg=0x55ce3ee8e028 "[leg_timeout=12,bridge_answer_timeout=12]user/1001@default.es|[leg_timeout=12,bridge_answer_timeout=12]user/1002@default.es|[leg_timeout=12,bridge_answer_timeout=12]user/1003@de"...) at src/switch_core_session.c:2962
        log = <optimized out>
        lp = <optimized out>
        event = 0x0
        var = <optimized out>
        channel = <optimized out>
        expanded = 0x55ce3ee8e028 "[leg_timeout=12,bridge_answer_timeout=12]user/1001@default.es|[leg_timeout=12,bridge_answer_timeout=12]user/1002@default.es|[leg_timeout=12,bridge_answer_timeout=12]user/1003@de"...
        app = 0x7f409f457a62 "bridge"
        app_uuid_var = <optimized out>
        app_uuid_name = 0x0
        msg = {from = 0x7f40a435b231 "src/switch_core_session.c", message_id = SWITCH_MESSAGE_INDICATE_APPLICATION_EXEC, numeric_arg = 0, string_arg = 0x0, string_arg_size = 0, pointer_arg = 0x0, pointer_arg_size = 0, numeric_reply = 0, string_reply = 0x0, string_reply_size = 0, pointer_reply = 0x0, pointer_reply_size = 0, flags = 0, _file = 0x0, _func = 0x0, _line = 0, string_array_arg = {0x7f409f457a62 "bridge", 0x55ce3ee8e028 "[leg_timeout=12,bridge_answer_timeout=12]user/1001@default.es|[leg_timeout=12,bridge_answer_timeout=12]user/1002@default.es|[leg_timeout=12,bridge_answer_timeout=12]user/1003@de"..., 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, delivery_time = 0}
        delim = <optimized out>
        scope = 0
        uuid_str = "676f240e-790d-4064-b32e-73e0c65c110b\000U\000\000\000\000\000\000\000\000\000\000\340\333\065\244@\177\000\000a\026\020\244@\177\000\000\000\000\000\000\000\000\000\000\067\270\065\244@\177\000\000\250\030\070\226@\177\000\000\314T\005\244@\177\000\000\000\000\000\000\000\000\000\000\060\000\000\000\060\000\000\000\210\031\070\226@\177\000\000\300\030\070\226@\177\000\000\000>\253>\316U\000\000\062\n5\244@\177\000\000\340\374}C\316U\000\000\270\025\377\243@\177\000\000( *Q\316U\000\000Ur\n\244@\177\000\000( *Q\316U\000\000( *Q\316U\000\000( *Q\316U\000\000"...
        app_uuid = 0x7f4096381840 "676f240e-790d-4064-b32e-73e0c65c110b"
        expand_variables = <optimized out>
        __PRETTY_FUNCTION__ = "switch_core_session_exec"
        __func__ = "switch_core_session_exec"
#11 0x00007f40a4029090 in switch_core_session_execute_application_get_flags (session=0x55ce4e9cc028, app=0x55ce396cdf90 "bridge", arg=0x0, flags=<optimized out>) at src/switch_core_session.c:2825
        application_interface = 0x55ce403850f8
        status = SWITCH_STATUS_SUCCESS
        __func__ = "switch_core_session_execute_application_get_flags"
        done = <optimized out>
#12 0x00007f40a402d4f2 in switch_core_standard_on_execute (session=0x55ce4e9cc028) at src/switch_core_state_machine.c:350
        current_application = 0xfffffffffffffe00
        extension = 0x55ce4e269af8
        uuid = <optimized out>
        top = <optimized out>
        extension = <optimized out>
        uuid = <optimized out>
        __func__ = "switch_core_standard_on_execute"
        top = <optimized out>
        current_application = <optimized out>
        other_session = <optimized out>
        msg = {from = <optimized out>, message_id = <optimized out>, numeric_arg = <optimized out>, string_arg = <optimized out>, string_arg_size = <optimized out>, pointer_arg = <optimized out>, pointer_arg_size = <optimized out>, numeric_reply = <optimized out>, string_reply = <optimized out>, string_reply_size = <optimized out>, pointer_reply = <optimized out>, pointer_reply_size = <optimized out>, flags = <optimized out>, _file = <optimized out>, _func = <optimized out>, _line = <optimized out>, string_array_arg = {<optimized out>, <optimized out>, <optimized out>, <optimized out>, <optimized out>, <optimized out>, <optimized out>, <optimized out>, <optimized out>, <optimized out>}, delivery_time = <optimized out>}
#13 switch_core_session_run (session=0x55ce4e9cc028) at src/switch_core_state_machine.c:647
        global_proceed = 1
        index = <optimized out>
        proceed = <optimized out>
        do_extra_handlers = 1
        ptr = <optimized out>
        rstatus = <optimized out>
        state = <optimized out>
        midstate = CS_EXECUTE
        endstate = <optimized out>
        endpoint_interface = <optimized out>
        driver_state_handler = <optimized out>
        application_state_handler = <optimized out>
        silly = 0
        new_loops = 500
        __PRETTY_FUNCTION__ = "switch_core_session_run"
        __func__ = "switch_core_session_run"
#14 0x00007f40a4026240 in switch_core_session_thread (thread=0x55ce38868bf8, obj=0x80) at src/switch_core_session.c:1725
        session = 0x55ce4e9cc028
        event = 0x0
        event_str = 0x0
        val = <optimized out>
        __func__ = "switch_core_session_thread"
        __PRETTY_FUNCTION__ = "switch_core_session_thread"
#15 0x00007f40a402184e in switch_core_session_thread_pool_worker (thread=0x55ce52a86280, obj=<optimized out>) at src/switch_core_session.c:1790
        td = 0x55ce38f37d90
        pop = 0x55ce38f37d90
        check_status = <optimized out>
        node = <optimized out>
        pool = 0x55ce52a86028
        __func__ = "switch_core_session_thread_pool_worker"
#16 0x00007f40a3f65ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
        ret = <optimized out>
        pd = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139915374896896, 22784233506454596, 139915553748734, 139915553748735, 139915374894208, 94344293822416, -85042465733493692, -85087015196181436}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = 0
#17 0x00007f40a3c81a2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
No locals.
wmasilva commented 2 months ago

after applying the pr #2390 the issue is better but it still have limit hash entries that are not deleted. To detect invalid values, i use "hash_dump limit" and "show channels" and the result is:

ERROR found invalid account limit: 200@lab.local limit:1 channels:0

I've implemented a method to delete limit entries without having to restart the entire FS, #2518 the system is recording calls so i apply the pr #2218.

So far is running ok and i'm collecting more debug to try to fetch the source of the problem.

wmasilva commented 2 months ago

this issue could be related to #2290 .