meetecho / janus-gateway

Janus WebRTC Server
https://janus.conf.meetecho.com
GNU General Public License v3.0
8.25k stars 2.48k forks source link

Crashing after nua_application_event #3133

Closed carcharothellobo closed 1 year ago

carcharothellobo commented 1 year ago

Hi guys, I have some weird behavior. Im running Sofia SIP in my Janus WebRTC Server, and randomly I have some crashings after SUBSCRIBE event. I had also reported this to Sofia SIP and they told me that this is a Janus issue

This is what a look in my logs, take a look the line with event r_subscribe 900 Internal error at nua_client.c:713:

nta.c:2707 nta_tpn_by_url() nta: selecting scheme sip
nua_stack.c:389 nua_application_event() nua: nua_application_event: entering
tport.c:3293 tport_tsend() tport_tsend(0x616000272180) tpn = */172.19.94.10:5060
tport.c:4082 tport_resolve() tport_resolve addrinfo = 172.19.94.10:5060
tport.c:4716 tport_by_addrinfo() tport_by_addrinfo(0x616000272180): not found by name */172.19.94.10:5060
tport.c:3528 tport_send_msg() tport_vsend returned 769
nta.c:8394 outgoing_send() nta: sent SUBSCRIBE (61574337) to */172.19.94.10:5060
tport.c:4196 tport_pend() tport_pend(0x616000272180): pending 0x61900129f380 for udp/172.19.94.78:38763 (already 0)
nua.c:1164 nua_handle_unref_user() nua nua_handle_unref_user(0x612000143440): entering
nua_stack.c:559 nua_signal() nua(0x612000143440): sent signal r_handle_unref
nua.c:1157 nua_unref_user() nua: nua_unref_user: entering
nua_stack.c:559 nua_signal() nua((nil)): sent signal r_unref
tport.c:2777 tport_wakeup_pri() tport_wakeup_pri(0x616000272180): events IN
tport.c:2900 tport_recv_event() tport_recv_event(0x616000272180)
[Wed Dec 28 11:39:03 2022] ESC[33m[WARN]ESC[0m [54026020] SUBSCRIBE failed: 481 Subscription does not exist
tport.c:3241 tport_recv_iovec() tport_recv_iovec(0x616000272180) msg 0x619000ec9580 from (udp/172.19.94.78:38763) has 481 bytes, veclen = 1
tport.c:3059 tport_deliver() tport_deliver(0x616000272180): msg 0x619000ec9580 (481 bytes) from udp/172.19.94.10:38763/sip next=(nil)
nta.c:3370 agent_recv_response() nta: received 481 Subscription does not exist for SUBSCRIBE (61574337)
nta.c:3437 agent_recv_response() nta: 481 Subscription does not exist is going to a transaction
nta.c:9688 outgoing_estimate_delay() nta_outgoing: RTT is 0.395 ms
tport.c:4258 tport_release() tport_release(0x616000272180): 0x61900129f380 by 0x613002902b40 with 0x619000ec9580
nua_stack.c:301 nua_stack_event() nua(0x612000143440): event r_subscribe 481 Subscription does not exist
nua_dialog.c:397 nua_dialog_usage_remove_at() nua(0x612000143440): removing subscribe usage with event message-summary
nta.c:4541 nta_leg_destroy() nta_leg_destroy(0x60c00025cb40)
nta.c:4494 nta_leg_tcreate() nta_leg_tcreate(): Missing local dialog address
nua_stack.c:301 nua_stack_event() nua(0x612000143440): event r_subscribe 900 Internal error at nua_client.c:713 <---------- I think this is the error to investigate
nta.c:4541 nta_leg_destroy() nta_leg_destroy((nil))
nua_stack.c:599 nua_stack_signal() nua(0x612000143440): recv signal r_handle_unref
nua_stack.c:599 nua_stack_signal() nua((nil)): recv signal r_unref
nua_stack.c:389 nua_application_event() nua: nua_application_event: entering
nua.c:1164 nua_handle_unref_user() nua nua_handle_unref_user(0x612000143440): entering
nua_stack.c:559 nua_signal() nua(0x612000143440): sent signal r_handle_unref
nua.c:1157 nua_unref_user() nua: nua_unref_user: entering
nua_stack.c:559 nua_signal() nua((nil)): sent signal r_unref
nua_stack.c:389 nua_application_event() nua: nua_application_event: entering
AddressSanitizer:DEADLYSIGNAL
=================================================================
==27==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000090 (pc 0x7f87cea8c753 bp 0x7f87b3b5b190 sp 0x7f87b3b514f0 T56)
==27==The signal is caused by a READ memory access.
==27==Hint: address points to the zero page.
AddressSanitizer:DEADLYSIGNAL
    #0 0x7f87cea8c752 in janus_sip_sofia_callback plugins/janus_sip.c:6254
    #1 0x7f87c7cb6d52 in nua_application_event (/usr/lib/libsofia-sip-ua.so.0+0xc7d52)
    #2 0x7f87c7d24113 in su_base_port_execute_msgs (/usr/lib/libsofia-sip-ua.so.0+0x135113)
    #3 0x7f87c7d23ea1 in su_base_port_getmsgs (/usr/lib/libsofia-sip-ua.so.0+0x134ea1)
    #4 0x7f87c7d24208 in su_base_port_run (/usr/lib/libsofia-sip-ua.so.0+0x135208)
    #5 0x7f87c7d204f8 in su_port_run (/usr/lib/libsofia-sip-ua.so.0+0x1314f8)
    #6 0x7f87c7d215d3 in su_root_run (/usr/lib/libsofia-sip-ua.so.0+0x1325d3)
    #7 0x7f87cea50ed3 in janus_sip_sofia_thread plugins/janus_sip.c:7264
    #8 0x7f87d32ad4d4  (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x764d4)
    #9 0x7f87d2971fa2 in start_thread /build/glibc-6iIyft/glibc-2.28/nptl/pthread_create.c:486
    #10 0x7f87d28a306e in clone (/lib/x86_64-linux-gnu/libc.so.6+0xf906e)
lminiero commented 1 year ago

A 900 internal error means something broke deep inside the library, so outside of our control. The crash, however, may be happening after we're told of the error, and if so they're right the problem may be in Janus. Which commit is this on? Line 6254 is a comment in my latest version of the code.

carcharothellobo commented 1 year ago

This is my commit:

commit ef984223dd0cfbaa07445c18748f506899678305
Author: Lorenzo Miniero <lminiero@gmail.com>
Date:   Mon Nov 7 14:12:36 2022 +0100

    Added check before dereferencing pointer in case of failed SUBSCRIBE

And this is my line 6254

                        } else if(status >= 400) {
                                /* Something went wrong */
                                JANUS_LOG(LOG_WARN, "[%s] SUBSCRIBE failed: %d %s\n", session->account.username, status, phrase ? phrase : "");
                                json_t *event = json_object();
                                json_object_set_new(event, "sip", json_string("event"));
          6254 ------> if(sip->sip_call_id)
                                        json_object_set_new(event, "call_id", json_string(sip->sip_call_id->i_id));
                                json_t *result = json_object();
                                json_object_set_new(result, "event", json_string("subscribe_failed"));
                                json_object_set_new(result, "code", json_integer(status));
                                json_object_set_new(result, "reason", json_string(phrase ? phrase : ""));
                                if(session->incoming_header_prefixes) {
                                        json_t *headers = janus_sip_get_incoming_headers(sip, session);
                                        json_object_set_new(result, "headers", headers);
                                }
                                json_object_set_new(event, "result", result);
                                int ret = gateway->push_event(session->handle, &janus_sip_plugin, session->transaction, event, NULL);
                                JANUS_LOG(LOG_VERB, "  >> Pushing event to peer: %d (%s)\n", ret, janus_get_api_error(ret));
                                json_decref(event);
                        }
                        break;
lminiero commented 1 year ago

Maybe sip is NULL when that error comes in. I'll add a NULL check for that.

lminiero commented 1 year ago

This commit should fix the issue if that was the cause. I'll close for now, but please do let us know if it's still a problem and in case we'll reopen.

carcharothellobo commented 1 year ago

Lorenzo just to let you know that everything is ok with this fix