kamailio / kamailio

Kamailio - The Open Source SIP Server for large VoIP and real-time communication platforms -
https://www.kamailio.org
Other
2.26k stars 939 forks source link

debugger: kamailio freezes when dynamically setting module level(using fifo/rpc) #463

Closed smititelu closed 8 years ago

smititelu commented 8 years ago

... if the module level is not previously set in config file: modparam("debugger", "mod_level", "core=2")

This is not happening when setting the module facility, for the above, same conditions. This is happening also before pull-request [1] using "kamcmd dbg.mod_level core 1". After some debugging I've noticed that this is happening when trying to set a level for a module name whose idx = hid&(_dbg_mod_table_size-1); is even number?!, but not for one that reduces to an odd number (i.e. module name "core" reduces to an even index and "corex" to odd); the idx is always in the range _dbg_mod_table_size-1 as it should be.

Trying to solve this, I commented the lock_get/release in dbg_set_mod_debug_level() and saw it's working; kamailio doesn't freeze anymore. Thus, I tried to refactor the locks in struct _dbg_mod_slot to be dynamically allocated/deallocated using lock_alloc()/destroy() without success.

I'm out of ideas. Do you have any idea what might lead to this strange deadlock?

[1] https://github.com/kamailio/kamailio/pull/462

smititelu commented 8 years ago

Also I've been comparing the lock_get()/release() for level vs facility without seeing any differences. But maybe I'm missing something.

miconda commented 8 years ago

Can you attach with gdb to the locked process (probably the ctl handler) and get a backtrace from it?

smititelu commented 8 years ago

Sorry, I overlooked gdb: (gdb) bt

0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38

1 0x00007f600269d47c in futex_get (lock=0x7f5ffe91d748) at ../cfgt/../../parser/../mem/../futexlock.h:121

2 0x00007f60026a6190 in dbg_get_mod_debug_level (mname=0x797349 "core", mnlen=4, mlevel=0x7ffc8bb3a9fc) at debugger_api.c:1479

3 0x00000000004752c5 in get_debug_level (mname=0x797349 "core", mnlen=4) at dprint.c:164

4 0x000000000063fc06 in fm_malloc (qmp=0x7f5ffe699000, size=37, file=0x7f60026b9b34 "debugger: debugger_api.c", func=0x7f60026bae40 <FUNCTION.8648> "dbg_set_mod_debug_level", line=1352,

mname=0x7f60026b8d3e "debugger") at mem/f_malloc.c:438

5 0x000000000064633c in fm_shm_malloc (qmp=0x7f5ffe699000, size=37, file=0x7f60026b9b34 "debugger: debugger_api.c", func=0x7f60026bae40 <FUNCTION.8648> "dbg_set_mod_debug_level", line=1352,

mname=0x7f60026b8d3e "debugger") at mem/f_malloc.c:1088

6 0x00007f60026a565a in dbg_set_mod_debug_level (mname=0x2aa07e0 "core", mnlen=4, mlevel=0x7ffc8bb3abfc) at debugger_api.c:1352

7 0x00007f60026a3806 in dbg_rpc_set_mod_level (rpc=0x7f6003616660 , ctx=0x7ffc8bb3acd0) at debugger_api.c:999

8 0x00007f60033d2e77 in process_rpc_req (buf=0x2aa07c4 "\241\003\034\023\212\243\062\221\022dbg.set_mod_level", size=35, bytes_needed=0x7ffc8bb3ade8, sh=0x7ffc8bb3ae10, saved_state=0x2ab07c8)

at binrpc_run.c:675

9 0x00007f6003404253 in handle_stream_read (s_c=0x2aa0790, idx=-1) at io_listener.c:511

10 0x00007f6003405c8a in handle_io (fm=0x7f6006d84498, events=1, idx=-1) at io_listener.c:706

11 0x00007f60033fd4df in io_wait_loop_epoll (h=0x7f6003617420 , t=10, repeat=0) at ../../io_wait.h:1061

12 0x00007f60034013c3 in io_listen_loop (fd_no=1, cs_lst=0x2a9e400) at io_listener.c:281

13 0x00007f60033db58c in mod_child (rank=0) at ctl.c:320

14 0x00000000005ae1cb in init_mod_child (m=0x7f6006d25060, rank=0) at sr_module.c:915

15 0x00000000005adee8 in init_mod_child (m=0x7f6006d25660, rank=0) at sr_module.c:912

16 0x00000000005adee8 in init_mod_child (m=0x7f6006d26d78, rank=0) at sr_module.c:912

17 0x00000000005adee8 in init_mod_child (m=0x7f6006d272b8, rank=0) at sr_module.c:912

18 0x00000000005adee8 in init_mod_child (m=0x7f6006d276e8, rank=0) at sr_module.c:912

19 0x00000000005adee8 in init_mod_child (m=0x7f6006d28328, rank=0) at sr_module.c:912

20 0x00000000005ae503 in init_child (rank=0) at sr_module.c:941

21 0x00000000004b34c1 in main_loop () at main.c:1671

22 0x00000000004b9c4c in main (argc=4, argv=0x7ffc8bb3b9a8) at main.c:2597

The deadlock makes sense. MDBG() in fm_malloc() issues dbg_get_mod_debug_level() which gets the same lock that dbg_set_mod_debug_level() already got.

The first option would be to never use locks for dbg_get_mod_debug_level()(because the list is just iterated through, no list mutations). The other option is to create different get/set locks for level/facility in struct _dbg_mod_slot.

What do you think is the best option?

miconda commented 8 years ago

This is a recursive access to the same slot in debugger hash table:

Solutions:

smititelu commented 8 years ago

Added fix with the second solution on pull-request #469. Basically tested it and the freeze is not happening anymore. However, the first solution can still be logged as an enhancement issue.

If all is ok I will backport this to 4.X branches.

miconda commented 8 years ago

I am fine with second solution fix, just needs to be kept in mind that no direct or indirect or direct LOG/DBG statements insides locked regions of debugger hash table.

You can merge and close the related issues on tracker.