smititelu closed 8 years ago
Also I've been comparing the lock_get()/release() for level vs facility without seeing any differences. But maybe I'm missing something.
Can you attach with gdb to the locked process (probably the ctl handler) and get a backtrace from it?
Sorry, I overlooked gdb:
(gdb) bt
mname=0x7f60026b8d3e "debugger") at mem/f_malloc.c:438
mname=0x7f60026b8d3e "debugger") at mem/f_malloc.c:1088
at binrpc_run.c:675
The deadlock makes sense. MDBG() in fm_malloc() issues dbg_get_mod_debug_level() which gets the same lock that dbg_set_mod_debug_level() already got.
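To illustrate the recursion described above, here is a minimal sketch in plain C. It is not the debugger module's code: the toy lock and every sketch_* name are hypothetical, and the toy lock reports re-entry where a real non-recursive lock would simply block forever. It assumes, as the backtrace suggests, that the setter allocates memory while holding the slot lock.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy non-recursive lock (hypothetical): held is 1 while someone owns it. */
typedef struct { int held; } sketch_lock_t;

static sketch_lock_t slot_lock = {0};
static int slot_level = 0;
static int reentry_detected = 0;

/* Returns 0 on success, -1 where a real non-recursive lock would hang. */
static int sketch_lock_get(sketch_lock_t *l)
{
    if (l->held)
        return -1;
    l->held = 1;
    return 0;
}

static void sketch_lock_release(sketch_lock_t *l)
{
    l->held = 0;
}

/* Stand-in for the level lookup: it takes the slot lock to read the level. */
static int sketch_get_level(void)
{
    int lvl;
    if (sketch_lock_get(&slot_lock) != 0) {
        reentry_detected = 1;   /* the real process would freeze here */
        return slot_level;
    }
    lvl = slot_level;
    sketch_lock_release(&slot_lock);
    return lvl;
}

/* Stand-in for an allocator whose debug message consults the level. */
static void *sketch_malloc(size_t n)
{
    (void)sketch_get_level();
    return malloc(n);
}

/* Stand-in for the setter: it allocates while still holding the slot lock. */
static int sketch_set_level(int lvl)
{
    void *p;
    int rc;

    reentry_detected = 0;
    rc = sketch_lock_get(&slot_lock);
    assert(rc == 0);
    p = sketch_malloc(16);      /* re-enters sketch_get_level() */
    slot_level = lvl;
    free(p);
    sketch_lock_release(&slot_lock);
    return reentry_detected;    /* 1 means the lock was requested twice */
}
```

Calling sketch_set_level(2) returns 1, meaning the same lock was requested a second time on the same call path before it was released. With a real blocking lock that second request never returns.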
The first option would be to never take the lock in dbg_get_mod_debug_level() (the list is only iterated there, with no list mutations).
The other option is to create different get/set locks for level/facility in struct _dbg_mod_slot.
What do you think is the best option?
This is a recursive access to the same slot in debugger hash table:
Solutions:
Added a fix using the second solution in pull request #469. I ran basic tests and the freeze no longer happens. However, the first solution can still be logged as an enhancement issue.
If all is ok I will backport this to 4.X branches.
I am fine with the second solution. It just needs to be kept in mind that there must be no direct or indirect LOG/DBG statements inside locked regions of the debugger hash table.
You can merge and close the related issues on tracker.
... if the module level is not previously set in config file:
modparam("debugger", "mod_level", "core=2")
This is not happening when setting the module facility, for the above, same conditions. This is happening also before pull-request [1] using "kamcmd dbg.mod_level core 1". After some debugging I've noticed that this is happening when trying to set a level for a module name whose
idx = hid&(_dbg_mod_table_size-1);
is an even number (?!), but not for one that reduces to an odd number (i.e. module name "core" reduces to an even index and "corex" to an odd one); the idx is always in the range [0, _dbg_mod_table_size-1],
as it should be. Trying to solve this, I commented out the lock_get/release in
dbg_set_mod_debug_level()
and saw that it works; kamailio doesn't freeze anymore. Thus, I tried to refactor the locks in struct _dbg_mod_slot
to be dynamically allocated/deallocated using lock_alloc()/destroy(), without success. I'm out of ideas. Do you have any idea what might lead to this strange deadlock?
[1] https://github.com/kamailio/kamailio/pull/462
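For reference, the slot index expression quoted in the report can be sketched on its own. The function name is hypothetical, the hash values in the usage note are arbitrary numbers rather than real hashes of "core" or "corex", and the sketch assumes the table size is a power of two, which is what makes the mask behave like a modulo.

```c
#include <assert.h>

/* Hypothetical helper mirroring idx = hid & (_dbg_mod_table_size - 1). */
static unsigned int sketch_slot_index(unsigned int hid, unsigned int table_size)
{
    /* Assumption: table_size is a non-zero power of two. */
    assert(table_size != 0 && (table_size & (table_size - 1)) == 0);

    /* The mask keeps only the low bits, so the result is in [0, table_size-1]. */
    return hid & (table_size - 1);
}
```

With a table size of 16, a hash value of 0x1234 maps to slot 4 and 0x1235 to slot 5. Whether the index is even or odd depends only on the lowest bit of the hash value.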