qchbai / gperftools

Automatically exported from code.google.com/p/gperftools
BSD 3-Clause "New" or "Revised" License
0 stars 0 forks source link

ProfilerStop may deadlock #409

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Enable cpu profiling on a multi-threaded application using glibc malloc.

What version of the product are you using? On what operating system?

  google-perftools-1.9.1 with glibc on x86_64

Additional information:

First, the profiled thread.  Note the signal handler is executed during a 
malloc, which is holding its own lock, when the handler attempts to lock the 
signal_lock_.

    Thread 8 (Thread ... (LWP 21125)):
    #0  ... in base::internal::SpinLockDelay (w=0x8b74f0, value=2, loop=<value optimized out>)
        at ./src/base/linux_syscall_support.h:2287
    #1  ... in SpinLock::SlowLock (this=0x8b74f0) at src/base/spinlock.cc:132
    #2  ... in ProfileHandler::SignalHandler (sig=27, sinfo=0x7f00285bd730, ucontext=0x7f00285bd600)
        at src/base/spinlock.h:75
    #3  <signal handler called>
    #4  ... in malloc () from /lib64/libc.so.6
    #5  ... in operator new () from /usr/lib64/libstdc++.so.6
    #6  ... in ...
    #7  ... in ...
    #8  ... in ...
    #9  ... in ...
    #10 ... in start_thread () from /lib64/libpthread.so.0
    #11 ... in clone () from /lib64/libc.so.6

Next, the buggy thread, blocked in a free() during UnregisterCallback while 
holding the signal lock.

    Thread 2 (Thread ... (LWP 21131)):
    #0  ... in __lll_lock_wait_private () from /lib64/libc.so.6
    #1  ... in _L_lock_4781 () from /lib64/libc.so.6
    #2  ... in free () from /lib64/libc.so.6
    #3  ... in ProfileHandler::UnregisterCallback (this=0x8b74d0, token=0x7f001c5cfa10) at src/profile-handler.cc:368
    #4  ... in CpuProfiler::DisableHandler (this=0x8b24c0) at src/profiler.cc:251
    #5  ... in CpuProfiler::Stop (this=0x8b24c0) at src/profiler.cc:198
    #6  ... in ...
    #7  ... in ...
    #8  ... in start_thread () from /lib64/libpthread.so.0
    #9  ... in clone () from /lib64/libc.so.6

The attached patch has not yet been tested in the environment that triggered 
the deadlock, and the issue is not easily reproducible.

However the change is very simple -- delete the ProfileHandlerToken after 
releasing the signal lock.  The lock only needs to protect the callback list.

Cheers,
Jeremy

Original issue reported on code.google.com by jeremy.r...@gmail.com on 1 Mar 2012 at 8:04

Attachments:

GoogleCodeExporter commented 9 years ago
Hmm, actually I am not sure that patch is safe.  I hate working with std 
iterators.

To be sure, here is another.  Feel free to implement this how you wish, but the 
key is not to delete the handler token within the SpinLock anonymous scope.

Original comment by jeremy.r...@gmail.com on 1 Mar 2012 at 8:13

Attachments:

GoogleCodeExporter commented 9 years ago
Thanks Jeremy. I should get a chance to review this in the next few days.

Original comment by chapp...@gmail.com on 2 Mar 2012 at 6:49

GoogleCodeExporter commented 9 years ago

Original comment by chapp...@gmail.com on 4 May 2012 at 5:32

GoogleCodeExporter commented 9 years ago
Hi Jeremy,

I noticed that the patches that you submitted above have been deleted. I am 
doing patch work right now and would like to take in your patch but just want 
to verify first whether or not you deleted it for a specific reason.

-Dave

Original comment by chapp...@gmail.com on 18 Sep 2012 at 2:52

GoogleCodeExporter commented 9 years ago
Hi Dave,

Yep, the original patches were not sufficient and I deleted them to prevent 
their application.

I found a larger patch was required to sidestep all unsafe STL container 
operations that were being performed in the signal handler.  I can attach the 
updated diff tomorrow (PDT) if you are patching.

  - Jeremy

Original comment by jeremy.r...@gmail.com on 18 Sep 2012 at 3:48

GoogleCodeExporter commented 9 years ago
Awesome. Thanks Jeremey.

Original comment by chapp...@gmail.com on 18 Sep 2012 at 4:19

GoogleCodeExporter commented 9 years ago
Jeremy, any chance I can get a copy of that patch?

Cheers,

Dave

Original comment by chapp...@gmail.com on 28 Oct 2012 at 2:55

GoogleCodeExporter commented 9 years ago
from description above ProfilerStop may cause free() to be called while holding 
same lock profiler signal handler is taking. Which may happen during malloc() 
call.

Original comment by alkondratenko on 15 Sep 2013 at 12:15