lewisje / address-sanitizer

Automatically exported from code.google.com/p/address-sanitizer
0 stars 0 forks source link

__lsan_do_leak_check() causes a crash in libpulseaudio #214

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
To reproduce, apply this patch to Chrome:

diff --git a/chrome/browser/browser_process_impl.cc 
b/chrome/browser/browser_process_impl.cc
index f005b88..c878426 100644
--- a/chrome/browser/browser_process_impl.cc
+++ b/chrome/browser/browser_process_impl.cc
@@ -320,6 +320,8 @@ unsigned int BrowserProcessImpl::AddRefModule() {
   return module_ref_count_;
 }

+extern "C" void __lsan_do_leak_check();
+
 unsigned int BrowserProcessImpl::ReleaseModule() {
   DCHECK(CalledOnValidThread());
   DCHECK_NE(0u, module_ref_count_);
@@ -335,6 +337,10 @@ unsigned int BrowserProcessImpl::ReleaseModule() {
     print_job_manager_.reset();
 #endif

+#if defined(LEAK_SANITIZER)
+    __lsan_do_leak_check();
+#endif
+
     CHECK(base::MessageLoop::current()->is_running());

 #if defined(OS_MACOSX)

With this change, leak detection is invoked as early as possible in the browser 
shutdown process.

However, if there are no leaks (i.e. __lsan_do_leak_check() doesn't kill the 
process immediately), we later see this:

out_goma_lsan/Release/chrome: pthread_mutex_lock.c:321: 
__pthread_mutex_lock_full: Assertion `robust || (oldval & 0x40000000) == 0' 
failed.
ASAN:SIGSEGV
=================================================================
==12960==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 
0x7fd48ea78ae7 sp 0x7fff5a9b5150 bp 0x7fd48fef9f3e T0)
AddressSanitizer can not provide additional info.
    #0 0x7fd48ea78ae6 in __GI_abort /build/buildd/eglibc-2.15/stdlib/abort.c:68
    #1 0x7fd48ea6e0ed in __assert_fail_base /build/buildd/eglibc-2.15/assert/assert.c:94
    #2 0x7fd48ea6e191 in __GI___assert_fail /build/buildd/eglibc-2.15/assert/assert.c:103
    #3 0x7fd48feeda4c in __pthread_mutex_lock_full /build/buildd/eglibc-2.15/nptl/pthread_mutex_lock.c:321
    #4 0x7fd479bd9411 in pa_mutex_lock sources/pulseaudio/pulseaudio-1.1/src/pulsecore/mutex-posix.c:88
    #5 0x7fd479bc2db0 in pa_memexport_free sources/pulseaudio/pulseaudio-1.1/src/pulsecore/memblock.c:1067
    #6 0x7fd479bcc01f in pa_pstream_unlink sources/pulseaudio/pulseaudio-1.1/src/pulsecore/pstream.c:974
    #7 0x7fd479bcc01f in pa_pstream_unlink sources/pulseaudio/pulseaudio-1.1/src/pulsecore/pstream.c:960
    #8 0x7fd47f0b75c1 in context_unlink sources/pulseaudio/pulseaudio-1.1/src/pulse/context.c:211
    #9 0x7fd47f0b7f47 in pa_context_set_state sources/pulseaudio/pulseaudio-1.1/src/pulse/context.c:294
    #10 0x7fd47f0b7f47 in pa_context_set_state sources/pulseaudio/pulseaudio-1.1/src/pulse/context.c:279
    #11 0x7fd49ded029a in media::AudioManagerPulse::DestroyPulse() media/audio/pulse/audio_manager_pulse.cc:265
    #12 0x7fd49ded0132 in ~AudioManagerPulse media/audio/pulse/audio_manager_pulse.cc:64
    #13 0x7fd49ded0132 in media::AudioManagerPulse::~AudioManagerPulse() media/audio/pulse/audio_manager_pulse.cc:59
    #14 0x7fd499ea25d0 in operator() base/memory/scoped_ptr.h:137
    #15 0x7fd499ea25d0 in ~scoped_ptr_impl base/memory/scoped_ptr.h:220
    #16 0x7fd499ea25d0 in ~scoped_ptr_impl base/memory/scoped_ptr.h:216
    #17 0x7fd499ea25d0 in ~scoped_ptr base/memory/scoped_ptr.h:432
    #18 0x7fd499ea25d0 in ~scoped_ptr base/memory/scoped_ptr.h:432
    #19 0x7fd499ea25d0 in content::BrowserMainLoop::~BrowserMainLoop() content/browser/browser_main_loop.cc:326
    #20 0x7fd499ea1e3d in content::BrowserMainLoop::~BrowserMainLoop() content/browser/browser_main_loop.cc:320
    #21 0x7fd49a2b9c90 in operator() base/memory/scoped_ptr.h:137
    #22 0x7fd49a2b9c90 in reset base/memory/scoped_ptr.h:246
    #23 0x7fd49a2b9c90 in reset base/memory/scoped_ptr.h:367
    #24 0x7fd49a2b9c90 in content::BrowserMainRunnerImpl::Shutdown() content/browser/browser_main_runner.cc:136
    #25 0x7fd49f41ec8f in content::BrowserMain(content::MainFunctionParams const&) content/browser/browser_main.cc:28
    #26 0x7fd49a7c3543 in content::RunNamedProcessTypeMain(std::string const&, content::MainFunctionParams const&, content::ContentMainDelegate*) content/app/content_main_runner.cc:456
    #27 0x7fd49a7c4dad in content::ContentMainRunnerImpl::Run() content/app/content_main_runner.cc:776
    #28 0x7fd49a7c1751 in content::ContentMain(int, char const**, content::ContentMainDelegate*) content/app/content_main.cc:35
    #29 0x7fd498141ca6 in ChromeMain chrome/app/chrome_main.cc:39
    #30 0x7fd498141bea in main chrome/app/chrome_exe_main_gtk.cc:43
    #31 0x7fd48ea6076c in __libc_start_main /build/buildd/eglibc-2.15/csu/libc-start.c:226
    #32 0x7fd498141b0c in _start (/mnt/ssd/chromium/src/out_goma_lsan/Release/chrome+0x14f6b0c)
SUMMARY: AddressSanitizer: SEGV /build/buildd/eglibc-2.15/stdlib/abort.c:68 
__GI_abort
==12960==ABORTING

0x40000000 is FUTEX_OWNER_DIED. This reproduces in every run of the browser 
process (be it browser_tests or full Chrome).

This is apparently caused by StopTheWorld. It keeps happening even if I hack 
LSan to not do any real work under StopTheWorld. It's enough to just freeze the 
threads and unfreeze them immediately.

My initial theory was that a libpulseaudio thread noticed that too much time 
had passed (while it was frozen) and timed out while holding the futex. But 
it's actually very unlikely that something like that would reproduce every time 
- and that it would die holding an allocator futex, of all things.

TODO: cross-test our StopTheWorld implementation with HeapChecker's.

Original issue reported on code.google.com by earth...@google.com on 28 Aug 2013 at 2:29

GoogleCodeExporter commented 9 years ago
It appears that HeapChecker doesn't have this problem.

Something is wrong with our StopTheWorld implementation.

Original comment by earth...@google.com on 28 Aug 2013 at 3:13

GoogleCodeExporter commented 9 years ago
Reproducer: http://pastebin.com/MAphVLyq

Original comment by earth...@google.com on 28 Aug 2013 at 4:16

GoogleCodeExporter commented 9 years ago
posting the repro inline (we should avoid third-party paste services for two 
reasons: availability and search)

#include "pulse/context.h"                                                      

#include "pulse/thread-mainloop.h"                                              

#include <sanitizer/lsan_interface.h>                                           

pa_threaded_mainloop* input_mainloop_;                                          

pa_context* input_context_;                                                     

void ContextStateCallback(pa_context* context, void* mainloop) {                

    pa_threaded_mainloop* pa_mainloop =                                                                                                                                                                      
              static_cast<pa_threaded_mainloop*>(mainloop);                                                                                                                                                  
      pa_threaded_mainloop_signal(pa_mainloop, 0);                                                                                                                                                           
}                                                                               

bool Init() {                                                                   

  // Create a mainloop API and connect to the default server.                                                                                                                                                
  // The mainloop is the internal asynchronous API event loop.                                                                                                                                               
  input_mainloop_ = pa_threaded_mainloop_new();                                                                                                                                                              
  if (!input_mainloop_)                                                                                                                                                                                      
    return false;                                                                                                                                                                                            

  // Start the threaded mainloop.                                                                                                                                                                            
  if (pa_threaded_mainloop_start(input_mainloop_))                                                                                                                                                           
    return false;                                                                                                                                                                                            

  // Lock the event loop object, effectively blocking the event loop thread                                                                                                                                  
  // from processing events. This is necessary.                                                                                                                                                              
  pa_threaded_mainloop_lock(input_mainloop_);                                                                                                                                                                

  pa_mainloop_api* pa_mainloop_api =                                                                                                                                                                         
      pa_threaded_mainloop_get_api(input_mainloop_);                                                                                                                                                         
  input_context_ = pa_context_new(pa_mainloop_api, "Chrome input");                                                                                                                                          
  if (!input_context_)                                                                                                                                                                                       
    return false;                                                                                                                                                                                            

  pa_context_set_state_callback(input_context_, &ContextStateCallback,                                                                                                                                       
                                input_mainloop_);                                                                                                                                                            
  if (pa_context_connect(input_context_, NULL, PA_CONTEXT_NOAUTOSPAWN, NULL))                                                                                                                                
    return false;                                                                                                                                                                                            

  // Wait until |input_context_| is ready.  pa_threaded_mainloop_wait() must be                                                                                                                              
  // called after pa_context_get_state() in case the context is already ready,                                                                                                                               
  // otherwise pa_threaded_mainloop_wait() will hang indefinitely.                                                                                                                                           
  while (true) {                                                                                                                                                                                             
    pa_context_state_t context_state = pa_context_get_state(input_context_);                                                                                                                                 
    if (!PA_CONTEXT_IS_GOOD(context_state))                                                                                                                                                                  
      return false;                                                                                                                                                                                          
    if (context_state == PA_CONTEXT_READY)                                                                                                                                                                   
      break;                                                                                                                                                                                                 
    pa_threaded_mainloop_wait(input_mainloop_);                                                                                                                                                              
  }                                                                                                                                                                                                          

  pa_threaded_mainloop_unlock(input_mainloop_);                                                                                                                                                              
  return true;                                                                                                                                                                                               
}                                                                               

int main() {                                                                    

  Init();                                                                                                                                                                                                    
  __lsan_do_leak_check();                                                                                                                                                                                    
  pa_context_set_state_callback(input_context_, NULL, NULL);                                                                                                                                                 
  pa_context_disconnect(input_context_);                                                                                                                                                                     
  pa_context_unref(input_context_);                                                                                                                                                                          
}

Original comment by konstant...@gmail.com on 29 Aug 2013 at 9:49

GoogleCodeExporter commented 9 years ago
clang  pulse.cc -fsanitize=address -lpulse && ASAN_OPTIONS=detect_leaks=1 
./a.out
a.out: pthread_mutex_lock.c:321: __pthread_mutex_lock_full: Assertion `robust 
|| (oldval & 0x40000000) == 0' failed.
ASAN:SIGSEGV
=================================================================
==13070==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 
0x7f13a9a06ae7 sp 0x7fff1a65fea0 bp 0x7f13aa3bff3e T0)
AddressSanitizer can not provide additional info.
    #0 0x7f13a9a06ae6 (/lib/x86_64-linux-gnu/libc.so.6+0x39ae6)
    #1 0x7f13a99fc0ed (/lib/x86_64-linux-gnu/libc.so.6+0x2f0ed)
    #2 0x7f13a99fc191 (/lib/x86_64-linux-gnu/libc.so.6+0x2f191)
    #3 0x7f13aa3b3a4c in __pthread_mutex_lock_full (/lib/x86_64-linux-gnu/libpthread.so.0+0x5a4c)
    #4 0x7f13a95a5e1d (/usr/lib/x86_64-linux-gnu/libpulsecommon-1.1.so+0x3ee1d)

Original comment by konstant...@gmail.com on 29 Aug 2013 at 9:55

GoogleCodeExporter commented 9 years ago
I'm running with custom lubpulse binaries so it can unwind through them and 
symbolize properly. Otherwise it's the same error.

More data points: it turns out that this is unrelated to the ptrace magic we're 
doing. This issue reproduces even if the tracer thread exits immediately, 
without attaching to threads or doing anything else. It reproduces even if we 
keep the tracer thread alive with a sleep() call.

So it's caused either by the clone() call, or by the signal-related stuff we're 
doing in StopTheWorldScope. I tried commenting out the sigaction calls and it 
reproduced still. Commenting out the sigprocmask call causes the process to 
hang, so not sure if it's related or not.

Original comment by earth...@google.com on 29 Aug 2013 at 10:05

GoogleCodeExporter commented 9 years ago
I think at this point you could try to create an lsan-independent reproducer. 
Then maybe a pulse-independent reproducer. 

Original comment by konstant...@gmail.com on 29 Aug 2013 at 10:49

GoogleCodeExporter commented 9 years ago
For an LSan-independent reproducer, replace main() with this:

#include <assert.h>                                                             

#include <sys/mman.h>                                                           

#include <unistd.h>                                                             

#include <sched.h>  

...

int TracerThread(void *) {                                                      

  return 0;                                                                                                                                                                                                  
}  

void SpawnClone() {                                                             

  const unsigned kStackSize = 2 * 1024 * 1024;                                                                                                                                                               
  void* stack = mmap(0, kStackSize, PROT_READ | PROT_WRITE,                                                                                                                                                  
                     MAP_PRIVATE | MAP_ANON, -1, 0);                                                                                                                                                         
  pid_t tracer_pid = clone(TracerThread, (char*)stack + kStackSize,                                                                                                                                          
                           CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_UNTRACED,                                                                                                                               
                           0);                                                                                                                                                                               
  assert(tracer_pid >= 0);                                                                                                                                                                                   
  sleep(1); // let the thread start                                                                                                                                                                          
  return;                                                                                                                                                                                                    
}                                                                               

int main(int, char**) {                                                         

  Init();                                                                                                                                                                                                    
  SpawnClone();                                                                                                                                                                                              
  pa_context_set_state_callback(input_context_, NULL, NULL);                                                                                                                                                 
  pa_context_disconnect(input_context_);                                                                                                                                                                     
  pa_context_unref(input_context_);                                                                                                                                                                          
}

I believe this has something to do with using glibc's wrapper for clone() 
(whereas HeapChecker uses a direct syscall).

Original comment by earth...@google.com on 29 Aug 2013 at 11:05

GoogleCodeExporter commented 9 years ago
Ok, I think I've tracked down the source of the problem. First of all I tested 
this against the clone() implementation in linux-syscall-support, and that one 
works fine. This error happens only when glibc's clone() implementation is used.

In the disassembly of glibc's clone, the branch that executes in the child 
process does this:

...
   |0x7ffff78c6ca4 <clone+68>       test   $0x100,%rdi
   |0x7ffff78c6cab <clone+75>       mov    $0xffffffff,%eax
   |0x7ffff78c6cb0 <clone+80>       jne    0x7ffff78c6cb9 <clone+89>
   |0x7ffff78c6cb2 <clone+82>       mov    $0x27,%eax
   |0x7ffff78c6cb7 <clone+87>       syscall
   |0x7ffff78c6cb9 <clone+89>       mov    %eax,%fs:0x2d4
   |0x7ffff78c6cc1 <clone+97>       mov    %eax,%fs:0x2d0
   |0x7ffff78c6cc9 <clone+105>      pop    %rax
   |0x7ffff78c6cca <clone+106>      pop    %rdi
   |0x7ffff78c6ccb <clone+107>      callq  *%rax
   |0x7ffff78c6ccd <clone+109>      mov    %rax,%rdi
   |0x7ffff78c6cd0 <clone+112>      callq  0x7ffff7892be0 <__GI__exit>

Translation:

if (flags && CLONE_VM) %eax = $0xffffffff;
else %eax = getpid();
(cryptic TLS location 1) = %eax;
(cryptic TLS location 2) = %eax;
exit(callback())

So if we don't pass the newtls parameter, this corrupts the TLS of the parent 
process in either case, because %fs points to the same memory.

I have confirmed that rewriting those values with the correct PID fixes the 
problem:

// clone(...);
// wait for the child process to finish

int pid = internal_getpid();                                                    

  __asm__ __volatile__(                                                                                                                                                                                      
     "mov    %0,%%fs:0x2d4\n"                                                                                                                                                                                
     "mov    %0,%%fs:0x2d0\n"                                                                                                                                                                                
     :                                                                                                                                                                                                       
     : "r"(pid)  ); 

Original comment by earth...@google.com on 29 Aug 2013 at 2:25

GoogleCodeExporter commented 9 years ago
In glibc sources:

#ifdef RESET_PID                                                                

  testq $CLONE_THREAD, %rdi                                                                                                                                                                                  
  jne 1f                                                                                                                                                                                                     
  testq $CLONE_VM, %rdi                                                                                                                                                                                      
  movl  $-1, %eax                                                                                                                                                                                            
  jne 2f                                                                                                                                                                                                     
  movl  $SYS_ify(getpid), %eax                                                                                                                                                                               
  syscall                                                                                                                                                                                                    
2:  movl  %eax, %fs:PID                                                         

  movl  %eax, %fs:TID                                                                                                                                                                                        
1:                                                                              

#endif 

Original comment by earth...@google.com on 29 Aug 2013 at 2:32

GoogleCodeExporter commented 9 years ago

Original comment by earth...@google.com on 24 Nov 2013 at 2:59

GoogleCodeExporter commented 9 years ago
Adding Project:AddressSanitizer as part of GitHub migration.

Original comment by ramosian.glider@gmail.com on 30 Jul 2015 at 9:13