Closed GoogleCodeExporter closed 9 years ago
It appears that HeapChecker doesn't have this problem.
Something is wrong with our StopTheWorld implementation.
Original comment by earth...@google.com
on 28 Aug 2013 at 3:13
Reproducer: http://pastebin.com/MAphVLyq
Original comment by earth...@google.com
on 28 Aug 2013 at 4:16
posting the repro inline (we should avoid third-party paste services for two
reasons: availability and search)
#include "pulse/context.h"
#include "pulse/thread-mainloop.h"
#include <sanitizer/lsan_interface.h>
pa_threaded_mainloop* input_mainloop_;
pa_context* input_context_;
void ContextStateCallback(pa_context* context, void* mainloop) {
pa_threaded_mainloop* pa_mainloop =
static_cast<pa_threaded_mainloop*>(mainloop);
pa_threaded_mainloop_signal(pa_mainloop, 0);
}
bool Init() {
// Create a mainloop API and connect to the default server.
// The mainloop is the internal asynchronous API event loop.
input_mainloop_ = pa_threaded_mainloop_new();
if (!input_mainloop_)
return false;
// Start the threaded mainloop.
if (pa_threaded_mainloop_start(input_mainloop_))
return false;
// Lock the event loop object, effectively blocking the event loop thread
// from processing events. This is necessary.
pa_threaded_mainloop_lock(input_mainloop_);
pa_mainloop_api* pa_mainloop_api =
pa_threaded_mainloop_get_api(input_mainloop_);
input_context_ = pa_context_new(pa_mainloop_api, "Chrome input");
if (!input_context_)
return false;
pa_context_set_state_callback(input_context_, &ContextStateCallback,
input_mainloop_);
if (pa_context_connect(input_context_, NULL, PA_CONTEXT_NOAUTOSPAWN, NULL))
return false;
// Wait until |input_context_| is ready. pa_threaded_mainloop_wait() must be
// called after pa_context_get_state() in case the context is already ready,
// otherwise pa_threaded_mainloop_wait() will hang indefinitely.
while (true) {
pa_context_state_t context_state = pa_context_get_state(input_context_);
if (!PA_CONTEXT_IS_GOOD(context_state))
return false;
if (context_state == PA_CONTEXT_READY)
break;
pa_threaded_mainloop_wait(input_mainloop_);
}
pa_threaded_mainloop_unlock(input_mainloop_);
return true;
}
int main() {
Init();
__lsan_do_leak_check();
pa_context_set_state_callback(input_context_, NULL, NULL);
pa_context_disconnect(input_context_);
pa_context_unref(input_context_);
}
Original comment by konstant...@gmail.com
on 29 Aug 2013 at 9:49
clang pulse.cc -fsanitize=address -lpulse && ASAN_OPTIONS=detect_leaks=1
./a.out
a.out: pthread_mutex_lock.c:321: __pthread_mutex_lock_full: Assertion `robust
|| (oldval & 0x40000000) == 0' failed.
ASAN:SIGSEGV
=================================================================
==13070==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc
0x7f13a9a06ae7 sp 0x7fff1a65fea0 bp 0x7f13aa3bff3e T0)
AddressSanitizer can not provide additional info.
#0 0x7f13a9a06ae6 (/lib/x86_64-linux-gnu/libc.so.6+0x39ae6)
#1 0x7f13a99fc0ed (/lib/x86_64-linux-gnu/libc.so.6+0x2f0ed)
#2 0x7f13a99fc191 (/lib/x86_64-linux-gnu/libc.so.6+0x2f191)
#3 0x7f13aa3b3a4c in __pthread_mutex_lock_full (/lib/x86_64-linux-gnu/libpthread.so.0+0x5a4c)
#4 0x7f13a95a5e1d (/usr/lib/x86_64-linux-gnu/libpulsecommon-1.1.so+0x3ee1d)
Original comment by konstant...@gmail.com
on 29 Aug 2013 at 9:55
I'm running with custom lubpulse binaries so it can unwind through them and
symbolize properly. Otherwise it's the same error.
More data points: it turns out that this is unrelated to the ptrace magic we're
doing. This issue reproduces even if the tracer thread exits immediately,
without attaching to threads or doing anything else. It reproduces even if we
keep the tracer thread alive with a sleep() call.
So it's caused either by the clone() call, or by the signal-related stuff we're
doing in StopTheWorldScope. I tried commenting out the sigaction calls and it
reproduced still. Commenting out the sigprocmask call causes the process to
hang, so not sure if it's related or not.
Original comment by earth...@google.com
on 29 Aug 2013 at 10:05
I think at this point you could try to create an lsan-independent reproducer.
Then maybe a pulse-independent reproducer.
Original comment by konstant...@gmail.com
on 29 Aug 2013 at 10:49
For an LSan-independent reproducer, replace main() with this:
#include <assert.h>
#include <sys/mman.h>
#include <unistd.h>
#include <sched.h>
...
int TracerThread(void *) {
return 0;
}
void SpawnClone() {
const unsigned kStackSize = 2 * 1024 * 1024;
void* stack = mmap(0, kStackSize, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANON, -1, 0);
pid_t tracer_pid = clone(TracerThread, (char*)stack + kStackSize,
CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_UNTRACED,
0);
assert(tracer_pid >= 0);
sleep(1); // let the thread start
return;
}
int main(int, char**) {
Init();
SpawnClone();
pa_context_set_state_callback(input_context_, NULL, NULL);
pa_context_disconnect(input_context_);
pa_context_unref(input_context_);
}
I believe this has something to do with using glibc's wrapper for clone()
(whereas HeapChecker uses a direct syscall).
Original comment by earth...@google.com
on 29 Aug 2013 at 11:05
Ok, I think I've tracked down the source of the problem. First of all I tested
this against the clone() implementation in linux-syscall-support, and that one
works fine. This error happens only when glibc's clone() implementation is used.
In the disassembly of glibc's clone, the branch that executes in the child
process does this:
...
|0x7ffff78c6ca4 <clone+68> test $0x100,%rdi
|0x7ffff78c6cab <clone+75> mov $0xffffffff,%eax
|0x7ffff78c6cb0 <clone+80> jne 0x7ffff78c6cb9 <clone+89>
|0x7ffff78c6cb2 <clone+82> mov $0x27,%eax
|0x7ffff78c6cb7 <clone+87> syscall
|0x7ffff78c6cb9 <clone+89> mov %eax,%fs:0x2d4
|0x7ffff78c6cc1 <clone+97> mov %eax,%fs:0x2d0
|0x7ffff78c6cc9 <clone+105> pop %rax
|0x7ffff78c6cca <clone+106> pop %rdi
|0x7ffff78c6ccb <clone+107> callq *%rax
|0x7ffff78c6ccd <clone+109> mov %rax,%rdi
|0x7ffff78c6cd0 <clone+112> callq 0x7ffff7892be0 <__GI__exit>
Translation:
if (flags && CLONE_VM) %eax = $0xffffffff;
else %eax = getpid();
(cryptic TLS location 1) = %eax;
(cryptic TLS location 2) = %eax;
exit(callback())
So if we don't pass the newtls parameter, this corrupts the TLS of the parent
process in either case, because %fs points to the same memory.
I have confirmed that rewriting those values with the correct PID fixes the
problem:
// clone(...);
// wait for the child process to finish
int pid = internal_getpid();
__asm__ __volatile__(
"mov %0,%%fs:0x2d4\n"
"mov %0,%%fs:0x2d0\n"
:
: "r"(pid) );
Original comment by earth...@google.com
on 29 Aug 2013 at 2:25
In glibc sources:
#ifdef RESET_PID
testq $CLONE_THREAD, %rdi
jne 1f
testq $CLONE_VM, %rdi
movl $-1, %eax
jne 2f
movl $SYS_ify(getpid), %eax
syscall
2: movl %eax, %fs:PID
movl %eax, %fs:TID
1:
#endif
Original comment by earth...@google.com
on 29 Aug 2013 at 2:32
Original comment by earth...@google.com
on 24 Nov 2013 at 2:59
Adding Project:AddressSanitizer as part of GitHub migration.
Original comment by ramosian.glider@gmail.com
on 30 Jul 2015 at 9:13
Original issue reported on code.google.com by
earth...@google.com
on 28 Aug 2013 at 2:29