Thread sanitizer crash when user does not have sufficiently high rtprio limit

u-ra commented 5 years ago

A program built with ThreadSanitizer will crash on startup if it tries to set real-time scheduling with a priority higher than the user's rtprio limit (ulimit -r):

#include <stdio.h>
#include <pthread.h>
#include <sched.h>

static void *test_thread(void *arg)
{
   puts("Hello\n");
   return NULL;
}

int main(int argc, char **argv)
{
   pthread_t thread;
   pthread_attr_t attr;
   struct sched_param param;

   pthread_attr_init(&attr);
   pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
   pthread_attr_setschedpolicy(&attr, SCHED_RR);
   pthread_attr_getschedparam(&attr, &param);
   param.sched_priority = 20;
   pthread_attr_setschedparam(&attr, &param);

   pthread_create(&thread, &attr, test_thread, NULL);
   pthread_attr_destroy(&attr);
   pthread_join(thread, NULL);

   return 0;
}

$ clang --version
clang version 8.0.0 
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/llvm8/bin

$ clang -pthread -fsanitize=thread -o test test.c  

$ ulimit -r  
99  

$ ./test  
Hello  

$ ulimit -r 10  

$ ./test  
ThreadSanitizer:DEADLYSIGNAL
==4643==ERROR: ThreadSanitizer: SEGV on unknown address 0x000000001048 (pc 0x000000498ee6 bp 0x7fb78c2bde60 sp 0x7fb78c2bd7d0 T4645)
==4643==The signal is caused by a READ memory access.
ThreadSanitizer:DEADLYSIGNAL
ThreadSanitizer: nested bug in the same thread, aborting.

$ gdb ./test
Reading symbols from ./test...done.
(gdb) r
Starting program: /home/jura/test 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".

Thread 3 "test" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff55fe700 (LWP 4665)]
Allocate () at sanitizer_allocator_local_cache.h:45
45          if (UNLIKELY(c->count == 0)) {
(gdb) bt
#0  Allocate () at sanitizer_allocator_local_cache.h:45
#1  Allocate () at sanitizer_allocator_combined.h:73
#2  0x0000000000498c1f in user_alloc_internal () at tsan_mman.cc:164
#3  0x00000000004993d1 in user_alloc () at tsan_mman.cc:189
#4  0x00000000004453d6 in __interceptor_malloc () at tsan_interceptors.cc:673
#5  0x00007ffff7fe0bab in _dl_map_object_deps (map=map@entry=0x7ffff7f9a3c0, preloads=preloads@entry=0x0, npreloads=npreloads@entry=0, trace_mode=trace_mode@entry=0, open_mode=open_mode@entry=-2147483648) at dl-deps.c:478
#6  0x00007ffff7fe62d1 in dl_open_worker (a=a@entry=0x7ffff55bd8f0) at dl-open.c:260
#7  0x00007ffff7d88457 in __GI__dl_catch_exception (exception=<optimized out>, operate=<optimized out>, args=<optimized out>) at dl-error-skeleton.c:196
#8  0x00007ffff7fe5e3f in _dl_open (file=0x7ffff7f8da0f "libgcc_s.so.1", mode=-2147483646, caller_dlopen=0x7ffff7f8aa9c <pthread_cancel_init+44>, nsid=<optimized out>, argc=1, argv=<optimized out>, env=0x7fffffffe328) at dl-open.c:588
#9  0x00007ffff7d87951 in do_dlopen (ptr=ptr@entry=0x7ffff55bdb30) at dl-libc.c:96
#10 0x00007ffff7d88457 in __GI__dl_catch_exception (exception=exception@entry=0x7ffff55bdab0, operate=operate@entry=0x7ffff7d87910 <do_dlopen>, args=args@entry=0x7ffff55bdb30) at dl-error-skeleton.c:196
#11 0x00007ffff7d884f3 in __GI__dl_catch_error (objname=objname@entry=0x7ffff55bdb08, errstring=errstring@entry=0x7ffff55bdb10, mallocedp=mallocedp@entry=0x7ffff55bdb07, operate=operate@entry=0x7ffff7d87910 <do_dlopen>, args=args@entry=0x7ffff55bdb30) at dl-error-skeleton.c:215
#12 0x00007ffff7d87a57 in dlerror_run (operate=operate@entry=0x7ffff7d87910 <do_dlopen>, args=args@entry=0x7ffff55bdb30) at dl-libc.c:46
#13 0x00007ffff7d87afa in __GI___libc_dlopen_mode (name=name@entry=0x7ffff7f8da0f "libgcc_s.so.1", mode=mode@entry=-2147483646) at dl-libc.c:195
#14 0x00007ffff7f8aa9c in pthread_cancel_init () at unwind-forcedunwind.c:53
#15 0x00007ffff7f8acc4 in _Unwind_ForcedUnwind (exc=0x7ffff55fed70, stop=0x7ffff7f88c10 <unwind_stop>, stop_argument=0x7ffff55be350) at unwind-forcedunwind.c:127
#16 0x00007ffff7f88db5 in __GI___pthread_unwind (buf=<optimized out>) at unwind.c:121
#17 0x00007ffff7f7e1c9 in __do_cancel () at pthreadP.h:310
#18 sigcancel_handler (sig=32, si=0x7ffff55bdd30, ctx=<optimized out>) at nptl-init.c:201
#19 sigcancel_handler (sig=<optimized out>, si=0x7ffff55bdd30, ctx=<optimized out>) at nptl-init.c:166
#20 <signal handler called>
#21 __lll_lock_wait_private () at lowlevellock.S:63
#22 0x00007ffff7f7fc37 in start_thread (arg=<optimized out>) at pthread_create.c:462
#23 0x00007ffff7d4cc73 in clone () at clone.S:95

vertexclique commented 4 years ago

I have exactly the same problem, caused by an intercepted pthread_rwlock_wrlock.

SUMMARY: ThreadSanitizer: SEGV ??:? in __GI___pthread_rwlock_wrlock

This has exactly the same root cause as: https://gitlab.freedesktop.org/libnice/libnice/issues/74

Is there a workaround to explicitly exclude specific points from interception? I don't want to be stuck with this OpenSSL bug.

aleino-nv commented 4 years ago

I may be seeing the same issue in the Chromium conformance test suite: https://crbug.com/1094869#c42 I'll try to check whether similar pthread calls are being made before the crash.

Edit: You may not be able to access that link. Apologies. There is no particularly good reason why the issue is not public right now, so perhaps I can make it publicly viewable soon. Anyway, there's not much of interest there besides the observation that I see a SIGSEGV on the same line of code as here, with __interceptor_malloc below it on the stack and the same DEADLYSIGNAL printout.

google / sanitizers

Thread sanitizer crash when user does not have sufficiently high rtprio limit #1088