google / sanitizers

AddressSanitizer, ThreadSanitizer, MemorySanitizer

TSAN Segmentation fault with timer_create and SIGEV_THREAD #1612

Open janoosthoek opened 1 year ago

janoosthoek commented 1 year ago

Hi all,

I verified this bug with the clang 14 stable release. Perhaps someone can point out my mistake.

I took the code below from https://medium.com/vswe/posix-timer-1502348c2f9f, because we use the same pattern in our large-scale application and I needed a small test app.

#include <time.h>
#include <signal.h>
#include <pthread.h>
#include <sys/syscall.h>
#include <stdio.h>  // printf
#include <assert.h> // assert
#include <unistd.h> // sleep
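int expire_count = 0;
/* statically initialize the synchronization objects */
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t cond = PTHREAD_COND_INITIALIZER;

void timer_thread(union sigval sig) {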
  pid_t tid = syscall(__NR_gettid);
  while (1) {
    pthread_mutex_lock(&mutex);
    expire_count++;
    if (expire_count >= 5)
      pthread_cond_signal(&cond);
    printf("timer thread id: %d, count: %d\n", tid, expire_count);
    pthread_mutex_unlock(&mutex);
    sleep(1);
  }
}
int main(int argc, char **argv) {
  pid_t tid = syscall(__NR_gettid);
  printf("main thread id: %d\n", tid);
  timer_t timer_id;
  /* register signal callback */
  struct sigevent sev = {0}; /* zero all fields that are not set below */
  sev.sigev_notify = SIGEV_THREAD;
  sev.sigev_notify_function = timer_thread;
  /* detached thread, can't be joined */
  sev.sigev_notify_attributes = NULL;
  sev.sigev_value.sival_ptr = &timer_id;
  /* create timer */
  assert(timer_create(CLOCK_MONOTONIC, &sev, &timer_id) == 0);
  /* set time */
  long long freq_nanosecs = 1e9;
  struct itimerspec its;
  its.it_value.tv_sec = freq_nanosecs / 1000000000;
  its.it_value.tv_nsec = freq_nanosecs % 1000000000;
  its.it_interval.tv_sec = its.it_value.tv_sec;
  its.it_interval.tv_nsec = its.it_value.tv_nsec;
  /* start timer */
  timer_settime(timer_id, 0, &its, NULL);
  pthread_mutex_lock(&mutex);
  while (expire_count < 5) {
    printf("main thread id: %d cond wait start\n", tid);
    /* it will block and unlock mutex
       to let other thread can get mutex */
    pthread_cond_wait(&cond, &mutex);
    printf("main thread id: %d cond wait end\n", tid);
  }
  pthread_mutex_unlock(&mutex);
  timer_delete(timer_id);
  return 0;
}

to reproduce:

results in:

Thread 2 "a.out" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff6b70f00 (LWP 7548)]
0x00005555555fd351 in __sanitizer::CombinedAllocator<__sanitizer::SizeClassAllocator64<__tsan::AP64>, __sanitizer::LargeMmapAllocatorPtrArrayDynamic>::Allocate(__sanitizer::SizeClassAllocator64LocalCache<__sanitizer::SizeClassAllocator64<__tsan::AP64> >*, unsigned long, unsigned long) ()
(gdb) bt
#0  0x00005555555fd351 in __sanitizer::CombinedAllocator<__sanitizer::SizeClassAllocator64<__tsan::AP64>, __sanitizer::LargeMmapAllocatorPtrArrayDynamic>::Allocate(__sanitizer::SizeClassAllocator64LocalCache<__sanitizer::SizeClassAllocator64<__tsan::AP64> >*, unsigned long, unsigned long) ()
#1  0x00005555555fd041 in __tsan::user_alloc_internal(__tsan::ThreadState*, unsigned long, unsigned long, unsigned long, bool) ()
#2  0x00005555555fdba1 in __tsan::user_alloc(__tsan::ThreadState*, unsigned long, unsigned long) ()
#3  0x000055555559fc5d in malloc ()
#4  0x00007ffff7d25b70 in timer_helper_thread (arg=<optimized out>) at ../sysdeps/unix/sysv/linux/timer_routines.c:88
#5  0x00007ffff7d19b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#6  0x00007ffff7daba00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

It looks like the spawned thread crashes when the timer fires. Any help is greatly appreciated!

Br, Jan
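A possible workaround sketch, not something confirmed in this thread: the backtrace above ends in glibc's internal timer_helper_thread, so one option might be to avoid SIGEV_THREAD and drive the periodic work from a thread created with pthread_create, which TSan intercepts. The names ticker, ticker_thread, and run_ticker are hypothetical, and whether this actually sidesteps the crash in a given setup is an assumption that would need to be checked.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical state shared between the ticker thread and its owner. */
struct ticker {
  pthread_mutex_t mutex;
  pthread_cond_t cond;
  int expire_count; /* ticks observed so far */
  int target;       /* stop after this many ticks */
  long period_ns;   /* tick period, below one second in this sketch */
};

/* Periodic loop that replaces the SIGEV_THREAD callback. */
void *ticker_thread(void *arg) {
  struct ticker *t = arg;
  struct timespec next;
  clock_gettime(CLOCK_MONOTONIC, &next);
  for (;;) {
    /* Advance the absolute deadline by one period. */
    next.tv_nsec += t->period_ns;
    if (next.tv_nsec >= 1000000000L) {
      next.tv_nsec -= 1000000000L;
      next.tv_sec++;
    }
    /* clock_nanosleep returns the error number directly. */
    while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL) == EINTR) {
    }
    pthread_mutex_lock(&t->mutex);
    int done = ++t->expire_count >= t->target;
    if (done)
      pthread_cond_signal(&t->cond);
    pthread_mutex_unlock(&t->mutex);
    if (done)
      return NULL;
  }
}

/* Starts the ticker, waits for `target` ticks, returns the final count. */
int run_ticker(long period_ns, int target) {
  assert(period_ns > 0 && period_ns < 1000000000L && target > 0);
  struct ticker t;
  pthread_mutex_init(&t.mutex, NULL);
  pthread_cond_init(&t.cond, NULL);
  t.expire_count = 0;
  t.target = target;
  t.period_ns = period_ns;

  pthread_t th;
  if (pthread_create(&th, NULL, ticker_thread, &t) != 0)
    return -1;
  pthread_mutex_lock(&t.mutex);
  while (t.expire_count < t.target)
    pthread_cond_wait(&t.cond, &t.mutex);
  int count = t.expire_count;
  pthread_mutex_unlock(&t.mutex);
  pthread_join(th, NULL);

  pthread_cond_destroy(&t.cond);
  pthread_mutex_destroy(&t.mutex);
  return count;
}
```

Calling run_ticker(500000000L, 5) from main would wait for five half-second ticks. Unlike the SIGEV_THREAD version, the worker here is joinable and exits on its own once the target count is reached.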

wose commented 10 months ago

I encountered similar segfaults when using getaddrinfo_a.

I was able to reproduce the bug in:

The segfault isn't triggered in:

To reproduce:

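#include <arpa/inet.h>
#include <cerrno>       // errno
#include <chrono>       // std::chrono::seconds
#include <iostream>
#include <netdb.h>
#include <signal.h>
#include <system_error> // std::system_error
#include <thread>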

int main() {
    sigevent sev = { };
    sev.sigev_notify = SIGEV_THREAD;
    sev.sigev_notify_function = [](union sigval sigval) {
        std::cout << "notify fun" << std::endl;
    };

    struct gaicb host = {};
    struct addrinfo hints = {};

    hints.ai_flags = AI_CANONNAME;

    host.ar_name = "example.com";
    host.ar_request = &hints;
    struct gaicb *foo = &host;

    if (getaddrinfo_a(GAI_NOWAIT, &foo, 1, &sev) != 0) {
        throw std::system_error(errno, std::system_category(), "getaddrinfo_a failed");
    }
    std::this_thread::sleep_for(std::chrono::seconds(2));
    std::cout << "done" << std::endl;

    return 0;
}

clang++ -fsanitize=thread test.cpp -lm -lc -lpthread -ldl -lanl
g++ -fsanitize=thread test.cpp -lm -lc -lpthread -ldl -lanl

$ TSAN_OPTIONS="verbosity=3" ./a.out
==4056237==Installed the sigaction for signal 11
==4056237==Installed the sigaction for signal 7
==4056237==Installed the sigaction for signal 8
==4056237==Using llvm-symbolizer found at: /nix/store/4gs7pdssnsc1yvz860wacxinmw4vj8p9-llvm-14.0.6/bin/llvm-symbolizer
***** Running under ThreadSanitizer v3 (pid 4056237) *****
ThreadSanitizer: growing sync allocator: 0 out of 1048576*1024
ThreadSanitizer: growing heap block allocator: 0 out of 262144*4096
Segmentation fault (core dumped)

Stacktrace:

[0] from 0x0000000000494bf8 in __tsan::user_alloc_internal(__tsan::ThreadState*, unsigned long, unsigned long, unsigned long, bool)
[1] from 0x0000000000495274 in __tsan::user_alloc(__tsan::ThreadState*, unsigned long, unsigned long)
[2] from 0x000000000043c946 in malloc
[3] from 0x00007ffff7d3b472 in global_state_allocate
[4] from 0x00007ffff7d0e90c in __libc_allocate_once_slow
[5] from 0x00007ffff7d3bc37 in __nss_database_get
[6] from 0x00007ffff7cf63ec in gaih_inet.constprop
[7] from 0x00007ffff7cf78a6 in getaddrinfo
[8] from 0x00007ffff7d37729 in handle_requests
[9] from 0x00007ffff7c88e86 in start_thread

Although I ran this test inside a Nix shell, the problem is not related to Nix: it also happens on my Arch system. It also doesn't matter whether I use SIGEV_THREAD or a normal signal handler. The thread that getaddrinfo_a starts internally seems to be enough to trigger the problem. Running it with GAI_WAIT instead of GAI_NOWAIT made no difference either.

The program exits normally when the thread sanitizer is disabled. I wasn't able to mitigate this by disabling the thread sanitizer for this specific function call.
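For readers wondering what disabling the sanitizer for one call can look like, here is a minimal sketch in C using the compiler's no_sanitize attribute. The exact attempt made above is not shown in this thread, so the helper names create_thread_timer and on_timer are hypothetical. As far as I understand the attribute, it only changes how the annotated function's own memory accesses are instrumented. It does not affect the malloc interceptor running in the thread that glibc starts internally, which is where both backtraces in this thread end, so it would plausibly not help here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <time.h>

/* Hypothetical no-op callback; runs in a glibc-created helper thread. */
static void on_timer(union sigval sv) { (void)sv; }

/* Asks the compiler not to insert TSan race checks into this function.
   Code running in other threads is unaffected by the attribute. */
__attribute__((no_sanitize("thread")))
int create_thread_timer(timer_t *out) {
  assert(out != NULL);
  struct sigevent sev;
  memset(&sev, 0, sizeof sev);
  sev.sigev_notify = SIGEV_THREAD;
  sev.sigev_notify_function = on_timer;
  /* Returns 0 on success, -1 with errno set on failure. */
  return timer_create(CLOCK_MONOTONIC, &sev, out);
}
```

The timer is created but never armed in this sketch, so the callback does not fire; a caller would still need timer_settime to start it and timer_delete to release it.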