llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

False thread-sanitizer positive when using OpenMP (archer) with C++11 thread-safe statics on macOS #53067

Open q-p opened 2 years ago

q-p commented 2 years ago

I'm trying to apply ThreadSanitizer (tsan) to an OpenMP-parallelized C++11 program on macOS 12 "Monterey", and I get a race report which (AFAIK) isn't actually a real race.

This is using Homebrew clang 13

Homebrew clang version 13.0.0
Target: x86_64-apple-darwin21.2.0
Thread model: posix
InstalledDir: /usr/local/opt/llvm/bin

on macOS 12.1 "Monterey" on an Intel Mac Pro (2019).

The following program

#include <vector>
#include <iostream>

int main (int argc, char const *argv[])
{
  #pragma omp parallel
  {
    static const std::vector<int> vec(1, 42);
    std::cout << vec.size() << std::endl;
  }
  return 0;
}

compiled via

/usr/local/opt/llvm/bin/clang++ -L /usr/local/opt/llvm/lib -fopenmp -fsanitize=thread -std=c++11 omp_threadsafe_init_archer.cpp

(the -L path is given so the linking step picks up the OpenMP run-time library) and then run via

OMP_TOOL_LIBRARIES=~/Downloads/openmp-13.0.0.src/tools/archer/libarcher.dylib OMP_NUM_THREADS=2 MallocNanoZone=0 TSAN_OPTIONS="ignore_noninstrumented_modules=1" ./a.out

leads to the following output and TSAN report

1
==================
WARNING: ThreadSanitizer: data race (pid=51403)
  Read of size 8 at 0x000109889138 by thread T1:
    #0 std::__1::vector<int, std::__1::allocator<int> >::size() const <null> (a.out:x86_64+0x100001f4d)
    #1 .omp_outlined. <null> (a.out:x86_64+0x100001ddc)
    #2 __kmp_invoke_microtask <null> (libomp.dylib:x86_64+0x7a852)
    #3 main <null> (a.out:x86_64+0x100001d03)

  Previous write of size 8 at 0x000109889138 by main thread:
    #0 std::__1::__vector_base<int, std::__1::allocator<int> >::__vector_base() <null> (a.out:x86_64+0x1000021ab)
    #1 std::__1::vector<int, std::__1::allocator<int> >::vector(unsigned long, int const&) <null> (a.out:x86_64+0x1000020e1)
    #2 std::__1::vector<int, std::__1::allocator<int> >::vector(unsigned long, int const&) <null> (a.out:x86_64+0x100001e95)
    #3 .omp_outlined. <null> (a.out:x86_64+0x100001da5)
    #4 __kmp_invoke_microtask <null> (libomp.dylib:x86_64+0x7a852)
    #5 main <null> (a.out:x86_64+0x100001d03)

  Location is global 'main::vec' of size 24 at 0x000109889130 (a.out+0x000100008138)

  Thread T1 (tid=5238101, running) created by main thread at:
    #0 pthread_create <null> (libclang_rt.tsan_osx_dynamic.dylib:x86_64+0x9fdf)
    #1 __kmp_create_worker <null> (libomp.dylib:x86_64+0x60daa)

From what I understand (and have observed in real code), C++11 static initialization is automatically thread-safe, so the subsequent access from the second thread should not race: the access from whichever thread initialized the vector must already have happened, since its size has already been printed.
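To illustrate the guarantee being relied on here, the following is a minimal sketch, not part of the original report. The names `Probe`, `touch_static`, and `count_constructions` are hypothetical. Several std::thread workers reach the same function-local static at once, and the sketch counts how often its constructor ran. With C++11 thread-safe statics the count should be exactly 1, and every thread should see the fully constructed vector. It uses std::thread rather than OpenMP, so it only demonstrates the language rule, not the tsan/OpenMP interaction.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Counts how many times Probe's constructor has run (hypothetical helper).
static std::atomic<int> g_constructions{0};

struct Probe {
    std::vector<int> vec;
    Probe() : vec(1, 42) { g_constructions.fetch_add(1); }
};

// Every caller goes through the same guarded function-local static.
std::size_t touch_static() {
    static const Probe probe;  // C++11: initialized exactly once, thread-safely
    return probe.vec.size();
}

// Starts n threads that all call touch_static().
// Returns the constructor count, or -1 if any thread saw a wrong size.
int count_constructions(int n) {
    std::vector<std::thread> workers;
    std::vector<std::size_t> sizes(static_cast<std::size_t>(n));
    for (int i = 0; i < n; ++i)
        workers.emplace_back([&sizes, i] { sizes[i] = touch_static(); });
    for (auto& w : workers) w.join();
    for (std::size_t s : sizes)
        if (s != 1) return -1;
    return g_constructions.load();
}
```

Calling `count_constructions(8)` from `main` is expected to return 1 regardless of which thread wins the initialization.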

llvmbot commented 2 years ago

@llvm/issue-subscribers-openmp

jprotze commented 2 years ago

I cannot reproduce this issue on x86 Linux. As I understand the symbols in your stacktrace, you should be using the LLVM libc++ runtime. Anyway, I tried both libc++ and libstdc++ from gcc-11 with no report.

I even disabled the use of Archer and got no report:

$ TSAN_OPTIONS='ignore_noninstrumented_modules=1' ARCHER_OPTIONS="enable=0 verbose=1" OMP_NUM_THREADS=3 ./a.out
Archer disabled, stopping operation
11
1

$

My suspicion is that this is a general (not OpenMP-specific) issue on macOS. Please try the equivalent C++ thread code:

#include <vector>
#include <thread>
#include <iostream>

void tfunc(){
    static const std::vector<int> vec(1, 42);
    std::cout << vec.size() << std::endl;
}

int main (int argc, char const *argv[])
{
    std::thread t1{tfunc}, t2{tfunc};
    t1.join();
    t2.join();
    return 0;
}

q-p commented 2 years ago

you should be using the LLVM libc++ runtime

Correct, macOS is shipping libc++.

(updated after I must've been testing something incorrectly)

The example you provided using just std::thread works correctly with tsan, no race is reported.

Only the OpenMP case reports the race (both without and with archer).

q-p commented 2 years ago

Sorry, my previous agreement that "OpenMP isn't involved" was too hasty: the std::thread version reports no race.

(But I couldn't reproduce it on Linux either, so it is probably a system-specific interaction. The macOS binary seems to use ___cxa_guard_acquire and ___cxa_guard_release to secure the static initialization.)
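As a rough mental model of what such guard calls do, here is a simplified sketch. It is an assumption-laden toy, not the libc++abi implementation: `ToyGuard`, `toy_guard_acquire`, `toy_guard_release`, and `guarded_value` are hypothetical names, and the real runtime uses a different guard layout and waiting strategy. The point is that the release store in the "release" step pairs with the acquire load on the fast path, which is what orders a later read after the initializing write. A race detector that did not observe this pairing would have no way to know the read is ordered.

```cpp
#include <atomic>
#include <mutex>

// Illustrative model only: NOT the real guard object used by the C++ runtime.
struct ToyGuard {
    std::atomic<bool> done{false};
    std::mutex lock;
};

// Returns true if the caller must run the initializer
// (and must then call toy_guard_release).
bool toy_guard_acquire(ToyGuard& g) {
    if (g.done.load(std::memory_order_acquire)) return false;  // fast path
    g.lock.lock();
    if (g.done.load(std::memory_order_relaxed)) {  // someone else finished first
        g.lock.unlock();
        return false;
    }
    return true;  // lock stays held until toy_guard_release
}

// Publishes the initialized object and lets waiting threads proceed.
void toy_guard_release(ToyGuard& g) {
    g.done.store(true, std::memory_order_release);
    g.lock.unlock();
}

static ToyGuard g_guard;
static int g_storage = 0;
static int g_init_calls = 0;  // only modified while the guard lock is held

// Roughly how a guarded `static const int value = 42;` behaves in this model.
int guarded_value() {
    if (toy_guard_acquire(g_guard)) {
        g_storage = 42;  // the "constructor"
        ++g_init_calls;
        toy_guard_release(g_guard);
    }
    return g_storage;
}
```

Repeated calls to `guarded_value()` run the initializer once and afterwards take only the fast path.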

jprotze commented 2 years ago

@dvyukov @kcc do you have any idea why static initialization on macOS is reported as a race?

q-p commented 2 years ago

Just to clear up my previously incorrect answer, the following works fine:

> cat bug_std_thread.cpp 
#include <vector>
#include <thread>
#include <iostream>

void tfunc(){
    static const std::vector<int> vec(1, 42);
    std::cout << vec.size() << std::endl;
}

int main (int argc, char const *argv[])
{
    std::thread t1{tfunc}, t2{tfunc};
    t1.join();
    t2.join();
    return 0;
}
> clang++ -fsanitize=thread -std=c++11 bug_std_thread.cpp
> ./a.out
a.out(45834,0x10e311600) malloc: nano zone abandoned due to inability to preallocate reserved vm space.
1
1