google / sanitizers

AddressSanitizer, ThreadSanitizer, MemorySanitizer
Other
11.33k stars 1.02k forks source link

ThreadSanitizer false positive on spurious compare-exchange write #1520

Open PlasmaPower opened 2 years ago

PlasmaPower commented 2 years ago

Consider the following example C++ program:

#include <atomic>
#include <thread>
#include <iostream>
#include <vector>

int main() {
    std::atomic<int> start;
    std::atomic<int> a;
    std::atomic<int> b;
    start.store(0, std::memory_order_relaxed);
    a.store(0, std::memory_order_relaxed);
    b.store(0, std::memory_order_relaxed);
    std::vector<std::thread> threads;
    auto writer = [&]() {
        while (!start.load(std::memory_order_relaxed)) {}
        int zero = 0;
        a.compare_exchange_strong(zero, 1, std::memory_order_relaxed);
        b.store(1, std::memory_order_release);
    };
    auto reader = [&]() {
        while (!start.load(std::memory_order_relaxed)) {}
        while (!b.load(std::memory_order_acquire)) {}
        auto loaded = *(int*)&a;
        std::cout << "a is " << loaded << std::endl;
    };
    for (size_t i = 0; i < 4; i++) {
        if (i % 2 == 0) {
            threads.push_back(std::thread(reader));
        } else {
            threads.push_back(std::thread(writer));
        }
    }
    start.store(1, std::memory_order_relaxed);
    for (auto& thread : threads) {
        thread.join();
    }
}

Most of it is just setting up for the actual race, which is between the writer:

        int zero = 0;
        a.compare_exchange_strong(zero, 1, std::memory_order_relaxed);
        b.store(1, std::memory_order_release);

and the reader:

        while (!b.load(std::memory_order_acquire)) {}
        auto loaded = *(int*)&a;
        std::cout << "a is " << loaded << std::endl;

TSAN reports a race between the reader and writer:

WARNING: ThreadSanitizer: data race (pid=959480)
  Read of size 4 at 0x7ffd60e2a638 by thread T3:
    #0 main::$_1::operator()() const /home/lee/programming/cpp/race-test/main.cpp:23:17 (main+0xdffc2)
...

  Previous atomic write of size 4 at 0x7ffd60e2a638 by thread T4:
    #0 __tsan_atomic32_compare_exchange_val <null> (main+0xbcb93)
    #1 std::__atomic_base<int>::compare_exchange_strong(int&, int, std::memory_order, std::memory_order) /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/11.2.0/../../../../include/c++/11.2.0/bits/atomic_base.h:571:9 (main+0xe011d)
    #2 std::__atomic_base<int>::compare_exchange_strong(int&, int, std::memory_order) /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/11.2.0/../../../../include/c++/11.2.0/bits/atomic_base.h:597:9 (main+0xe011d)
    #3 main::$_0::operator()() const /home/lee/programming/cpp/race-test/main.cpp:17:5 (main+0xe011d)
...

However, let's consider the memory order from the reader's perspective. It's loaded b as 1, which means that there must be a previous write of 1 to a in its view of the ordering, because of the release-acquire ordering of b. For instance, here's one possible ordering from the perspective of the reader:

TSAN seems to think that the second compare_exchange_strong is in conflict with the reader's read, as it comes after the release of b. However, compare_exchange_strong explicitly only writes to the value if it matches the expected value, which it cannot in this ordering as 1 was already written to a by a previous writer.

No race should be possible, as from the reader's perspective at the time it reads a, 1 must have already been written to a, which means any future writers' compare_exchange_strong will not write and therefore cannot race with it.

This issue is also reproducible with compare_exchange_weak by replacing the compare_exchange_strong line with:

while (!a.compare_exchange_weak(zero, 1, std::memory_order_relaxed) && zero == 0) {}
dvyukov commented 2 years ago

Interesting. On one hand:

29.6.5/21
...
If the operation returns true, these operations are atomic read-modify-write operations (1.10). Otherwise, these operations are atomic load operations.

On the other hand, I am not sure *(int*)&a is legit C++. Any reason to not use normal atomic load on a here? It should eliminate the tsan report as well.

Though the same should be reproducible with __atomic compiler builtins and the extended model.

PlasmaPower commented 2 years ago

Hmm.. Yeah, I assumed that std::atomic<int> had the same representation as int (aside from potentially increased alignment requirements), but it looks like that isn't defined, probably because the C++ standard is leaving it open for platforms that don't actually support atomics and need a mutex. The reason I use that cast is that my use-case is filling in the result of a sizable computation that might be used frequently, and after the computation is filled in simply reading the struct is simpler and more efficient than using atomic reads for each word (yes I'm aware of alignment concerns).

That said, the issue isn't with the cast itself. I'm guessing that yeah, this would be reproducible with the __atomic builtins. I can write up a version using those if it'd be helpful, but the issue isn't with the non-atomic read alone, as only having one writer thread doesn't cause a TSAN error.

dvyukov commented 2 years ago

I've sent https://reviews.llvm.org/D124507 to fix this. But it's tricky, I am not yet 100% sure it's the right thing to do.

dvyukov commented 2 years ago

Question raised: where does this code come from? @melver

PlasmaPower commented 2 years ago

It's a minimal reproduction inspired by an issue we hit at Arbitrum: https://github.com/OffchainLabs/arbitrum/blob/02edfab7d5842c7390f7f11fa8b3fcb09b5fc21e/packages/arb-avm-cpp/avm_values/src/tuple.cpp#L235

I'm aware the implementation there isn't sound per the C++ standard, as it also casts between a primitive and its atomic variant (though we're busy with another race condition at the moment).

melver commented 2 years ago

From https://reviews.llvm.org/D124507 - I wrote:

[...]
* Mixing atomic and non-atomic ops should be discouraged.
* If someone deliberately does mix plain and atomic ops and runs into this case, I'm assuming it's some one-off complex algorithm and the author could also add no_sanitize("thread").
[...]

As already pointed out, to trigger this false positive requires venturing outside the C++ standard - using the __atomic_ builtins is not within the C++ standard and is implementation-defined (they are builtins provided by GCC and Clang). So from a purely "TSan supports the C++ memory model" point of view, supporting this is going beyond that at a cost that causes other issues.

Now, I also checked your code and I'm assuming it's about memcpyAtomic? https://github.com/OffchainLabs/arbitrum/blob/053c29d29ea224521535828c70855133b1519bc0/packages/arb-avm-cpp/avm_values/src/tuplestub.cpp#L42

What is "atomic" about memcpyAtomic?

dvyukov commented 2 years ago

The builtin description says:

The following built-in functions approximately match the requirements for the C++11 memory model.
...
There are 6 different memory orders that can be specified. These map to the C++11 memory orders with the same names, see the C++11 standard
...
 The description of each memory order is only meant to roughly illustrate the effects and is not a specification; see the C++11 memory model for precise semantics.

https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html

So I believe this is just an extension to the C++ standard. C actually contains a non-atomic store in the form of atomic_init, but no atomic load yet. Though, I actually wanted to add it.

PlasmaPower commented 2 years ago

memcpyAtomic is admittedly not a great name for the function. The comment above tries to describe the behavior, but basically it safely allows two writers to concurrently write the same value to the destination safely.

In this use case, multiple threads may write the result of a hash computation to the value. They'll all reach the same result, as it's deterministic. Once they're done, they set an atomic flag saying that the computation is done. Other threads can read that atomic flag, and do a non-atomic read of the struct containing the result of the computation. That read can't conflict with a writer thread because the writer threads will use compare_exchange_strong to ensure that they only write a byte of the result if it was previously unset.

melver commented 2 years ago

The builtin description says:

The following built-in functions approximately match the requirements for the C++11 memory model.
...
There are 6 different memory orders that can be specified. These map to the C++11 memory orders with the same names, see the C++11 standard
...
 The description of each memory order is only meant to roughly illustrate the effects and is not a specification; see the C++11 memory model for precise semantics.

https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html

So I believe this is just an extension to the C++ standard. C actually contains a non-atomic store in the form of atomic_init, but no atomic load yet. Though, I actually wanted to add it.

Agreed - though the main thing is about mixing concurrent non-atomic and atomic ops, which the builtins make too easy, but should be outside the normal C++ memory model.

dvyukov commented 2 years ago

Agreed - though the main thing is about mixing concurrent non-atomic and atomic ops, which the builtins make too easy, but should be outside the normal C++ memory model.

I don't agree. I see absence of memory_order_nonatomic only as a language flaw. CPUs have it, GCC has it, LLVM model has it as well: https://llvm.org/docs/Atomics.html#notatomic

thomcc commented 2 years ago

You can mix atomic and non-atomic operations by using std::atomic_ref, which allows you to temporarily perform atomic operations on some variable, so long as all non-atomic accesses are after all atomic_refs have been destroyed. That isn't the case here because there are 4 threads, so perhaps it does not matter.

(Rust allows this too FWIW. The fact that it allows mixing atomic/not in some cases is why Rust describes its atomics as equivalent to atomic_ref under C++20 -- 2nd paragraph here https://doc.rust-lang.org/nightly/std/sync/atomic/index.html).