llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.12k stars 11.61k forks source link

`is_always_lock_free` Returns Incorrect Value For 128 Bit Struct On ARM64 #75081

Open insertinterestingnamehere opened 9 months ago

insertinterestingnamehere commented 9 months ago

With clang 17.1 on ARM64 I'm seeing std::atomic<T>::is_always_lock_free evaluate to true even though the member function std::atomic<T>::is_lock_free() evaluates to false for the same type.

Here's a minimal example:

#include <atomic>
#include <cstdint>
#include <iostream>

struct bigint_equiv {
  uint64_t first, second;
};

int main() {
  std::atomic<bigint_equiv> a;
  std::cout << a.is_lock_free() << " " << decltype(a)::is_always_lock_free << std::endl;
}

Possibly related: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70814 where the gcc devs discuss why libatomic doesn't allow 128 bit atomics to be lock-free on ARM.

When I compile the same code with gcc 13.2 is_always_lock_free is false which is strange since I've confirmed that the same libstdc++ is being included and linked. This also appears to be what happens on godbolt too though. With gcc the assembly (as best I can tell) shows a 0 being written out (https://godbolt.org/z/T7493K8xP) while with clang the assembly shows a 1 (https://godbolt.org/z/Gb35jMfjr).

insertinterestingnamehere commented 9 months ago

It looks like libstdc++ falls back to the compiler built-in __atomic_always_lock_free, so maybe the bug is in clang's implementation of that built-in function. Alternatively, if clang is actually using 128 bit atomics on ARM, std::atomic<T>::is_lock_free() may just be misreporting that. I'm not sure.

llvmbot commented 9 months ago

@llvm/issue-subscribers-clang-frontend

Author: Ian Henriksen (insertinterestingnamehere)

With clang 17.1 on ARM64 I'm seeing `std::atomic<T>::is_always_lock_free` evaluate to `true` even though the member function `std::atomic<T>::is_lock_free()` evaluates to `false` for the same type. Here's a minimal example: ```C++ #include <atomic> #include <cstdint> #include <iostream> struct bigint_equiv { uint64_t first, second; }; int main() { std::atomic<bigint_equiv> a; std::cout << a.is_lock_free() << " " << decltype(a)::is_always_lock_free << std::endl; } ``` Possibly related: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70814 where the gcc devs discuss why libatomic doesn't allow 128 bit atomics to be lock-free on ARM. When I compile the same code with gcc 13.2 `is_always_lock_free` is `false` which is strange since I've confirmed that the same libstdc++ is being included and linked. This also appears to be what happens on godbolt too though. With gcc the assembly (as best I can tell) shows a 0 being written out (https://godbolt.org/z/T7493K8xP) while with clang the assembly shows a 1 (https://godbolt.org/z/Gb35jMfjr).
jyknight commented 9 months ago

There's a whole chain of issues here:

  1. Clang does indeed emit native 128-bit atomic instructions on ARM64, despite that it turns a load into a compare-and-swap.
  2. You're using Clang compiler with GCC's libatomic runtime library, I presume. That is intended to work properly, but, here it does not, because the latter does not use 128-bit atomics on ARM64 (and is arguably correct in not doing so).
  3. libstdc++'s std::atomic's implementation of is_lock_free() calls the compiler builtin __atomic_is_lock_free, which should be evaluated at compile-time when the compiler knows the answer. But, libstdc++ uses return __atomic_is_lock_free(sizeof(_M_i), reinterpret_cast<void *>(-_S_alignment));, and Clang doesn't handle known integer values, and thus ends up making a runtime call into libatomic (which should only ever return true in more situations than the compiler, but, due to the above, does not).

So, firstly, we ought to add code to handle constant integer values of a pointer in __atomic_is_lock_free -- in which case we we'd evaluate it to true in the compiler, instead of making a runtime call. That fixes the surface symptom -- is_lock_free() will start returning true here.

But there's still an incompatibility, because the underlying assumption is that if the compiler ever uses inline "lock-free" atomics for a particular pointer/size, libatomic must also do so. Without that guarantee, libatomic may use a spinlock, which is not actually atomic with regards to native atomic instructions. So, secondly, we ought to stop emitting inline 128-bit atomics inline unless LSE2 is available (LSE providees actual atomic 128-bit load/store instructions). But, when we do this, we probably need to put it behind an option, and continue to use it in libatomic, or else Clang will be incompatible with older versions of itself.

@Logikable FYI.

insertinterestingnamehere commented 9 months ago

Interesting! Thanks for clarifying what's going on.

I also just tested this on x86_64 and, surprisingly, this issue isn't unique to ARM. Compiling with -march=x86-64-v2 or really any similarly recent revision of x86-64 shows the same result, but the original -march=x86-64 does not.