llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.63k stars 11.83k forks source link

LLVM lowers __atomic builtin to a function call when lowering __sync builtin to a single instruction with LOCK prefix #38194

Open llvmbot opened 6 years ago

llvmbot commented 6 years ago
Bugzilla Link 38846
Version trunk
OS Linux
Reporter LLVM Bugzilla Contributor
CC @zygoloid

Extended Description

=== Test program ===

include

template struct attribute((packed)) Foo { T a; void incAtomic() { atomic_add_fetch(&a, 1, ATOMIC_RELEASE); } void incSync() {

    __sync_add_and_fetch(&a, 1);
}

};

int main(int argc, char** argv) { Foo f; f.a = argc; f.incAtomic(); f.incSync(); return f.a; }

=== Disassembled instructions === (compiled with clang++ --std=c++17 -O3 -Wall -DNDEBUG=1)

main: push rax mov word ptr [rsp], di mov rdi, rsp mov esi, 1 mov edx, 3 call __atomic_fetch_add_2 lock add word ptr [rsp], 1 # __sync_add_and_fetch movzx eax, word ptr [rsp] pop rcx ret

GCC's documentation for builtins (https://gcc.gnu.org/onlinedocs/gcc-4.7.2/gcc/_005f_005fatomic-Builtins.html) says that atomic builtsins are intended to replace sync builtins, so there should be no semantic difference between corresponding ones.

I believe the difference comes from the alignment checking logic (https://github.com/llvm-mirror/clang/blob/master/lib/CodeGen/CGAtomic.cpp#L768) only included in handling __atomic builtins. However, considering that LOCK prefix works for unaligned accesses as well, I'm not sure if that alignment check is necessary. LOCK for unaligned memory access is expensive, but not sure if it is more expensive than library calls.

Kojoley commented 3 years ago

Bug llvm/llvm-project#25709 has been marked as a duplicate of this bug.

ec04fc15-fa35-46f2-80e1-5d271f2ef708 commented 6 years ago

As of r341734, clang no longer calls the __builtinfoo lib functions for misaligned pointers.

ec04fc15-fa35-46f2-80e1-5d271f2ef708 commented 6 years ago

llvm/llvm-bugzilla-archive#38871 filed for the bug that libc++'s std::atomic is not always atomic when compiling with GCC. (It's a GCC bug but libc++ needs to work around it.)

llvmbot commented 6 years ago

Got it. Thank you for the detailed explanation!

ec04fc15-fa35-46f2-80e1-5d271f2ef708 commented 6 years ago

I meant std::atomic. (I guess there's no "standard" for builtins?)

There is no alignment requirement in any specification for std::atomic; rather, the C++ standard requires it to work for an arbitrary trivially-copyable type, and implementations make that efficient by increasing the alignment where necessary.

In libc++ this is done by implementing it in terms of _Atomic (where available); Clang's implementation of _Atomic increases the alignment as necessary. (The alignment of _Atomic(T) should be specified by the psABI for the target, but in practice it is not.) However, when compiled with GCC, it just uses _atomic and doesn't increase the alignment, presumably resulting in miscompiles in the cases where GCC's _atomic functions fail to actually be atomic.

In libstdc++, the GCC bug gcc.gnu.org/#87237 is worked around by increasing the alignment on std::atomic when sizeof(T) is a power of 2 <= 16 and alignof(T) is less than sizeof(T):

https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/std/atomic#L189

llvmbot commented 6 years ago

I meant std::atomic. (I guess there's no "standard" for builtins?)

ec04fc15-fa35-46f2-80e1-5d271f2ef708 commented 6 years ago

I see. Thank you for the explanation! Could you please point me where in the specification the alignment requirement for atomics is described?

For which form of atomics? (_sync or _atomic or _Atomic or std::atomic?)

llvmbot commented 6 years ago

I see. Thank you for the explanation! Could you please point me where in the specification the alignment requirement for atomics is described? I tried to find it by myself but couldn't find one.

ec04fc15-fa35-46f2-80e1-5d271f2ef708 commented 6 years ago

Thank you Richard for your detailed reply! Is unaligned atomic operation UB only for these builtins, or for std::atomic as well? Here https://gcc.godbolt.org/z/MZLRi8 I made std::atomic variable unaligned and seems that the generated code is just same as the one from __sync (in therms of atomic operation).

Generally, when you pass a T* to a function, that function is free to assume that the pointer value is suitably aligned for type T. As a result, you must be extremely careful when using pointers that point to members of a struct declared with attribute((packed)).

If you're using atomic operations on a packed struct as a whole, that should work fine. Both _Atomic(T) and std::atomic will increase the alignment of the atomic type as necessary to make natural atomic operations work on the object, though, so atomic will typically have an alignment greater than 1.

There's implementation divergence in your example: GCC ignores the attribute((packed)) on the member of type atomic in Foo2 (and gives you a working program that performs a correct, naturally-aligned atomic access). Clang respects the request, which means you're making a member function call with a misaligned this pointer, which is UB.

You might also be interested in this greatly reduced example of what happens if you try to use __atomic ops on pointers that are not naturally aligned: https://gcc.godbolt.org/z/LqXKrt

Note that GCC generates a plain 'mov' for this 1-byte-aligned 4 byte load. That's not atomic in general. Clang generates a call to __atomic_load_4, which in both compiler-rt and libatomic will result in a plain 'mov'.

llvmbot commented 6 years ago

Thank you Richard for your detailed reply! Is unaligned atomic operation UB only for these builtins, or for std::atomic as well? Here https://gcc.godbolt.org/z/MZLRi8 I made std::atomic variable unaligned and seems that the generated code is just same as the one from __sync (in therms of atomic operation).

ec04fc15-fa35-46f2-80e1-5d271f2ef708 commented 6 years ago

I think your example has undefined behavior in both the sync and atomic forms. You're calling the versions of these builtins that take an unsigned short*, and passing a pointer that is not suitably aligned. This is UB, just as it would be if you called a function that performed a non-atomic load on such a pointer.

However, I think that's largely irrelevant for the purpose of this bug, and we usually do try to guarantee that cases like this work, when the alignment can be determined from the form of the expression. (I think the cause of the difference between the _sync builtins and the _atomic builtins is that the former make no attempt to make such cases work, whereas the latter do try to make this work.)

The same behavior can be seen for cases whose behavior is supposed to be defined, such as:

struct A { char c[4]; } p, q; void f() { __atomic_load(p, q, 5); } // calls __atomic_load_4

struct B { int n; } r, s; void g() { __atomic_load(r, s, 5); } // emitted as a load

... so it looks like we use a libcall for unaligned accesses for atomic_, and do not try to support unaligned accesses for _sync. (Note that the sync_ functions only take built-in integral and pointer types, for which alignment is implied, whereas the _atomic functions take more arbitrary types.)

Now, it's wrong to call atomic_load_4 in the above example: the "optimized" library functions are only valid to call on properly-aligned pointers. We should be calling the generic atomic_load library function instead.

Also, it's worth noting that choosing to use inlined code rather than a libcall is an ABI decision, not merely a performance choice: if some part of the program uses inlined code, the libcall implementation must not perform nonatomic accesses under a lock. It's always correct to conservatively use a libcall, but it's not correct to inline code if the libatomic implementation would use a lock for that object.

I would note that GCC generates a call to atomic_load_16 for a 1-byte-aligned 16 byte load -- https://godbolt.org/z/ScqQB9 -- so we're not the only ones to get this wrong. Also, GCC's libatomic implementation of atomic_load will use a lock to load an object of type 'struct A' above if it straddles a 16-byte-alignment boundary, so GCC + GNU libatomic gets the f() case wrong too by emitting a 'mov' rather than a libcall.

One final problem is that the compiler-rt implementation of __atomic_load does not even depend on the alignment of the pointer: when given a size for which a naturally-aligned load would be atomic, it just calls __c11_atomic_load, which will in general not be atomic when given an unaligned pointer. (GCC's libatomic gets this right, as far as I can see.)

I would conclude:

(and the above seems to hold across both Clang and GCC).

The best advice I can give you is to avoid misaligned atomic operations.