llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

[optimization] C++11 std::atomic<>.fetch_or() could be optimized to use 'bts' #14047

Open llvmbot opened 12 years ago

llvmbot commented 12 years ago
Bugzilla Link 13675
Version trunk
OS Linux
Reporter LLVM Bugzilla Contributor

Extended Description

$ clang --version
clang version 3.2 (trunk) (04622b05755fb9dbb062735a53e779e1deb29a97)
Target: x86_64-pc-linux-gnu
Thread model: posix

The minimal test case:

#include <atomic>

int main() {
    std::atomic_int foo;
    return foo.fetch_or(1);
}

Results in the following assembly (-O3 -S):

# BB#0:
        movl    $1, %ecx
.LBB0_1:                                # =>This Inner Loop Header: Depth=1
        movl    -8(%rsp), %eax
        movl    %eax, %edx
        orl     %ecx, %edx
        lock cmpxchgl %edx, -8(%rsp)
        jne     .LBB0_1
# BB#2:
        ret

I believe that such an operation (atomic bit-setting) could be done much more simply and quickly with the 'bts' (bit test and set) instruction.

I'm not sure whether this can be improved in clang or LLVM, or whether it needs changes to libstdc++.

llvmbot commented 7 years ago

fetch_or has to return the old value, including all the bits, not just bit 0.
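A minimal sketch of why that matters, assuming a hypothetical helper named set_low_bit: the caller gets back every bit of the previous value, while bts only reports the single tested bit in the carry flag.

```cpp
#include <atomic>

// Hypothetical helper for illustration: sets bit 0 and returns the
// complete previous value, exactly as fetch_or is specified to do.
int set_low_bit(std::atomic<int>& a) {
    return a.fetch_or(1);
}
```

If the atomic held 6 before the call, the helper returns 6 and the atomic then holds 7. Bits 1 and 2 of the result could not be recovered from the carry flag alone.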

If whole-program optimization could prove that an atomic_int could only ever be 0 or 1, then yes, lock bts $0, (mem) / setc %al would probably be more efficient. Memory-destination lock bts is not too slow with an immediate bit index.

But if you want that, you should use atomic_flag.
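As a rough illustration of the atomic_flag route, the sketch below uses a hypothetical helper named claim. It relies only on std::atomic_flag::test_and_set, which returns whether the flag was already set.

```cpp
#include <atomic>

// Hypothetical helper for illustration: atomically sets the flag and
// reports whether it had already been set by an earlier caller.
bool claim(std::atomic_flag& flag) {
    return flag.test_and_set();
}
```

The first call on a cleared flag returns false and every later call returns true, which is the zero-or-one behaviour described above without needing the compiler to prove anything about the stored value.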


lock bts would be usable as a peephole for

int old = foo.fetch_or(SINGLE_BIT);
return old & SINGLE_BIT;

especially in a boolean context, otherwise you'd have to setc and shift or something to create the correct mask.
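The boolean-context form of that pattern might look like the sketch below. The function name test_and_set_bit and the value chosen for SINGLE_BIT are assumptions for illustration; whether a particular compiler actually emits lock bts for it depends on the compiler version and flags.

```cpp
#include <atomic>

// Assumed value for illustration; any constant with exactly one bit set
// fits the pattern.
constexpr int SINGLE_BIT = 1 << 3;

// Sets SINGLE_BIT and reports whether it was already set. Only the
// tested bit of the old value is used, so a single carry-flag result
// would be enough to compute the return value.
bool test_and_set_bit(std::atomic<int>& foo) {
    int old = foo.fetch_or(SINGLE_BIT);
    return (old & SINGLE_BIT) != 0;
}
```

Because the result is a bool rather than the mask itself, no extra shift is needed to rebuild the bit position after a setc.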
