Open ConorWilliams opened 4 months ago
Some 3rd party benchmarks here
cc @RKSimon @topperc
@llvm/issue-subscribers-backend-x86
Author: Conor Williams (ConorWilliams)
Funnily enough, if the CPU target doesn't have MFENCE, LowerATOMIC_FENCE will emit LOCK OR $(RSP), 0
There's a patch here that never merged https://reviews.llvm.org/D129947
@topperc could it still be merged? I'm not sure I follow why it got stalled.
While implementing a lock-free queue I noticed that the pop function was about twice as slow on clang vs gcc. After digging through the assembly on Compiler Explorer and then reducing to a minimal example, it seems that this is happening for std::atomic_thread_fence(std::memory_order_seq_cst):
gcc | lock or QWORD PTR [rsp], 0
clang | mfence
The mfence instruction is much slower. MSVC also generates lock inc DWORD PTR __Guard$1[esp+4] instead of an mfence. I raised this on r/cpp a while ago and was referred to this GCC patch, which introduced the optimisation. How can we go about getting something like this into LLVM? I have been using Boost.Atomic, which seems to generate better assembly, but it would be really nice to drop the dependency.
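For reference, here is a minimal sketch of the kind of reduced example described above. The function names full_fence and store_fence_load are illustrative assumptions, not taken from the original queue code. The second function only shows a typical store/fence/load pattern where a sequentially consistent fence sits on a hot path.

```cpp
#include <atomic>

// Hypothetical reduced test case: a lone sequentially consistent fence.
// Compile at -O2 for x86-64 and inspect the generated assembly.
void full_fence() {
    std::atomic_thread_fence(std::memory_order_seq_cst);
}

// Hypothetical usage: publish a write to x, then read y, with a full
// fence in between so the store cannot be reordered after the load.
int store_fence_load(std::atomic<int>& x, std::atomic<int>& y) {
    x.store(1, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);
    return y.load(std::memory_order_relaxed);
}
```

Going by the report above, the fence should come out as a lock or on the stack with gcc and as an mfence with clang. The exact output may vary with compiler version and target flags.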