hidva closed this issue 2 years ago
another example:
#include <atomic>
void f(std::atomic_uint64_t* p) {
p->fetch_add(0, std::memory_order_relaxed);
}
$ clang -O1 -mllvm -opt-bisect-limit=40 -c -o 2.o 2.cc
$ objdump -M intel -d 2.o
0000000000000000 <_Z1fPSt6atomicImE>:
0: 0f ae f0 mfence
3: 48 8b 07 mov rax,QWORD PTR [rdi]
6: c3 ret
$ clang -O1 -mllvm -opt-bisect-limit=41 -c -o 2.o 2.cc
...
BISECT: running pass (40) Simplify the CFG on function (_Z1fPSt6atomicImE)
BISECT: running pass (41) Combine redundant instructions on function (_Z1fPSt6atomicImE)
BISECT: NOT running pass (42) Simplify the CFG on function (_Z1fPSt6atomicImE)
...
$ objdump -M intel -d 2.o
0000000000000000 <_Z1fPSt6atomicImE>:
0: 48 8b 07 mov rax,QWORD PTR [rdi]
3: c3 ret
I suspect it's NOTABUG.
Could you please be more specific as to what is incorrect?
x + 0 and x | 0 are the same thing, and gcc produces the former:
https://godbolt.org/z/jz46GKds9
I think it should output lock or dword ptr [rdi], 0, not lock or dword ptr [rsp - 64], 0
The OR is writing to a location on the stack, not the original variable; it's just trying to create a memory-ordering barrier. Pretty sure it's hitting this code in lowerAtomicArith from X86ISelLowering.cpp:
// Specialized lowering for the canonical form of an idempotent atomicrmw.
// The core idea here is that since the memory location isn't actually
// changing, all we need is a lowering for the *ordering* impacts of the
// atomicrmw. As such, we can choose a different operation and memory
// location to minimize impact on other code.
if (Opc == ISD::ATOMIC_LOAD_OR && isNullConstant(RHS)) {
// On X86, the only ordering which actually requires an instruction is
// seq_cst which isn't SingleThread, everything just needs to be preserved
// during codegen and then dropped. Note that we expect (but don't assume),
// that orderings other than seq_cst and acq_rel have been canonicalized to
// a store or load.
if (AN->getSuccessOrdering() == AtomicOrdering::SequentiallyConsistent &&
AN->getSyncScopeID() == SyncScope::System) {
// Prefer a locked operation against a stack location to minimize cache
// traffic. This assumes that stack locations are very likely to be
// accessed only by the owning thread.
SDValue NewChain = emitLockedStackOp(DAG, Subtarget, Chain, DL);
assert(!N->hasAnyUseOfValue(0));
// NOTE: The getUNDEF is needed to give something for the unused result 0.
return DAG.getNode(ISD::MERGE_VALUES, DL, N->getVTList(),
DAG.getUNDEF(VT), NewChain);
}
}
Thanks
Incorrect code generation: