llvm / clangir

A new (MLIR based) high-level IR for clang.
https://clangir.org

Non-constant memory order in atomic built-ins #731

Closed: dkolsen-pgi closed this issue 3 months ago

dkolsen-pgi commented 3 months ago

RFE: Support atomic built-ins with memory order arguments that are runtime values rather than compile-time values.

When the compiler sees result = __atomic_fetch_add(ptr, val, order); and the value of order is not a compile-time constant, CodeGen must generate CIR similar to this:

switch (order) {
default:
  result = __atomic_fetch_add(ptr, val, __ATOMIC_RELAXED);
  break;
case __ATOMIC_CONSUME:
case __ATOMIC_ACQUIRE:
  result = __atomic_fetch_add(ptr, val, __ATOMIC_ACQUIRE);
  break;
case __ATOMIC_RELEASE:
  result = __atomic_fetch_add(ptr, val, __ATOMIC_RELEASE);
  break;
case __ATOMIC_ACQ_REL:
  result = __atomic_fetch_add(ptr, val, __ATOMIC_ACQ_REL);
  break;
case __ATOMIC_SEQ_CST:
  result = __atomic_fetch_add(ptr, val, __ATOMIC_SEQ_CST);
  break;
}
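For concreteness, this is the kind of (hypothetical) caller that forces the expansion, since order is only known at run time; the function name below is made up for illustration:

// Hypothetical example: 'order' is a runtime value here, so CodeGen
// cannot pick a single memory ordering and must emit the switch above.
int fetch_add_runtime_order(int *ptr, int val, int order) {
  return __atomic_fetch_add(ptr, val, order);
}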

While this seems wasteful and unnecessary at first, it is needed to support std::atomic, which is an extremely important use case. Because std::atomic puts an extra function between the user and the atomic built-in, the memory order looks like a runtime value even though it is a constant in the source code. For example, the user might write

a.fetch_add(20, std::memory_order_acq_rel);

While the memory order is a constant in the user code, inside the definition of std::atomic<T>::fetch_add the memory order is just a function parameter whose value isn't known at compile time, at least not during CodeGen. After inlining and constant-folding optimizations have run, the switch statement can be optimized away. But it needs to exist during CodeGen, and it will survive all the way through compilation if no optimization happens.
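As a rough sketch of that indirection (a simplified, made-up wrapper, not the real libc++/libstdc++ std::atomic), the built-in inside the library member function only ever sees a function parameter:

// Simplified illustration only; 'my_atomic' is not the real std::atomic.
template <typename T>
struct my_atomic {
  T value;
  // From CodeGen's point of view, 'order' is an ordinary runtime parameter.
  T fetch_add(T arg, int order) noexcept {
    return __atomic_fetch_add(&value, arg, order);
  }
};
// my_atomic<int> a{0};
// a.fetch_add(20, __ATOMIC_ACQ_REL);  // constant at the call site, but only
//                                     // visible to the built-in after inlining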

bcardosolopes commented 3 months ago

After inlining and constant folding optimizations have happened, the switch statement can be optimized away

Yea, I've seen many cases internally where even after the opt pipeline does its job we might still have the indirection layer.

Regarding the overall approach: it might be worth trying to change the existing atomic operations to accept either a value or a "mem_order" (we have something similar with calls, which support both indirect and direct forms within the same op; the LLVMIR dialect also has examples) and only break this into a switch in LoweringPrepare. That leaves some room for CIR passes to constant propagate / idiom recognize before we expand it, and future fold methods could then switch to the "mem_order" version, avoiding the early expansion.
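A minimal sketch of that late-expansion decision, assuming an op that can carry either a constant ordering or a runtime value (all names below, such as OrderOperand and lowerFetchAdd, are invented for illustration and are not real ClangIR APIs):

#include <cstdio>
#include <optional>

enum class MemOrder { Relaxed, Consume, Acquire, Release, AcqRel, SeqCst };

// Hypothetical operand model: holds a constant ordering when CIRGen knew it,
// and stays empty when the ordering is only a runtime value.
struct OrderOperand {
  std::optional<MemOrder> constant;
};

// In this sketch, LoweringPrepare expands to the switch only when no
// constant ordering survived earlier CIR passes.
void lowerFetchAdd(const OrderOperand &order) {
  if (order.constant) {
    std::puts("emit one fetch_add with the fixed ordering");
  } else {
    std::puts("expand into the switch over all orderings, as shown earlier");
  }
}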

dkolsen-pgi commented 3 months ago

it might be worth trying to change existing atomic operations to accept a "value or mem_order"

I already have this implemented, with the expansion done during CodeGen. I am cleaning up the change and getting ready to commit it. I don't want to re-implement it right now based on speculation that it might be better in the long run. I do like the idea and it has merit, but I think we should revisit it when the ClangIR optimizations and transformations are further along.

bcardosolopes commented 3 months ago

I think it has merit even if we don't consider the optimization speculation, because it keeps CIR simpler to read and reason about at the first level out of CIRGen. But at this point I care more about getting this down to LLVM, and the incremental approach sounds good - once you put the PR up I'll file another issue to track this type of extra abstraction (we might find volunteers to tackle it).