Suboptimal machine code for atomic bool load + test


Bugzilla Link	38405
Version	trunk
OS	All
CC	@chandlerc,@efriedma-quic,@hfinkel,@RKSimon,@rotateright

Extended Description

The following C++ code compiled with clang -O3 -std=c++17 (https://godbolt.org/g/CAMK9k):

#include <atomic>

std::atomic<bool> flag_atomic{false};
bool flag_nonatomic{false};

extern void f1();
extern void f2();

void branchAtomic() {
    if (flag_atomic.load(std::memory_order_relaxed)) {
        f1();
    } else {
        f2();
    }
}

void branchNonatomic() {
    if (flag_nonatomic) {
        f1();
    } else {
        f2();
    }
}

produces different code for the atomic and non-atomic functions. In this particular case the two should probably be identical, with both testing the flag using a single cmpb against memory (I am not 100% sure about that, though):

branchAtomic(): # @branchAtomic()
  movb flag_atomic(%rip), %al
  testb $1, %al
  jne .LBB0_1
  jmp _Z2f2v # TAILCALL
.LBB0_1:
  jmp _Z2f1v # TAILCALL
branchNonatomic(): # @branchNonatomic()
  cmpb $0, flag_nonatomic(%rip)
  je .LBB1_2
  jmp _Z2f1v # TAILCALL
.LBB1_2:
  jmp _Z2f2v # TAILCALL
flag_atomic:
  .zero 1

flag_nonatomic:
  .byte 0 # 0x0
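
To confirm locally that the two functions differ only in instruction selection and not in behaviour, here is a minimal sketch. It assumes stand-in definitions for f1 and f2 (the report only declares them extern); last_called and dispatchAgrees are hypothetical names that are not part of the original test case. Because the stand-ins are visible to the optimizer, the generated assembly may not match the listing above, so keep f1 and f2 extern when comparing codegen.

```cpp
#include <atomic>
#include <cassert>

std::atomic<bool> flag_atomic{false};
bool flag_nonatomic{false};

// Stand-ins for the report's extern f1/f2 (assumed bodies):
// each one records which function was called last.
int last_called = 0;
void f1() { last_called = 1; }
void f2() { last_called = 2; }

void branchAtomic() {
    if (flag_atomic.load(std::memory_order_relaxed)) {
        f1();
    } else {
        f2();
    }
}

void branchNonatomic() {
    if (flag_nonatomic) {
        f1();
    } else {
        f2();
    }
}

// Hypothetical helper: returns true if both functions call f1 when the
// flag is true and f2 when it is false.
bool dispatchAgrees() {
    const bool values[] = {false, true};
    for (bool v : values) {
        flag_atomic.store(v, std::memory_order_relaxed);
        flag_nonatomic = v;

        branchAtomic();
        const int viaAtomic = last_called;
        branchNonatomic();
        const int viaNonatomic = last_called;

        const int expected = v ? 1 : 2;
        if (viaAtomic != expected || viaNonatomic != expected) {
            return false;
        }
    }
    return true;
}
```

Calling assert(dispatchAgrees()) from a main() is enough to exercise both flag values. This only checks observable behaviour; it says nothing about which instruction sequence is faster.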

Suboptimal machine code for atomic bool load + test #37753