llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

Suboptimal machine code for atomic bool load + test #37753

Open 54aefcd4-c07d-4252-8441-723563c8826f opened 6 years ago

54aefcd4-c07d-4252-8441-723563c8826f commented 6 years ago
Bugzilla Link 38405
Version trunk
OS All
CC @chandlerc,@efriedma-quic,@hfinkel,@RKSimon,@rotateright

Extended Description

The following C++ code compiled with clang -O3 -std=c++17 (https://godbolt.org/g/CAMK9k):

#include <atomic>

std::atomic<bool> flag_atomic{false};
bool flag_nonatomic{false};

extern void f1();
extern void f2();

void branchAtomic() {
    if (flag_atomic.load(std::memory_order_relaxed)) {
        f1();
    } else {
        f2();
    }
}

void branchNonatomic() {
    if (flag_nonatomic) {
        f1();
    } else {
        f2();
    }
}

produces different machine code for the atomic and non-atomic functions. In this particular case the output should probably be the same, with both functions emitting a single cmpb against memory (right? I am not 100% sure):

branchAtomic(): # @branchAtomic()
  movb flag_atomic(%rip), %al
  testb $1, %al
  jne .LBB0_1
  jmp _Z2f2v # TAILCALL
.LBB0_1:
  jmp _Z2f1v # TAILCALL
branchNonatomic(): # @branchNonatomic()
  cmpb $0, flag_nonatomic(%rip)
  je .LBB1_2
  jmp _Z2f1v # TAILCALL
.LBB1_2:
  jmp _Z2f2v # TAILCALL
flag_atomic:
  .zero 1

flag_nonatomic:
  .byte 0 # 0x0
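
For readers comparing the two listings: the branch in branchAtomic() tests only the low bit of the loaded byte (testb $1), while branchNonatomic() compares the whole byte against zero (cmpb $0). The sketch below is only an illustration of why those two conditions are interchangeable for a valid bool and not for an arbitrary byte. The helper names are hypothetical and not part of the report. It also assumes the usual ABI convention that a bool is stored as the byte 0 or 1. Whether the optimizer is missing that value-range information for the atomic load is a guess, not something this report confirms.

```cpp
#include <cstdint>

// Hypothetical helper: the condition under which branchAtomic() reaches f1(),
// i.e. `testb $1, %al` followed by `jne`: the low bit of the byte is set.
constexpr bool callsF1ViaTestLowBit(std::uint8_t byte) {
    return (byte & 1u) != 0;
}

// Hypothetical helper: the condition under which branchNonatomic() reaches f1(),
// i.e. `cmpb $0, mem` followed by a `je` that skips f1(): the byte is nonzero.
constexpr bool callsF1ViaCompareZero(std::uint8_t byte) {
    return byte != 0;
}

// For the two byte values assumed to represent a bool, both conditions agree,
// so either instruction sequence picks the same branch.
static_assert(callsF1ViaTestLowBit(0) == callsF1ViaCompareZero(0), "false agrees");
static_assert(callsF1ViaTestLowBit(1) == callsF1ViaCompareZero(1), "true agrees");

// For a byte outside {0, 1} they disagree, so the cmpb form is only a valid
// replacement when the loaded value is known to be 0 or 1.
static_assert(callsF1ViaTestLowBit(2) != callsF1ViaCompareZero(2), "2 disagrees");
```

If that guess is right, the atomic case would need the same 0-or-1 knowledge about the loaded byte before the load and test could be folded into one cmpb with a memory operand.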
54aefcd4-c07d-4252-8441-723563c8826f commented 6 years ago

@davidtgoldblatt spotted this in jemalloc's fast path (https://github.com/jemalloc/jemalloc/pull/1195#issuecomment-385575409), where it appears to cause a small but measurable performance regression.