llvm / llvm-project


[licm] LICM promote speculative load to scalar #21603

Open llvmbot opened 10 years ago

llvmbot commented 10 years ago
Bugzilla Link 21229
Version trunk
OS Linux
Reporter LLVM Bugzilla Contributor
CC @hfinkel,@john-brawn-arm

Extended Description

Here is a simple c code:

LLVM takes a conservative approach and does not promote speculative loads to scalars in the LICM pass, because doing so could break the LLVM concurrency model by introducing a race between threads. There are still cases, however, where promoting a speculative load is safe and does not cause a race condition, such as when the loop entry is guarded by a condition and we can prove that there is at least one load.

Consider this example:

    extern int globalvar;

    void foo(int n, int incr)
    {
      unsigned int i;
      for (i = 0; i < n; i += incr) {
        if (i < n/2)
          globalvar += incr;
      }
      return;
    }

GCC produces the following output:

    $ aarch64-linux-gnu-g++ -S -o - -O3 -ffast-math -march=armv8-a+simd test.cpp
            .arch armv8-a+fp+simd
            .file   "test.cpp"
            .text
            .align  2
            .global _Z3fooii
            .type   _Z3fooii, %function
    _Z3fooii:
    .LFB0:
            .cfi_startproc
            cbz     w0, .L1
            adrp    x6, globalvar
            add     w5, w0, w0, lsr 31
            ldr     w3, [x6,#:lo12:globalvar]    <== hoist load of globalvar
            mov     w2, 0
            asr     w5, w5, 1
    .L4:
            cmp     w5, w2
            add     w2, w2, w1
            add     w4, w3, w1
            csel    w3, w4, w3, hi
            cmp     w2, w0
            bcc     .L4
            str     w3, [x6,#:lo12:globalvar]    <== sink store of globalvar
    .L1:
            ret
            .cfi_endproc
    .LFE0:
            .size   _Z3fooii, .-_Z3fooii
            .ident  "GCC: (crosstool-NG linaro-1.13.1-4.8-2014.01 - Linaro GCC 2013.11) 4.9.0"

whereas LLVM produces the following output:

    $ clang-aarch64-x++ -S -o - -O3 -ffast-math -fslp-vectorize test.cpp
            .text
            .file   "test.cpp"
            .globl  _Z3fooii
            .align  2
            .type   _Z3fooii,@function
    _Z3fooii:                               // @_Z3fooii
    // BB#0:                                // %entry
            cbz     w0, .LBB0_5
    // BB#1:                                // %for.body.lr.ph
            mov     w8, wzr
            cmp     w0, #0                  // =0
            cinc    w9, w0, lt
            asr     w9, w9, #1
            adrp    x10, globalvar
    .LBB0_2:                                // %for.body
                                            // =>This Inner Loop Header: Depth=1
            cmp     w8, w9
            b.hs    .LBB0_4
    // BB#3:                                // %if.then
                                            //   in Loop: Header=BB0_2 Depth=1
            ldr     w11, [x10, :lo12:globalvar]    <===== load inside loop
            add     w11, w11, w1
            str     w11, [x10, :lo12:globalvar]    <==== store inside loop
    .LBB0_4:                                // %for.inc
                                            //   in Loop: Header=BB0_2 Depth=1
            add     w8, w8, w1
            cmp     w8, w0
            b.lo    .LBB0_2
    .LBB0_5:                                // %for.end
            ret
    .Ltmp1:
            .size   _Z3fooii, .Ltmp1-_Z3fooii

            .ident  "clang version 3.6.0 "

LLVM misses this opportunity by being too conservative.
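For readers who prefer source over assembly, here is a rough hand-written sketch of the transformation GCC appears to perform on the example. It is only an illustration, not what either compiler emits internally: the names foo_promoted and tmp are made up for this sketch, and globalvar is defined locally (rather than declared extern as in the report) so that the snippet links on its own.

```c
/* Defined here so the sketch is self-contained; the report declares it extern. */
int globalvar;

/* Original form from the report: the load and store of globalvar
 * stay inside the loop, under the condition. */
void foo(int n, int incr)
{
    unsigned int i;
    for (i = 0; i < n; i += incr) {
        if (i < n / 2)
            globalvar += incr;
    }
}

/* Hand-promoted form (hypothetical name): globalvar is loaded once before
 * the loop and stored once after it, behind the same "n != 0" guard that
 * both compilers emit as cbz w0. */
void foo_promoted(int n, int incr)
{
    unsigned int i;
    int tmp;

    if (n == 0)
        return;                 /* loop never entered: no load, no store */

    tmp = globalvar;            /* hoisted load */
    for (i = 0; i < n; i += incr) {
        if (i < n / 2)
            tmp += incr;        /* conditional update on the scalar copy */
    }
    globalvar = tmp;            /* sunk store, executed unconditionally */
}
```

In a single thread both functions leave the same value in globalvar. Note that the promoted form stores to globalvar whenever the loop is entered, even if the inner condition never held (for example n == 1, where n/2 is 0), which is exactly the kind of introduced store that the concurrency model is concerned about.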

This was discussed briefly on the llvm-dev mailing list here: http://article.gmane.org/gmane.comp.compilers.llvm.devel/76467

john-brawn-arm commented 5 years ago

clang 8.0.0 and latest trunk still fail to perform this optimization.