llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
27.84k stars 11.47k forks source link

improve hot path code through inter procedural alias analysis in cold path #29246

Open weiguozhi opened 8 years ago

weiguozhi commented 8 years ago
Bugzilla Link 28876
Version trunk
OS Linux
CC @hfinkel,@nemanjai,@River707

Extended Description

The source code is

include

void foo(int iters, int size) { std::vector fv; for (int i = 0; i < iters; ++i) { fv.clear(); for (int j = 0; j < size; ++j) { fv.push_back(j); } } }

When compiled into powerpc instructions with options -m64 -O2 -mvsx -mcpu=power8 -fno-exceptions

LLVM generates following code for the loops:

.LBB0_2: # %for.cond.cleanup3.for.body_crit_edge

in Loop: Header=BB0_3 Depth=1

    ld 4, 112(31)

.LBB0_3: # %for.body

=>This Loop Header: Depth=1

                                    #     Child Loop BB0_6 Depth 2
    std 4, 120(31)
    stw 26, 108(31)
    bc 4, 9, .LBB0_10

BB#4: # %for.body4.preheader

                                    #   in Loop: Header=BB0_3 Depth=1
    li 3, 0
    b .LBB0_6
    .p2align        4

.LBB0_5: # %_ZNSt6vectorIiSaIiEE9push_backERKi.exit.for.body4_crit_edge

in Loop: Header=BB0_6 Depth=2

    ld 4, 120(31)

.LBB0_6: # %for.body4

Parent Loop BB0_3 Depth=1

                                    # =>  This Inner Loop Header: Depth=2
    ld 5, 128(31)
    cmpld    4, 5
    beq      0, .LBB0_8

BB#7: # %if.then.i

                                    #   in Loop: Header=BB0_6 Depth=2
    addi 5, 4, 4                    // update pointer of available space
    stw 3, 0(4)                     // push_back j
    std 5, 120(31)
    b .LBB0_9
    .p2align        4

.LBB0_8: # %if.else.i

in Loop: Header=BB0_6 Depth=2

    mr 3, 28
    mr 5, 27
    bl _ZNSt6vectorIiSaIiEE13_M_insert_auxEN9__gnu_cxx17__normal_iteratorIPiS1_EERKi
    nop

.LBB0_9: # %_ZNSt6vectorIiSaIiEE9push_backERKi.exit

in Loop: Header=BB0_6 Depth=2

    lwz 3, 108(31)
    addi 3, 3, 1                    // j++
    stw 3, 108(31)
    cmpw     3, 30
    blt      0, .LBB0_5

.LBB0_10: # %for.cond.cleanup3

in Loop: Header=BB0_3 Depth=1

    addi 25, 25, 1
    cmplw    25, 29
    bne      0, .LBB0_2

GCC generates following code for the hot path of the loops, there are more code in the cold path not shown here:

.L2: ble 3,.L4 mr 31,28 li 23,0 .p2align 4,,15 .L14: cmpld 7,31,10 beq 7,.L5 cmpdi 7,31,0 beq 7,.L6 stw 23,0(31) // push_back j .L6: addi 31,31,4 // update the pointer .L7: addi 9,23,1 // j++ cmpw 7,9,29 extsw 23,9 bne 7,.L14 .L4: addi 8,24,1 cmpw 7,8,27 extsw 24,8 bne 7,.L2

Both compilers inlined function push_back, but gcc inlined all sub functions called by push_back, so it can do more complete alias analysis, and figure out variable j does not alias with fields of fv, so all variables are allocated to registers.

On the other hand, llvm didn't inline the function _M_insert_aux, it is reasonable since it is in cold path only. But it needs pointers of both j and fv, so it may cause j and fv to be alias with each other. Then it impacts the accesses of j and fv in the hot path, and generates many memory access instructions in the hot path. It slows down this function by 5.9x on power8.

One possible improvement is through inter procedural alias analysis, we can figure out that j is not aliased with fields of fv in function _M_insert_aux even without inlining it, so we can allocate them in registers in the hot path, only store and load them around the function call to satisfy the parameter requirement of _M_insert_aux.

River707 commented 8 years ago

LLVM has the ability to tag with callsite parameters with NoAlias(The attribute for not aliasing any other address) but I don't know of any current analysis passes that tag anything other then struct return parameters with it.