Open llvmbot opened 13 years ago
loop-rotate has learned to rotate loops with multiple exits in r162912. Looks like LLVM still produces suboptimal code for your test case though.
Does anyone plan to address this issue? The loop in Func_2 of dhrystone also has multiple exit blocks, and LLVM fails to rotate it. The resulting code is significantly slower than that of other compilers.
Extended Description
Attached is a test case that illustrates how GCC generates dramatically superior code for a loop on ARM. LLVM could do something comparable if the LoopRotation pass did not reject loops that have more than one exit. If LoopRotation handled such loops properly, then induction-variable analysis and LICM would have more opportunities to clean things up (as is apparently done in GCC).
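For readers without the attachment, here is a minimal sketch of the transformation being asked for. It is not the attached test case: `find_overlap`, `find_overlap_rotated`, and their parameters are hypothetical names chosen for illustration, and the second function is a hand-rotated version of the first, not actual LoopRotation output. The loop has two exits (the bound check and the early break). Rotation moves the bound check to the bottom of the loop behind a one-time guard, so each iteration ends in a single conditional back-edge.

```c
#include <stddef.h>

/* Hypothetical loop with two exits: the bound check in the header
   and the early break when the two words share a set bit. */
size_t find_overlap(const unsigned *a, const unsigned *b, size_t n) {
    size_t i = 0;
    while (i < n) {
        if (a[i] & b[i])
            break;
        i++;
    }
    return i;
}

/* Hand-rotated equivalent (an illustration, not compiler output):
   the header test runs once as a guard, then the loop becomes a
   do/while whose bound check sits at the bottom. */
size_t find_overlap_rotated(const unsigned *a, const unsigned *b, size_t n) {
    size_t i = 0;
    if (i < n) {
        do {
            if (a[i] & b[i])
                break;
            i++;
        } while (i < n);
    }
    return i;
}
```

Both functions return the index of the first position where `a[i] & b[i]` is nonzero, or `n` if there is none. The rotated shape is roughly what the 6-instruction GCC loop looks like: load, test, early-exit branch, then compare and branch back.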
Compiled with -O3, LLVM generates the following loop code of 10 instructions:

    .LBB0_1:                    @ %while.cond.i
                                @ =>This Inner Loop Header: Depth=1
        add     r1, r0, #5
        cmp     r1, #1
        mov     r1, #0
        blt     .LBB0_3
        ldr     r2, [r4, -r0, lsl #2]
        ldr     r3, [r5, -r0, lsl #2]
        sub     r0, r0, #1
        mov     r1, #1
        tst     r3, r2
        beq     .LBB0_1

By contrast, GCC generates the following 6-instruction loop:

    .L3:
        ldr     r0, [r3], #4
        ldr     r1, [r2, #4]!
        tst     r0, r1
        bne     .L4
        cmp     r3, ip
        bne     .L3