llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
26.71k stars 10.94k forks source link

suboptimal code generation in loop unrolling #11568

Open llvmbot opened 12 years ago

llvmbot commented 12 years ago
Bugzilla Link 11196
Version trunk
OS All
Attachments test case
Reporter LLVM Bugzilla Contributor
CC @atrick,@nlewycky

Extended Description

I attached a llvm-ir translation of the following program: static init begin if(copy('hello', 1, 2) == 'el') { log('correct') } end

This program should be completely constant folded, but the optimization stucks at a loop: the loop begins at line 51 and compares two strings of the same size. As you see in line 61+63 and 62+64, the characters of the strings at this specific position are loaded and this exactly is the problem.

It is about the same issue like http://llvm.org/bugs/show_bug.cgi?id=11142 but this time, it's a bit more complex: the memory is not directly copied, but it's received by a getelementptr+load from the malloced memory. The challenge is to proof that the loop will stay exactly in the copied range of [3 x i8]* @​label416 so that the getelementptr call can fetched from @​label416 instead of %result.i.i10

An other question is why the loop that only consists of two iterations is not constant folded. The loop has the following structure:

start: br label %test test: %loopvar = phi i32 [%loopvar2, %body], [0, %start] %finished = icmp sgt i32 %loopvar, 2 ; or any other constant or non-constant br i1 %finished, label %continue, label %body body: ; sth fancy %otherbreak = icmp ..... br i1 %otherbreak, label %exception, label %test continue:

This report is maybe two bugs but when you fix one of the two issues, the test case will become unusable to test the other issue because of constant folding.

llvmbot commented 12 years ago

The problem is that %a = malloc + store in %start, then there's the load inside the loop body. GVN isn't fooled at all, but SCEV doesn't know what that load will load.

Yes, I had to add the malloc instruction to reproduce the above test case.

Another possible fix would be adding a heap->stack transform then letting mem2reg mop it up.

That would fix the problem from comment 3, but the original test case is way more complex.

opt -O3 -S gives me:

; ModuleID = 'x.ll' target datalayout = "e-p:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-f80:32:32-v64:64:64-v128:128:128-a0:0:64"

@​hasspace = internal constant [15 x i8] c"it has a space\00" @​nospace = internal constant [16 x i8] c"it has no space\00"

declare void @​puts(i8* nocapture) nounwind

declare noalias i8* @​malloc(i32) nounwind

define void @​main() nounwind { start: %a = tail call i8 @​malloc(i32 4) store i8 32, i8 %a, align 1 br label %test

test: ; preds = %body, %start %loopvar = phi i32 [ %loopvar2, %body ], [ 0, %start ] %finished = icmp sgt i32 %loopvar, 2 br i1 %finished, label %continue, label %body

body: ; preds = %test %loopvar2 = add i32 %loopvar, 1 %otherbreak = icmp eq i32 %loopvar, 1 br i1 %otherbreak, label %exception, label %test

continue: ; preds = %test tail call void @​puts(i8 getelementptr inbounds ([16 x i8] @​nospace, i64 0, i64 0)) ret void

exception: ; preds = %body tail call void @​puts(i8 getelementptr inbounds ([15 x i8] @​hasspace, i64 0, i64 0)) ret void }

running opt a second time workarounds the issue. The problem is that the first case already had two opt runs, so after running opt 4 times the correct result was produced. Can't opt take a parameter to run itself as often as needed?

nlewycky commented 12 years ago

Ok, using comment 3 as the testcase. This is an ordering issue. "opt -gvn -loop-unroll -simplifycfg" produces:

define void @​main() { start: %a = call i8 @​malloc(i32 4) store i8 32, i8 %a call void @​puts(i8 getelementptr inbounds ([15 x i8] @​hasspace, i32 0, i32 0)) ret void }

The problem is that %a = malloc + store in %start, then there's the load inside the loop body. GVN isn't fooled at all, but SCEV doesn't know what that load will load.

Another possible fix would be adding a heap->stack transform then letting mem2reg mop it up.

llvmbot commented 12 years ago

I could reduce it to a very small testcase: (you can replace the target datalayout by your own)

target datalayout = "e-p:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-f80:32:32-v64:64:64-v128:128:128-a0:0:64"

@​char = internal constant [2 x i8] c"g " ; change this string to test the other case @​hasspace = internal constant [15 x i8] c"it has a space\00" @​nospace = internal constant [16 x i8] c"it has no space\00" declare void @​puts(i8) declare i8 @​malloc(i32)

define void @​main() { start: %a = call i8(i32) @​malloc(i32 4) store i8 32, i8 %a br label %test test: %loopvar = phi i32 [%loopvar2, %body], [0, %start] %loopvar2 = add i32 %loopvar, 1 %finished = icmp sgt i32 %loopvar, 2 ; or any other constant or non-constant br i1 %finished, label %continue, label %body body: ; sth fancy %valp = getelementptr [2 x i8] @​char, i32 0, i32 %loopvar %val = load i8 %valp %val2 = load i8 %a %otherbreak = icmp eq i8 %val, %val2 br i1 %otherbreak, label %exception, label %test continue: call void(i8) @​puts(i8 getelementptr([16 x i8] @​nospace, i32 0, i32 0)) ret void

exception: call void(i8) @​puts(i8 getelementptr([15 x i8] @​hasspace, i32 0, i32 0)) ret void }

llvmbot commented 12 years ago

There is something very strange in your testcase:

declare i32 @​relocate_kernel(i32, i32, i32, i32, i32) @​__LOAD_PHYSICAL_ADDR = external global [0 x i8] [...] call void @​llvm.memcpy.p0i8.p0i8.i32(i8 %call8, i8 inttoptr (i32 add (i32 add (i32 ptrtoint (i32 (i32, i32, i32, i32, i32) @​relocate_kernel to i32), i32 ptrtoint ([0 x i8] @​__LOAD_PHYSICAL_ADDR to i32)), i32 -1073741824) to i8*), i32 2048, i32 1, i1 false)

What in the world that constant expression argument to memcpy!? The address of the function relocate_kernel plus the address of external global __LOAD_PHYSICAL_ADDR plus -2^30 (aka. 0xC0000000)?

I just find tail call void @​llvm.memcpy.p0i8.p0i8.i32(i8 %6, i8 getelementptr inbounds ([3 x i8]* @​label416, i64 0, i64 0), i32 3, i32 1, i1 false) nounwind

Where do you see the words relocate, kernel etc.? Are you sure you opened the right file?

llvmbot commented 12 years ago

There is something very strange in your testcase:

declare i32 @​relocate_kernel(i32, i32, i32, i32, i32) @​__LOAD_PHYSICAL_ADDR = external global [0 x i8] [...] call void @​llvm.memcpy.p0i8.p0i8.i32(i8 %call8, i8 inttoptr (i32 add (i32 add (i32 ptrtoint (i32 (i32, i32, i32, i32, i32) @​relocate_kernel to i32), i32 ptrtoint ([0 x i8] @​__LOAD_PHYSICAL_ADDR to i32)), i32 -1073741824) to i8*), i32 2048, i32 1, i1 false)

What in the world that constant expression argument to memcpy!? The address of the function relocate_kernel plus the address of external global __LOAD_PHYSICAL_ADDR plus -2^30 (aka. 0xC0000000)?