Open llvmbot opened 12 years ago
The problem is that %a = malloc + store in %start, then there's the load inside the loop body. GVN isn't fooled at all, but SCEV doesn't know what that load will load.
Yes, I had to add the malloc instruction to reproduce the above test case.
Another possible fix would be adding a heap->stack transform then letting mem2reg mop it up.
That would fix the problem from comment 3, but the original test case is way more complex.
opt -O3 -S gives me:
; ModuleID = 'x.ll' target datalayout = "e-p:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-f80:32:32-v64:64:64-v128:128:128-a0:0:64"
@hasspace = internal constant [15 x i8] c"it has a space\00" @nospace = internal constant [16 x i8] c"it has no space\00"
declare void @puts(i8* nocapture) nounwind
declare noalias i8* @malloc(i32) nounwind
define void @main() nounwind { start: %a = tail call i8 @malloc(i32 4) store i8 32, i8 %a, align 1 br label %test
test: ; preds = %body, %start %loopvar = phi i32 [ %loopvar2, %body ], [ 0, %start ] %finished = icmp sgt i32 %loopvar, 2 br i1 %finished, label %continue, label %body
body: ; preds = %test %loopvar2 = add i32 %loopvar, 1 %otherbreak = icmp eq i32 %loopvar, 1 br i1 %otherbreak, label %exception, label %test
continue: ; preds = %test tail call void @puts(i8 getelementptr inbounds ([16 x i8] @nospace, i64 0, i64 0)) ret void
exception: ; preds = %body tail call void @puts(i8 getelementptr inbounds ([15 x i8] @hasspace, i64 0, i64 0)) ret void }
running opt a second time workarounds the issue. The problem is that the first case already had two opt runs, so after running opt 4 times the correct result was produced. Can't opt take a parameter to run itself as often as needed?
Ok, using comment 3 as the testcase. This is an ordering issue. "opt -gvn -loop-unroll -simplifycfg" produces:
define void @main() { start: %a = call i8 @malloc(i32 4) store i8 32, i8 %a call void @puts(i8 getelementptr inbounds ([15 x i8] @hasspace, i32 0, i32 0)) ret void }
The problem is that %a = malloc + store in %start, then there's the load inside the loop body. GVN isn't fooled at all, but SCEV doesn't know what that load will load.
Another possible fix would be adding a heap->stack transform then letting mem2reg mop it up.
I could reduce it to a very small testcase: (you can replace the target datalayout by your own)
target datalayout = "e-p:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-f80:32:32-v64:64:64-v128:128:128-a0:0:64"
@char = internal constant [2 x i8] c"g " ; change this string to test the other case @hasspace = internal constant [15 x i8] c"it has a space\00" @nospace = internal constant [16 x i8] c"it has no space\00" declare void @puts(i8) declare i8 @malloc(i32)
define void @main() { start: %a = call i8(i32) @malloc(i32 4) store i8 32, i8 %a br label %test test: %loopvar = phi i32 [%loopvar2, %body], [0, %start] %loopvar2 = add i32 %loopvar, 1 %finished = icmp sgt i32 %loopvar, 2 ; or any other constant or non-constant br i1 %finished, label %continue, label %body body: ; sth fancy %valp = getelementptr [2 x i8] @char, i32 0, i32 %loopvar %val = load i8 %valp %val2 = load i8 %a %otherbreak = icmp eq i8 %val, %val2 br i1 %otherbreak, label %exception, label %test continue: call void(i8) @puts(i8 getelementptr([16 x i8] @nospace, i32 0, i32 0)) ret void
exception: call void(i8) @puts(i8 getelementptr([15 x i8] @hasspace, i32 0, i32 0)) ret void }
There is something very strange in your testcase:
declare i32 @relocate_kernel(i32, i32, i32, i32, i32) @__LOAD_PHYSICAL_ADDR = external global [0 x i8] [...] call void @llvm.memcpy.p0i8.p0i8.i32(i8 %call8, i8 inttoptr (i32 add (i32 add (i32 ptrtoint (i32 (i32, i32, i32, i32, i32) @relocate_kernel to i32), i32 ptrtoint ([0 x i8] @__LOAD_PHYSICAL_ADDR to i32)), i32 -1073741824) to i8*), i32 2048, i32 1, i1 false)
What in the world that constant expression argument to memcpy!? The address of the function relocate_kernel plus the address of external global __LOAD_PHYSICAL_ADDR plus -2^30 (aka. 0xC0000000)?
I just find tail call void @llvm.memcpy.p0i8.p0i8.i32(i8 %6, i8 getelementptr inbounds ([3 x i8]* @label416, i64 0, i64 0), i32 3, i32 1, i1 false) nounwind
Where do you see the words relocate, kernel etc.? Are you sure you opened the right file?
There is something very strange in your testcase:
declare i32 @relocate_kernel(i32, i32, i32, i32, i32) @__LOAD_PHYSICAL_ADDR = external global [0 x i8] [...] call void @llvm.memcpy.p0i8.p0i8.i32(i8 %call8, i8 inttoptr (i32 add (i32 add (i32 ptrtoint (i32 (i32, i32, i32, i32, i32) @relocate_kernel to i32), i32 ptrtoint ([0 x i8] @__LOAD_PHYSICAL_ADDR to i32)), i32 -1073741824) to i8*), i32 2048, i32 1, i1 false)
What in the world that constant expression argument to memcpy!? The address of the function relocate_kernel plus the address of external global __LOAD_PHYSICAL_ADDR plus -2^30 (aka. 0xC0000000)?
Extended Description
I attached a llvm-ir translation of the following program: static init begin if(copy('hello', 1, 2) == 'el') { log('correct') } end
This program should be completely constant folded, but the optimization stucks at a loop: the loop begins at line 51 and compares two strings of the same size. As you see in line 61+63 and 62+64, the characters of the strings at this specific position are loaded and this exactly is the problem.
It is about the same issue like http://llvm.org/bugs/show_bug.cgi?id=11142 but this time, it's a bit more complex: the memory is not directly copied, but it's received by a getelementptr+load from the malloced memory. The challenge is to proof that the loop will stay exactly in the copied range of [3 x i8]* @label416 so that the getelementptr call can fetched from @label416 instead of %result.i.i10
An other question is why the loop that only consists of two iterations is not constant folded. The loop has the following structure:
start: br label %test test: %loopvar = phi i32 [%loopvar2, %body], [0, %start] %finished = icmp sgt i32 %loopvar, 2 ; or any other constant or non-constant br i1 %finished, label %continue, label %body body: ; sth fancy %otherbreak = icmp ..... br i1 %otherbreak, label %exception, label %test continue:
This report is maybe two bugs but when you fix one of the two issues, the test case will become unusable to test the other issue because of constant folding.