Open aykevl opened 5 years ago
I think the optimization is correct. You have two distinct variables of type i8*; that means they should have different addresses.
If you use "@_sbss = external global [0 x i8]" or something like that, it should do what you want.
The logic is in areGlobalsPotentiallyEqual in llvm/lib/IR/ConstantFold.cpp.
Extended Description
I have the following IR:
target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n32-S64" target triple = "arm-none-eabi"
@_sbss = external global i8 @_ebss = external global i8
define fastcc void @runtime.preinit() { entry: br label %for.loop
for.loop: ; preds = %for.body, %entry %0 = phi i32 [ ptrtoint (i8 @_sbss to i32), %entry ], [ %3, %for.body ] %1 = icmp eq i32 %0, ptrtoint (i8 @_ebss to i32) br i1 %1, label %for.done, label %for.body
for.body: ; preds = %for.loop %2 = inttoptr i32 %0 to i32 store i32 0, i32 %2, align 4 %3 = add i32 %0, 4 br label %for.loop
for.done: ; preds = %for.loop ret void }
When I run the following command:
the runtime.preinit function is optimized to the following:
define fastcc void @runtime.preinit() { entry: br label %for.body
for.body: ; preds = %entry, %for.body %0 = phi i32 [ ptrtoint (i8 @_sbss to i32), %entry ], [ %2, %for.body ] %1 = inttoptr i32 %0 to i32 store i32 0, i32 %1, align 4 %2 = add i32 %0, 4 %3 = icmp eq i32 %2, ptrtoint (i8 @_ebss to i32) br i1 %3, label %for.done, label %for.body
for.done: ; preds = %for.body ret void }
In other words, it looks like LoopRotate thinks the comparison in the loop header can be done after the first store. I believe this optimization is incorrect, as there is no indication in the IR that it is okay to store to @_sbss unconditionally. In fact, because @_sbss and @_ebss happen to be aliases this causes a loop zeroing the entire address space until it crashes.
Thinking a bit more about this, it seems like LoopRotate introduces an off-by-one error in the loop: before this optimization the loop would zero all memory before @_ebss while after the optimization it zeroes all memory including the word behind @_ebss (which lies outside of the range it is supposed to zero).
If this is a correct optimization, please let me know why so I can fix this in the compiler frontend or the runtime library.
Note that this bug is present in both LLVM 9 and LLVM 8 (from apt.llvm.org), so it affects the current stable version as well.