lencod miscompiled on Cortex-A15 (byval issue)

d0k commented 11 years ago


Bugzilla Link	16393
Resolution	FIXED
Resolved on	Jul 17, 2013 06:06
Version	trunk
OS	All
Attachments	the miscompiled function
CC	@asl,@rengolin,@stoklund,@kaomoneus

Extended Description

MultiSource/Applications/JM/lencod has been failing on the arm-lnt buildbot since r184105. After some investigation it looks like this is caused by broken byval handling, r184105 only changed spilling decisions and uncovered the issue.

Attached is IR for the miscompiled function (compile with llc -mcpu=cortex-a15). It passes a very large object byval. Of particular interest is the part:

mov r0, #&#8203;1
mov r1, #&#8203;2
bl  RestoreMVBlock8x8
add sp, sp, #&#8203;24
add sp, sp, #&#8203;8192
sub sp, sp, #&#8203;24
sub sp, sp, #&#8203;8192
ldm r10, {r2, r3}
mov r1, sp
str r6, [sp, r8]
ldr r0, .LCPI0_0

.LBB0_57: @ %if.then248 @ =>This Inner Loop Header: Depth=1 ldr r7, [r5], #4 subs r0, r0, #4 str r7, [r1], #4 bne .LBB0_57 @ BB#58: @ %if.then248 mov r0, #1 mov r1, #3 bl RestoreMVBlock8x8 add sp, sp, #24 add sp, sp, #8192 sub lr, sp, #4096 sub r0, sp, #4096 ldr r3, [lr, #-4084] @ 4-byte Reload movw r5, :lower16:img ldr r12, [r0, #-4068] @ 4-byte Reload

The upper part just looks inefficient, after the second part we end up with a corrupted stack.

rengolin commented 11 years ago

Thanks everyone!

I'll close the bug, and if we see this again, I'll re-open.

llvmbot commented 11 years ago

Fixed in r186350 and r186364.

kaomoneus commented 11 years ago

May be just improve CallSeqStart and friends? Just add on more operand: for which CALL it was inserted? Then we can control it in more explicit way then even now.

kaomoneus commented 11 years ago

Why ADJCALLSTACKUP/DOWN are restricted to be in same BB? Are there some reasonable points? Just because it guarantees linear execution? If so we can just improve this check. For each CALL we can count all "lineary feasible" prev BBs with STACKDOWNs and "lineary feasible" STACKUPs.

ENTRY> STACKDOWN - CALL - STACKUP - STACKDOWN - some-loop - CALL STACKUP

Yup, thats a pretty complex one. But in comparison with x86, ARM doesn't have string instruction, does it? So here we can suppose there are possible more back-ends like ARM.

1ba3d143-a64b-4671-82b2-0b31cfb91709 commented 11 years ago

Can you be certain that no pass has inserted instructions between the beginning of nextblock: and SET_SP_ADJ? If there is spill code in that position, it will still be wrong.

This sounds like an issue to me. Is there any way to make sure SET_SP_ADJ stays at the beginning of a basic block? Does PROLOG_LABEL always stay somewhere?

I suppose we use MBB::SkipPHIsAndLabels() to find an insertion point in most cases.

llvmbot commented 11 years ago

I have a working patch, but it touches TII, which I do not like.

You could get around that by making SET_SPA_DJ a target-independent opcode, like COPY.

Thanks for the suggestion. Jim also suggested the same thing and I am working on it.

Another possibility is to expand pseudo STRUCT_BYVAL at a later stage at or after PEI.

I am not sure the SET_SP_ADJ approach is safe. You generate code like this:

COPY_STRUCT_BYVAL SET_SP_ADJ

Which then gets expanded to:

memcpyloop: ... nextblock: SET_SP_ADJ

Can you be certain that no pass has inserted instructions between the beginning of nextblock: and SET_SP_ADJ? If there is spill code in that position, it will still be wrong.

This sounds like an issue to me. Is there any way to make sure SET_SP_ADJ stays at the beginning of a basic block? Does PROLOG_LABEL always stay somewhere?

Thanks, Manman

1ba3d143-a64b-4671-82b2-0b31cfb91709 commented 11 years ago

I have a working patch, but it touches TII, which I do not like.

You could get around that by making SET_SPA_DJ a target-independent opcode, like COPY.

Another possibility is to expand pseudo STRUCT_BYVAL at a later stage at or after PEI.

I am not sure the SET_SP_ADJ approach is safe. You generate code like this:

COPY_STRUCT_BYVAL SET_SP_ADJ

Which then gets expanded to:

memcpyloop: ... nextblock: SET_SP_ADJ

Can you be certain that no pass has inserted instructions between the beginning of nextblock: and SET_SP_ADJ? If there is spill code in that position, it will still be wrong.

rengolin commented 11 years ago

Thanks Manman, Stepan.

It seems the bots are broken again, and the error is similar, so I'm guessing we're hitting the same problem again.

Stepan, I'm assigning the bug to you, for correctness.

llvmbot commented 11 years ago

Stepan is going to look at the possibility of expanding STRUCT_BYVAL at a later stage. Thanks,

llvmbot commented 11 years ago

a not-so-elegant working patch I have a working patch, but it touches TII, which I do not like. Another possibility is to expand pseudo STRUCT_BYVAL at a later stage at or after PEI.

kaomoneus commented 11 years ago

Typo: "Why in case" --> "While in case"

kaomoneus commented 11 years ago

CrashCase Here we are. That's a crash case. Just reproduced what Manman mentioned from scratch. For most of cases byval size is less than 2048 (2^12), so we can deal with only "sub/add sp, sp, #imm" instruction pairs. Why in case of 'lencode' data size exceeded this limit. And everything went wrong. Just call two routines with different byval values that exceeds 512+4 32bit words.

rengolin commented 11 years ago

Hi Manman,

Thanks for having a deeper look at this. If you're not already working on this, I can take a look once I finish the divmod change.

I've assigned the bug to me as it's easier for me to see it, feel free to assign it to you if you already have a solution.

thanks!

llvmbot commented 11 years ago

After discussions with others, the best solution seems to be adding a pseudo instruction "SETSPADJ", which can be inserted when expanding COPY_STRUCT_BYVAL.

The purpose of SETSPADJ is to notify PEI::replaceFrameIndices the correct SPAdj, and it can be removed right there after serving the purpose.

llvmbot commented 11 years ago

The generated assembly: @ BB#12: @ %if.then248 mov r0, #1 mov r1, #3 bl RestoreMVBlock8x8 add sp, sp, #24 add sp, sp, #8192 sub lr, sp, #8192 ldr r7, [lr, #-12] @ 4-byte Reload

The address for 4-byte reload is incorrect.

llvmbot commented 11 years ago

reduced testing case

llvmbot commented 11 years ago

Before PEI: BB#52: derived from LLVM BB %if.then248 Live Ins: %R9 %R11 %R2 %R3 %R8 Predecessors according to CFG: BB#51 %R0 = MOVi 1, pred:14, pred:%noreg, opt:%noreg %R1 = MOVi 3, pred:14, pred:%noreg, opt:%noreg BL ga:@RestoreMVBlock8x8, , %LR<imp-def,dead>, %SP, %R0, %R1, %R2, %R3, %SP ADJCALLSTACKUP 8212, 0, pred:14, pred:%noreg, %SP, %SP %R5 = MOVi32imm ga:@img %R3 = LDRi12 <fi#5>, 0, pred:14, pred:%noreg; mem:LD4[FixedStack5] %LR = MOVi32imm ga:@enc_picture %R7 = COPY %LR Successors according to CFG: BB#53

After PEI: LDRi12 <fi#5> is converted to
%vreg0 = SUBri %SP, 4096, pred:14, pred:%noreg, opt:%noreg; GPR:%vreg0 %R3 = LDRi12 %vreg0, -4080, pred:14, pred:%noreg; mem:LD4[FixedStack5] GPR:%vreg0

I guess the assumption made in PEI::replaceFrameIndices (ADJCALLSTACKUP and ADJCALLSTACKDOWN for a single call site are in the same basic block) is broken by how we handle byval.

llvmbot commented 11 years ago

Problem in this function: PEI::replaceFrameIndices

It assumes there is no need to adjust the SP at the beginning of a basic block, i.e. SPAdj is set to 0 before iterating over instructions in the basic block.

This is not true for byval call site, since a single byval call site can be converted to a few basic blocks.

In our testing case, RestoreMVBlock8x8 is converted to the following 3 basic blocks, and when we are checking the 3rd basic block (BB#58), SPAdj should not be set to 0.

    sub sp, sp, #&#8203;24
sub sp, sp, #&#8203;8192
ldm r10, {r2, r3}
mov r1, sp
str r6, [sp, r8]
ldr r0, .LCPI0_0

.LBB0_57: @ %if.then248 @ =>This Inner Loop Header: Depth=1 ldr r7, [r5], #4 subs r0, r0, #4 str r7, [r1], #4 bne .LBB0_57 @ BB#58: @ %if.then248 mov r0, #1 mov r1, #3 bl RestoreMVBlock8x8 add sp, sp, #24 add sp, sp, #8192

rengolin commented 11 years ago

I can't make bugpoint help here, it's not an assert or anything and the final BC file is identical to the source.

Stepan, you don't need the -mcpu=cortex-a15, with that BC file any ARM will break, so it should break on your Beagle. However, that is not complete module (no main or anything), so what you should do is look for the big negative offsets as a general rule, and that can be done on your x86 machine.

rengolin commented 11 years ago

Benjamin's r184835 actually "fixed" the ARM LNT bot... ;)

But we should still investigate the error. I'll run bugpoint on the chromebook and will try to pinpoint the problem, but FYI, we can't rely on trunk for regression testing.

llvmbot commented 11 years ago

Stepan,

I misread your comment. Yes, it compiled without crash, and it should crash when trying to dereference the pointer just (re)loaded into r12.

kaomoneus commented 11 years ago

Output for r.ll

Output for "llc -mcpu=cortex-a15 r.ll". llvm configure options: "--enable-optimized --enable-targets=x86_64,arm"

I didn't try it on my beagleboard yet. Suppose the executable should be crashed on the native platform? right?

llvmbot commented 11 years ago

Stepan,

Can you attach your generated assembly?

kaomoneus commented 11 years ago

Under x86_64 "llc -mcpu=cortex-a15" compiles it without any crashes. I need some time to reproduce it on beagleboard. It would be good if somebody will do the bugpoint for it.

llvmbot commented 11 years ago

I'm afraid I have little context here to understand the problem correctly, let alone offering a solution.

As far as I understand, a latent problem in ByVal was exposed by the new spilling algorithm, but I don't understand why the spilling offset is so off. All I can guess is that the information about where the stack pointer lies is different on both locations.

It may not be a problem related to byval, from the assembly, the byval handling looks correct, but the spill offset is wrong. I can look deeper into this if nobody is working on it.

Manman, Stepan,

This looks like the other stack bug we were investigating on the list:

http://article.gmane.org/gmane.comp.compilers.llvm.devel/63064

This is already fixed on trunk.

Benjamin,

Do you know where in the new spilling algorithm the information is wrong? Who provides you with that information?

rengolin commented 11 years ago

I'm afraid I have little context here to understand the problem correctly, let alone offering a solution.

As far as I understand, a latent problem in ByVal was exposed by the new spilling algorithm, but I don't understand why the spilling offset is so off. All I can guess is that the information about where the stack pointer lies is different on both locations.

Manman, Stepan,

This looks like the other stack bug we were investigating on the list:

http://article.gmane.org/gmane.comp.compilers.llvm.devel/63064

Benjamin,

Do you know where in the new spilling algorithm the information is wrong? Who provides you with that information?

llvmbot commented 11 years ago

The spill before the 4 call sites: str r12, [sp, #48] @ 4-byte Spill

llvmbot commented 11 years ago

The spilling address for r12 seems to be wrong, it is at sp-4096-4068 = sp - 8164 sub r0, sp, #4096 ldr r12, [r0, #-4068] @ 4-byte Reload that address should be reserved for the call site: from sp-24-8192 = sp - 8216 sub sp, sp, #24 sub sp, sp, #8192 bl RestoreMVBlock8x8 add sp, sp, #24 add sp, sp, #8192

d0k commented 11 years ago

Did it crash afterwards?

It crashed when trying to dereference the pointer just (re)loaded into r12.

llvmbot commented 11 years ago

The related IR: tail call void @RestoreMVBlock8x8(i32 1, i32 0, %struct.RD_8x8DATA byval @tr8x8, i32 %conv.i) #0 tail call void @RestoreMVBlock8x8(i32 1, i32 1, %struct.RD_8x8DATA byval @tr8x8, i32 %conv.i) #0 tail call void @RestoreMVBlock8x8(i32 1, i32 2, %struct.RD_8x8DATA byval @tr8x8, i32 %conv.i) #0 tail call void @RestoreMVBlock8x8(i32 1, i32 3, %struct.RD_8x8DATA byval @tr8x8, i32 %conv.i) #0 br label %if.end249

The code generated for the call looks okay to me: sub sp, sp, #24 sub sp, sp, #8192 ldm r10, {r2, r3} mov r1, sp str r6, [sp, r8] ldr r0, .LCPI0_0 .LBB0_57: @ %if.then248 @ =>This Inner Loop Header: Depth=1 ldr r7, [r5], #4 subs r0, r0, #4 str r7, [r1], #4 bne .LBB0_57 @ BB#58: @ %if.then248 mov r0, #1 mov r1, #3 bl RestoreMVBlock8x8 add sp, sp, #24 add sp, sp, #8192

Did it crash afterwards? sub lr, sp, #4096 sub r0, sp, #4096 ldr r3, [lr, #-4084] @ 4-byte Reload movw r5, :lower16:img ldr r12, [r0, #-4068] @ 4-byte Reload movt r5, :upper16:img .LBB0_59: @ %if.end249

d0k commented 11 years ago

assigned to @kaomoneus

llvm / llvm-project

lencod miscompiled on Cortex-A15 (byval issue) #16767

Extended Description