llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.57k stars 11.81k forks source link

Invalid removal of "dead" stores. #3270

Closed lhames closed 16 years ago

lhames commented 16 years ago
Bugzilla Link 2898
Resolution FIXED
Resolved on Oct 17, 2008 15:57
Version trunk
OS Linux

Extended Description

The ldecod benchmark (MultiSource/Applications/JM/ldecod) is miscompiled for AMD64. I believe this is due to a bug in LocalSpiller where certain stores are assumed to be "dead", and thus are removed, when they are in fact still necessary.

The problem is exposed when the PBQP allocator is used on AMD64. It can be demonstrated by adding -regalloc=pbqp to LLCBETA_OPTIONS, and running, from projects/test-suite/MultiSource/Applications/JM/ldecod/Output,

make ENABLE_LLC=1 ENABLE_LLCBETA=1

The resulting program segfaults when run on the test input:

bash-3.2$ ./Output/ldecod.llc-beta -i data/test.264 -o Output/test_dec.yuv -r data/test_rec.yuv

----------------------------- JM 12.1 (FRExt) ----------------------------- Decoder config file : (null)

Input H.264 bitstream : data/test.264 Output decoded YUV : Output/test_dec.yuv Output status file : log.dec Input reference file : data/test_rec.yuv

POC must = frame# or field# for SNRs to be correct

Frame POC Pic# QP SnrY SnrU SnrV Y:U:V Time(ms)

Segmentation fault

The test case can be reduced with the following:

bugpoint -run-llc ldecod.llvm.bc --args -i ../data/test.264 -r ../data/test_rec.yuv --tool-args -regalloc=pbqp

Faulting basic block is itrans8x8_bb15_bb15_2E_ce.

After register allocation, before spilling bb15.ce contains the following two instructions (among many others):

    %reg1170<def> = LEA64r %reg1024, 1, %reg1103<kill>, 1384
    MOV32mr %reg1171<kill>, 4, %reg1089, 0, %reg1167<kill>, Mem:ST(4,4) [0x15c0628 + 0]

reg167, reg1170 and reg1171 are spill intervals. Pertinent allocations are: reg1170 = R8, reg1171 = R9, reg1089 = R10, reg 1167 = R8D

During spilling by LocalSpiller the following problem occurs:

    %reg1170<def> = LEA64r %R12, 1, %R8<kill>, 1384

Store: MOV64mr <fi#10>, 1, %reg0, 0, %R8 Remembering SS#10 in physreg R8 MOV64mr <fi#10>, 1, %reg0, 0, %R8 Reusing SS#10 from physreg R8 for vreg1171 instead of reloading into physreg R9 Removed dead store: MOV64mr <fi#10>, 1, %reg0, 0, %R8 PhysReg R9 clobbered, invalidating SS#9 PhysReg R8 clobbered, invalidating SS#10 Remembering SS#10 in physreg R9 %R9 = MOV64rm <fi#10>, 1, %reg0, 0 Reuse undone! Remembering SS#8 in physreg R8D %R8D = MOV32rm <fi#8>, 1, %reg0, 0 MOV32mr %R9, 4, %R10, 0, %R8D, Mem:ST(4,4) [0x15c0628 + 0]

In attempting to Reuse SS#10 in physreg R8 for the second instruction the store of reg1170/R8 is treated as dead by the block starting it VirtRegMap.cpp:1441. When it is discovered that R8 is clobbered by a later parameter the load into R9 is re-instated, but the store is not, leading to junk in R9. In general I do not think dead stores cannot be eliminated until the validity of the re-uses are confirmed.

llvmbot commented 16 years ago

Thanks! I've applied your patch. http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20081013/068635.html

lhames commented 16 years ago

Fix for bug 2898 - defers dead store removal caused by re-uses until validity of re-use is confirmed. This patch has been tested on SingleSource and MultiSource and does not seem to introduce any new errors on AMD64. As yet I haven't worked out whether deferring the removal of dead store causes could cause other optimisation opportunities to be missed, but I see no reason (yet) that it should.

lhames commented 16 years ago

Failure bitcode from bugpoint. You'll need to compile this using

llc -regalloc=pbqp -o bugpoint.test.bc.s bugpoint.test.bc

to reproduce the bug.

lhames commented 16 years ago

Safe bitcode produced by bugpoint...

llvmbot commented 16 years ago

Please attach a bc file. Thanks.