llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

spec2000/188.ammp, spec2006/433.milc, 444.namd, 447.dealII, 453.povray compilation fails on LTO stage after commit r256394 #26373

Closed llvmbot closed 8 years ago

llvmbot commented 8 years ago
Bugzilla Link 25999
Resolution FIXED
Resolved on Jan 11, 2016 15:53
Version trunk
OS Linux
Reporter LLVM Bugzilla Contributor
CC @ahmedbougacha,@delena,@RKSimon,@rotateright

Extended Description

Bisection showed that LLVM revision r256394 is responsible for the failures. The commit message follows.

commit 75759ab3e9255fe5f716e4a71ca1ee56901dedf8
Author: Sanjay Patel <spatel@rotateright.com>
Date: Thu Dec 24 21:17:56 2015 +0000

[InstCombine] transform more extract/insert pairs into shuffles (PR2109)

This is an extension of the shuffle combining from r203229:
http://reviews.llvm.org/rL203229

The idea is to widen a short input vector with undef elements so the
existing shuffle transform for extract/insert can kick in.

The motivation is to finally solve PR2109 (llvm/llvm-project#2481):
https://llvm.org/bugs/show_bug.cgi?id=2109

For that example, the IR becomes:

%1 = bitcast <2 x i32>* %P to <2 x float>*
%ld1 = load <2 x float>, <2 x float>* %1, align 8
%2 = shufflevector <2 x float> %ld1, <2 x float> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
%i2 = shufflevector <4 x float> %A, <4 x float> %2, <4 x i32> <i32 0, i32 1, i32 4, i32 5>
ret <4 x float> %i2

And x86 SSE output improves from:

movq        (%rdi), %xmm1           ## xmm1 = mem[0],zero
movdqa      %xmm1, %xmm2
shufps      $229, %xmm2, %xmm2      ## xmm2 = xmm2[1,1,2,3]
shufps      $48, %xmm0, %xmm1       ## xmm1 = xmm1[0,0],xmm0[3,0]
shufps      $132, %xmm1, %xmm0      ## xmm0 = xmm0[0,1],xmm1[0,2]
shufps      $32, %xmm0, %xmm2       ## xmm2 = xmm2[0,0],xmm0[2,0]
shufps      $36, %xmm2, %xmm0       ## xmm0 = xmm0[0,1],xmm2[2,0]
retq

To the almost optimal:

movhpd      (%rdi), %xmm0

Note: There's a tension in the existing transform related to generating
arbitrary shufflevector masks. We avoid that in other places in InstCombine
because we're scared that codegen can't handle strange masks, but it looks
like we're ok with producing those here. I purposely chose weird insert/extract
indexes for the regression tests to see the effect in these cases.
For PowerPC+Altivec, AArch64, and X86+SSE/AVX, I think the codegen is equal or
better for these examples.

Differential Revision: http://reviews.llvm.org/D15096

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@256394 91177308-0d34-0410-b5e6-96231b3b80d8
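
The widening step described in the commit message can be modelled outside LLVM. The following is a minimal Python sketch, not LLVM code: `UNDEF`, `shufflevector`, and `widen_with_undef` are hypothetical helper names, and the lane labels (`a0`, `p0`, ...) are placeholders. It only illustrates how a `<2 x float>` is padded with undef lanes so that a single 4-wide shuffle with mask `<0, 1, 4, 5>` replaces the extract/insert pairs.

```python
UNDEF = None  # stand-in for an undef lane or an undef mask element

def shufflevector(v1, v2, mask):
    """Toy model of shufflevector: index i < n picks v1[i], otherwise v2[i - n]."""
    assert len(v1) == len(v2), "both inputs must have the same length"
    both = list(v1) + list(v2)
    return [UNDEF if i is UNDEF else both[i] for i in mask]

def widen_with_undef(v, width):
    """Pad a short vector to `width` lanes; the extra lanes are undef."""
    n = len(v)
    mask = list(range(n)) + [UNDEF] * (width - n)
    return shufflevector(v, [UNDEF] * n, mask)

A = ["a0", "a1", "a2", "a3"]               # plays the role of %A
ld1 = ["p0", "p1"]                         # plays the role of %ld1
wide = widen_with_undef(ld1, 4)            # mask <0, 1, undef, undef>
i2 = shufflevector(A, wide, [0, 1, 4, 5])  # mask <0, 1, 4, 5>
print(wide, i2)
```

Under this model the low half of the result comes from `A` and the high half from the loaded pair, which is the shape of the `%i2` shuffle in the commit message.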

LLVM-clang options: -m64 -fuse-ld=gold -Ofast -funroll-loops -flto -static -mfpmath=sse -march=core-avx2

During the LTO phase, the SPEC benchmarks fail with the following error message (shown here for spec2006/444.namd).

runspec --config=lnx-x86_64-clang-default.cfg --rebuild -a build -e ref64 -T base 444 …………………………………………

clang++ -m64 -m64 -fuse-ld=gold -Ofast -funroll-loops -flto -static -mfpmath=sse -march=core-avx2 -DSPEC_CPU_LP64 Compute.o ComputeList.o ComputeNonbondedUtil.o LJTable.o Molecule.o Patch.o PatchList.o ResultSet.o SimParameters.o erf.o spec_namd.o -o namd
Instruction does not dominate all uses!
  %782 = extractelement <2 x double> %721, i32 1
  %779 = insertelement <4 x double> undef, double %782, i32 0
Instruction does not dominate all uses!
  %1053 = extractelement <2 x double> %974, i32 1
  %1050 = insertelement <4 x double> undef, double %1053, i32 0
Instruction does not dominate all uses!
  %1332 = shufflevector <2 x double> %1263, <2 x double> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
  %1330 = shufflevector <4 x double> %1329, <4 x double> %1332, <4 x i32> <i32 0, i32 5, i32 undef, i32 undef>
LLVM ERROR: Broken function found, compilation aborted!
clang-3.8: error: linker command failed with exit code 1 (use -v to see invocation)
specmake: *** [namd] Error 1

Okunev Sergey, Software Engineer Intel Compiler Team
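
For readers unfamiliar with the "Instruction does not dominate all uses!" diagnostic: it means some instruction uses a value whose definition is not guaranteed to have executed first. The sketch below is a deliberately simplified Python model, not the LLVM verifier. It assumes a single basic block, where "dominates" reduces to "appears earlier", and the instruction ordering in `bad` is a hypothetical arrangement chosen to mirror the first pair in the log above.

```python
def find_dominance_violations(block):
    """Return (definition, user) pairs where a value is used before it is defined.

    block: list of (result_name, [operand_names]) in program order.
    Operands that are not defined in this block (arguments, constants,
    values from other blocks) are assumed to dominate and are ignored.
    """
    position = {name: idx for idx, (name, _) in enumerate(block)}
    violations = []
    for idx, (name, operands) in enumerate(block):
        for op in operands:
            if op in position and position[op] >= idx:
                violations.append((op, name))
    return violations

# Hypothetical ordering: the insertelement ends up ahead of the extractelement it uses.
bad = [("%779", ["%782"]), ("%782", ["%721"])]
good = [("%782", ["%721"]), ("%779", ["%782"])]
print(find_dominance_violations(bad))   # reports the (%782, %779) pair
print(find_dominance_violations(good))  # reports nothing
```

The real check works on the dominator tree across blocks, so this only conveys the idea behind the message.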

rotateright commented 8 years ago

Thanks all for the test cases; sorry for the breakage. Resolving as fixed.

llvmbot commented 8 years ago

Another fix attempt: http://reviews.llvm.org/rL257133

After commit r257133, the remaining benchmarks (spec2006/444.namd, 447.dealII) ran successfully. The issue is resolved.

llvmbot commented 8 years ago

Hey Sanjay, I've been hitting this as well and I think I was able to write a (pretty contrived) testcase:

define <4 x double> @t(i1 %c, i1 %c2, <2 x double> %a, <4 x double> %b) {
bb1:
  br i1 %c, label %bb2, label %bb3

bb2:
  %r = call <2 x double> @dummy(<2 x double> %a)
  br label %bb3

bb3:
  %tmp1 = phi <2 x double> [ %a, %bb1 ], [ %r, %bb2 ]
  %tmp2 = phi <4 x double> [ %b, %bb1 ], [ zeroinitializer, %bb2 ]
  %tmp3 = extractelement <2 x double> %tmp1, i32 0
  %tmp4 = insertelement <4 x double> %tmp2, double %tmp3, i32 2
  ret <4 x double> %tmp4
}

declare <2 x double> @dummy(<2 x double>)

The shuffle gets inserted between the PHIs; breakage ensues.

Thank you, Ahmed, for making a suitable reproducer.

rotateright commented 8 years ago

Another fix attempt: http://reviews.llvm.org/rL257133

rotateright commented 8 years ago

Patch posted for review: http://reviews.llvm.org/D15981

rotateright commented 8 years ago

The shuffle gets inserted between the PHIs; breakage ensues.

Aha PHIs...thanks!

So I think we need to use "getFirstInsertionPt()" or some variant even when the extractelement's vector operand is defined by an instruction. It should be a tiny patch. I'll post it for review ASAP.
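
The insertion-point idea can be sketched in a few lines. This is a toy Python model under stated assumptions, not the patch that was committed: `first_insertion_index` and `index_after_definition` are hypothetical helpers, `%widened` is an invented name for the new shuffle, and LLVM's real `getFirstInsertionPt()` handles more cases than skipping PHIs.

```python
def first_insertion_index(block):
    """Index of the first non-PHI instruction, i.e. the earliest legal
    position for a new non-PHI instruction in this toy model."""
    for idx, inst in enumerate(block):
        if inst["opcode"] != "phi":
            return idx
    return len(block)

def index_after_definition(block, def_name):
    """Where to place a new instruction that uses `def_name`.

    Normally directly after the definition, but if the definition is a
    PHI, skip past the whole PHI group instead of splitting it.
    """
    idx = next(i for i, inst in enumerate(block) if inst["name"] == def_name)
    if block[idx]["opcode"] == "phi":
        return first_insertion_index(block)
    return idx + 1

# bb3 from the reproducer in this thread, reduced to names and opcodes.
bb3 = [
    {"name": "%tmp1", "opcode": "phi"},
    {"name": "%tmp2", "opcode": "phi"},
    {"name": "%tmp3", "opcode": "extractelement"},
    {"name": "%tmp4", "opcode": "insertelement"},
    {"name": None, "opcode": "ret"},
]
where = index_after_definition(bb3, "%tmp1")
bb3.insert(where, {"name": "%widened", "opcode": "shufflevector"})
print([inst["opcode"] for inst in bb3])
```

Placing the shuffle directly after `%tmp1` would put it at index 1, between the two PHIs; skipping to the first non-PHI position keeps the PHI group intact.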

ahmedbougacha commented 8 years ago

Hey Sanjay, I've been hitting this as well and I think I was able to write a (pretty contrived) testcase:

define <4 x double> @t(i1 %c, i1 %c2, <2 x double> %a, <4 x double> %b) {
bb1:
  br i1 %c, label %bb2, label %bb3

bb2:
  %r = call <2 x double> @dummy(<2 x double> %a)
  br label %bb3

bb3:
  %tmp1 = phi <2 x double> [ %a, %bb1 ], [ %r, %bb2 ]
  %tmp2 = phi <4 x double> [ %b, %bb1 ], [ zeroinitializer, %bb2 ]
  %tmp3 = extractelement <2 x double> %tmp1, i32 0
  %tmp4 = insertelement <4 x double> %tmp2, double %tmp3, i32 2
  ret <4 x double> %tmp4
}

declare <2 x double> @dummy(<2 x double>)

The shuffle gets inserted between the PHIs; breakage ensues.
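
The rule being violated here is that all PHI nodes must sit together at the start of a basic block. A small Python check makes the breakage concrete; `phis_grouped_at_top` is a hypothetical helper, and the opcode lists are a simplified rendering of `bb3` before and after the bad insertion.

```python
def phis_grouped_at_top(opcodes):
    """True if no phi appears after a non-phi instruction in the block."""
    seen_non_phi = False
    for op in opcodes:
        if op == "phi":
            if seen_non_phi:
                return False
        else:
            seen_non_phi = True
    return True

before = ["phi", "phi", "extractelement", "insertelement", "ret"]
# Simplified picture of the broken block: the new shuffle lands after the first phi.
broken = ["phi", "shufflevector", "phi", "extractelement", "insertelement", "ret"]
print(phis_grouped_at_top(before), phis_grouped_at_top(broken))
```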

rotateright commented 8 years ago

IR Dump Before Combine redundant instructions

...which is incomplete due to the crash.

Disregard that comment; the crash text appears to be after the IR is complete.

I think I still need the struct definitions though, unless there's some quick way to reverse engineer those from the IR. :)

rotateright commented 8 years ago

Here is a fragment of the spec2006/444.namd IR dump for the ComputeNonbondedUtil.C module, covering the single function ‘_ZN20ComputeNonbondedUtil20calc_pair_energy_fepEP9nonbonded’.

I don't know how to debug this without having the struct definitions that are used in the IR. Can you attach that too?

Also, it looks like the most relevant IR that we need to debug has been cut from the output.

There's:

IR Dump Before Simplify the CFG

which has no insertelement / extractelement instructions, so I don't think it can trigger the bug. And then it skips to:

IR Dump Before Combine redundant instructions

...which is incomplete due to the crash.

rotateright commented 8 years ago

Here is a fragment of the spec2006/444.namd IR dump for the ComputeNonbondedUtil.C module, covering the single function ‘_ZN20ComputeNonbondedUtil20calc_pair_energy_fepEP9nonbonded’.

I don't know how to debug this without having the struct definitions that are used in the IR. Can you attach that too?

llvmbot commented 8 years ago

I went ahead and checked in a fix that I hope will solve the SPEC errors too: http://reviews.llvm.org/rL256857

After commit r256857, three benchmarks (spec2000/188.ammp, spec2006/433.milc, 453.povray) ran successfully. Two benchmarks (spec2006/444.namd, 447.dealII) still fail while compiling one module, with the following stack trace (shown here for 444.namd).

444.namd: …………..
clang++ -m64 -c -o ComputeNonbondedUtil.o -DSPEC_CPU -DNDEBUG -m64 -fuse-ld=gold -Ofast -funroll-loops -flto -static -mfpmath=sse -march=core-avx2 -DSPEC_CPU_LP64 ComputeNonbondedUtil.C
clang-3.8: warning: argument unused during compilation: '-fuse-ld=gold'

0 0x00000000018e0e35 llvm::sys::PrintStackTrace(llvm::raw_ostream&)
1 0x00000000018df076 llvm::sys::RunSignalHandlers()
2 0x00000000018df265 SignalHandler(int)
3 0x0000003a954100d0 __restore_rt (/lib64/libpthread.so.0+0x3a954100d0)
4 0x00000000016788b3 llvm::InstCombiner::visitPHINode(llvm::PHINode&)
5 0x00000000016180de llvm::InstCombiner::run()
6 0x000000000161918b combineInstructionsOverFunction(llvm::Function&, llvm::InstCombineWorklist&, llvm::AAResults, llvm::AssumptionCache&, llvm::TargetLibraryInfo&, llvm::DominatorTree&, llvm::LoopInfo)
7 0x00000000016196ae (anonymous namespace)::InstructionCombiningPass::runOnFunction(llvm::Function&)
8 0x000000000158150a llvm::FPPassManager::runOnFunction(llvm::Function&)
9 0x0000000001581ad3 llvm::legacy::PassManagerImpl::run(llvm::Module&)
10 0x00000000019f9f79 clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module, clang::BackendAction, llvm::raw_pwrite_stream, std::unique_ptr<llvm::FunctionInfoIndex, std::default_delete >)
11 0x0000000001f277d5 clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&)
12 0x00000000021e8afd clang::ParseAST(clang::Sema&, bool, bool)
13 0x0000000001cb1526 clang::FrontendAction::Execute()
14 0x0000000001c8d2a6 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&)
15 0x0000000001d348f3 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/export/users/skokunev/analysis/llvm/r_tr/pd/010516/runs/256857/install/bin/clang-3.8+0x1d348f3)
16 0x00000000008a7a60 cc1_main(llvm::ArrayRef<char const>, char const, void*)
17 0x000000000085da74 main
18 0x0000003a94c1ffe0 __libc_start_main (/lib64/libc.so.6+0x3a94c1ffe0)
19 0x00000000008a56a4 _start

Compilation fails in function ‘_ZN20ComputeNonbondedUtil20calc_pair_energy_fepEP9nonbonded’. IR dump fragments of this function, taken before three compilation phases with the ‘-mllvm -print-before-all’ option, are in the attachment.

llvmbot commented 8 years ago

spec2006/444.namd IR dump for ComputeNonbondedUtil.C module

Here is a fragment of the spec2006/444.namd IR dump for the ComputeNonbondedUtil.C module, covering the single function ‘_ZN20ComputeNonbondedUtil20calc_pair_energy_fepEP9nonbonded’.

rotateright commented 8 years ago

I went ahead and checked in a fix that I hope will solve the SPEC errors too: http://reviews.llvm.org/rL256857

rotateright commented 8 years ago

IR Dump After Module Verifier
; Function Attrs: nounwind uwtable
define internal i32 @u_f_nonbon(double %lambda) #0 {
………………………………………
}
Instruction does not dominate all uses!

From the attachment, I'm not seeing what this function looked like before it was broken, so I can't make a test case from it. The attachment is also missing declares/defines for structs and attributes.

I have an idea what the problem is though, so I'll update the patch for bug 26015. Please let me know if that solves this bug too. Thanks!

llvmbot commented 8 years ago

Hi Sergey - I don't have access to SPEC. Can you attach an IR test case?

Hi Sanjay, now that the fix for llvm/llvm-bugzilla-archive#26015 has been committed, I am re-verifying the benchmarks described above.

For spec2000/188.ammp, compilation fails in the LTO phase: the “After Module Verifier” dump of function ‘u_f_nonbon’ ends with the error below. The IR dump obtained with the ‘-Wl,-plugin-opt=-print-after-all’ option is in the attachment.

IR Dump After Module Verifier
; Function Attrs: nounwind uwtable
define internal i32 @u_f_nonbon(double %lambda) #0 {
………………………………………
}
Instruction does not dominate all uses!
  %374 = extractelement <2 x double> %356, i32 0
  %369 = fmul fast double %368, %374
LLVM ERROR: Broken function found, compilation aborted!
clang-3.8: error: linker command failed with exit code 1 (use -v to see invocation)

llvmbot commented 8 years ago

spec2000/188.ammp IR dump on LTO phase

The spec2000/188.ammp IR dump was obtained with the ‘-Wl,-plugin-opt=-print-after-all’ option during the LTO phase.

rotateright commented 8 years ago

Hi Sergey - I don't have access to SPEC. Can you attach an IR test case?