llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

x86_64: Cannot emit physreg copy instruction #25454

Closed DimitryAndric closed 9 years ago

DimitryAndric commented 9 years ago
Bugzilla Link 25080
Resolution FIXED
Resolved on Oct 13, 2015 04:48
Version trunk
OS All
CC @adibiagio,@d0k,@echristo,@emaste,@juikim

Extended Description

This reproduced with trunk r248374. Minimized test case:

char x0;
*x1;
x2(x3) { return x3 == 0 || x3 == 11 || x3 == 22; }
x4() {
  int x5, x6, x7;
  for (; x1 && x7 < x0; x7++)
    if (x2(x1[x7]))
      x5++;
    else
      x6++;
  return x5 && x6;
}

Compile with:

clang -cc1 -triple x86_64-unknown-freebsd11.0 -emit-obj -target-cpu sandybridge -O2 -vectorize-loops testcase.c

Resulting in:

Cannot emit physreg copy instruction
UNREACHABLE executed at /share/dim/src/llvm/trunk/lib/Target/X86/X86InstrInfo.cpp:3963!
Abort trap

The target CPU must apparently be sandybridge or higher; the error does not reproduce with 'plain' x86_64.

adibiagio commented 9 years ago

Hi Andrea, thanks for the patch! It fixes both our original test case from FreeBSD, my initial reduced test case, and your bugpoint reduced test case.

I will import this into FreeBSD pretty soon.

I am glad that this patch works for you. I am going to resolve this bug.

Cheers, Andrea

DimitryAndric commented 9 years ago

Hi Andrea, thanks for the patch! It fixes both our original test case from FreeBSD, my initial reduced test case, and your bugpoint reduced test case.

I will import this into FreeBSD pretty soon.

adibiagio commented 9 years ago

Fixed at revision 250085. http://llvm.org/viewvc/llvm-project?view=revision&revision=250085

adibiagio commented 9 years ago

Hey Dimitry.

I created http://reviews.llvm.org/D13660. Could you please check if the new patch still fixes the original problem?

Many thanks! :-) Andrea

DimitryAndric commented 9 years ago

Would it be a problem if I send a patch for review on Monday (if that is not too late for you)? It is 21:43 here now, and my wife would kill me if I work on weekends :-)

There is no hurry, all the earlier detective work seems to have proven that this is a rather tricky issue, so it is better to think it through calmly and without pressure.

Besides, I would not want to be guilty of causing divorces. ;) Please enjoy your weekend!

adibiagio commented 9 years ago

I'll see if I can create a proper patch that addresses all the problematic cases. The current patch addresses only one particular case and needs more work. Would it be a problem if I send a patch for review on Monday (if that is not too late for you)? It is 21:43 here now, and my wife would kill me if I work on weekends :-)

DimitryAndric commented 9 years ago

Here is a patch that addresses the problem with type legalized SETCC nodes with operands and destination of different types.

The patch is just a hack/work in progress and only addresses a very specific case.

Thanks a lot! I can confirm that this fixes both the original test case (from FreeBSD's ieee802_11_common.c), my .c reduced test case, and your .ll test case.

Maybe this should be put up into a Phabricator review?

adibiagio commented 9 years ago

prototype patch

Here is a patch that addresses the problem with type legalized SETCC nodes with operands and destination of different types.

The patch is just a hack/work in progress and only addresses a very specific case.

adibiagio commented 9 years ago

I think I understand now what's going on.

The problem seems to be caused by a wrong lowering of SETCC nodes in the X86 backend. The reproducible exposed a problem that has probably been there for ages.

This is another (even smaller) reproducible:

define <8 x i16> @test(<8 x i32> %a) {
entry:
  %0 = trunc <8 x i32> %a to <8 x i23>
  %1 = icmp eq <8 x i23> %0, zeroinitializer
  %2 = or <8 x i1> %1, <i1 true, i1 false, i1 true, i1 false, i1 true, i1 false, i1 true, i1 false>
  %3 = zext <8 x i1> %2 to <8 x i16>
  ret <8 x i16> %3
}

Here, type <8 x i23> is not a legal type for the target. Legal vector types for a corei7-avx would be <8 x i16> and <8 x i32>.

The instruction:

%1 = icmp eq <8 x i23> %0, zeroinitializer

is lowered in SelectionDAG as:

v8i1 = setcc t5, t7, seteq:ch
t5: v8i23 = truncate t2
t2: v8i32,ch = CopyFromReg t0, Register:v8i32 %vreg1
t7: v8i32 = build_vector of all zeroes.

The type legalizer would first promote the return type of the setcc node from <8 x i1> to <8 x i16>. It would then promote the setcc operand type from <8 x i23> to <8 x i32>, since type <8 x i23> is illegal for the target.

So, we end up with a v8i16 = setcc t32, t25, seteq:ch

where t32 and t25 are now values of type v8i32.
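The two promotions described above can be sketched as a toy model. This is only an illustration, not the LLVM type legalizer: `kLegalElemBits`, `promoteElemBits` and `setccTypesAgree` are hypothetical names, and the rule "pick the smallest legal element width that fits" is a simplifying assumption based on the legal types named in this comment.

```cpp
#include <cassert>

// Toy model only: legal element widths for 8-element vectors, taken from the
// comment above (<8 x i16> and <8 x i32> on a corei7-avx).
constexpr unsigned kLegalElemBits[] = {16, 32};

// Hypothetical helper, not an LLVM API: promote an element width to the
// smallest legal width that can hold it. Returns 0 if none fits (wider
// types would need other handling, which is not modeled here).
unsigned promoteElemBits(unsigned bits) {
    for (unsigned legal : kLegalElemBits)
        if (bits <= legal)
            return legal;
    return 0;
}

// True when result and operand element widths still agree after each one
// has been promoted independently.
bool setccTypesAgree(unsigned resultBits, unsigned operandBits) {
    return promoteElemBits(resultBits) == promoteElemBits(operandBits);
}
```

Under these assumptions, `promoteElemBits(1)` gives 16 and `promoteElemBits(23)` gives 32, so `setccTypesAgree(1, 23)` is false: the result and the operands of the setcc land on different vector types.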

There are two problems here:

1) It looks like function LowerVSETCC works under the assumption that both operand type and result type are the same.

2) Because of point 1, function LowerVSETCC ends up expanding the 'setcc' instruction into this:

t37: v8i16 = X86ISD::PCMPEQ t36, t25
t36: v8i32 = bitcast t35
...
t25: v8i32 = build_vector of all zeroes.

The PCMPEQ here is wrong because SSE/AVX PCMPEQ expects operands and return type to be of the same type. Interestingly, we end up matching instruction VPCMPEQWrr. So we obtain something like:

t37: v8i16 = X86ISD::VPCMPEQWrr v8i32 t36, v8i32 t25

However, t36 and t25 are VR256 and not VR128. This inconsistency eventually leads to the insertion of COPY instructions like these:

%vreg7 = COPY %vreg3; VR128:%vreg7 VR256:%vreg3

These are not valid, since source and destination are of different sizes and this is not a sub-register copy.
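A minimal sketch of why such a COPY is rejected, assuming a toy register model. `RegClass`, `Reg`, `sizeInBits` and `isValidFullCopy` are illustrative names and do not correspond to LLVM's data structures; sub-register copies are deliberately not modeled.

```cpp
#include <cassert>

// Toy register model; names and fields are illustrative, not LLVM's.
enum class RegClass { VR128, VR256 };

struct Reg {
    RegClass cls;
    unsigned index;  // e.g. 7 for XMM7 or YMM7
};

constexpr unsigned sizeInBits(RegClass c) {
    return c == RegClass::VR128 ? 128u : 256u;
}

// A plain COPY is modeled as valid only between registers of equal size.
// A copy that goes through an explicit sub-register index is out of scope.
constexpr bool isValidFullCopy(Reg dst, Reg src) {
    return sizeInBits(dst.cls) == sizeInBits(src.cls);
}

// VR128 <- VR256 is rejected, VR128 <- VR128 is accepted.
static_assert(!isValidFullCopy({RegClass::VR128, 7}, {RegClass::VR256, 3}), "");
static_assert(isValidFullCopy({RegClass::VR128, 7}, {RegClass::VR128, 3}), "");
```

In this model the VR128 destination and VR256 source differ in size, which is the situation the COPY above ends up in.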

DimitryAndric commented 9 years ago

I have attached a bugpoint reduced reproducible.

Thanks!

I can use that reproducible to obtain the same llc crash ('Cannot emit physreg copy instruction'. UNREACHABLE executed at X86InstrInfo.cpp:3967!). ... So, it looks like the problem may be between those two revisions (232000, 233000]. To clarify, I don't think revision 228923 is doing anything wrong in this context.

Okay, so bisecting between those, using your bugpoint reduced test case, leads to r232879 ("Cache the Function dependent subtarget on the MachineFunction") by Eric Christopher.

This commit has the comment: "As preparation for removing the getSubtargetImpl() call from TargetMachine go ahead and flip the switch on caching the function dependent subtarget and remove the bare getSubtargetImpl call from the X86 port."

It also looks pretty innocuous, so again the question is what deeper problem is exposed by it? Eric, do you have any idea?

adibiagio commented 9 years ago

Minimal reproducible obtained with bugpoint

I have attached a bugpoint reduced reproducible.

I can use that reproducible to obtain the same llc crash ('Cannot emit physreg copy instruction'. UNREACHABLE executed at X86InstrInfo.cpp:3967!).

Also, I am able to reproduce the crash even if I revert revision 228923.

To me, that makes sense because r228923 cannot have caused this problem. What probably happened is that r228923 might have uncovered a latent bug.

Revision 228923 teaches how to compute the cost of a truncate/zext based on the target information. The TTI cost model now queries TLI (method isTruncateFree and method isZExtFree) to see if a zext/trunc is free for the target.

A change in that cost model would affect the loop vectorizer (and other passes that use the cost model). However, it shouldn't affect llc. Also, r228923 doesn't modify isTruncateFree/isZExtFree; therefore it cannot be the cause of the llc crash (and the IR generated by opt is valid).

The minimal reproducible builds fine (i.e. no crash) using llc at revision r232000. As soon as I upgrade to r233000, llc starts crashing due to UNREACHABLE code executed.

So, it looks like the problem may be between those two revisions (232000, 233000]. To clarify, I don't think revision 228923 is doing anything wrong in this context.
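Narrowing down a range like this is a plain bisection. Here is a minimal sketch, assuming that every revision after the first bad one also crashes; `firstBadRevision` and the `crashes` predicate are hypothetical stand-ins for "build llc at that revision and run the reproducible".

```cpp
#include <cassert>
#include <functional>

// Find the first crashing revision in the half-open range (good, bad],
// given that `good` is known to work and `bad` is known to crash.
// `crashes` is a hypothetical predicate standing in for building llc at
// that revision and running the reduced test case.
unsigned firstBadRevision(unsigned good, unsigned bad,
                          const std::function<bool(unsigned)>& crashes) {
    while (bad - good > 1) {
        unsigned mid = good + (bad - good) / 2;
        if (crashes(mid))
            bad = mid;   // first bad revision is at or before mid
        else
            good = mid;  // first bad revision is after mid
    }
    return bad;
}
```

For the range (232000, 233000] this needs about ten builds. If the assumption does not hold, for example because the crash is masked at some intermediate revisions, the result can be misleading.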

adibiagio commented 9 years ago

I started looking at this. I will post my findings soon.

DimitryAndric commented 9 years ago

When always enabling loop rotation, further bisection shows that this was introduced in r228923 ("[TTI] Teach the cost heuristic how to query TLI to check if a zext/trunc is 'free' for the target") by Andrea Di Biagio.

Andrea, do you have any idea how changes in the cost heuristics could lead to an error like this?

DimitryAndric commented 9 years ago

I've bisected, and this error started at r231820 ("Enable loop-rotate before loop-vectorize by default") by Michael Zolotukhin. However, the error must have been in there before, and the default enabling of loop rotation just exposed it...

DimitryAndric commented 9 years ago

Indeed, it looks related to AVX. Instead of specifying sandybridge as the target CPU, this also leads to the error:

clang -cc1 -triple x86_64-unknown-freebsd11.0 -emit-obj -target-cpu x86-64 -target-feature +avx -O2 -vectorize-loops testcase.c

DimitryAndric commented 9 years ago

The backtrace goes as follows:

Cannot emit physreg copy instruction
UNREACHABLE executed at /share/dim/src/llvm/trunk/lib/Target/X86/X86InstrInfo.cpp:3967!
[New Thread 808215000 (LWP 100175)]

Program received signal SIGABRT, Aborted.
[Switching to Thread 808215000 (LWP 100175)]
0x0000000807888c0a in thr_kill () from /lib/libc.so.7
(gdb) bt

#0 0x0000000807888c0a in thr_kill () from /lib/libc.so.7
#1 0x0000000807888bf8 in raise () from /lib/libc.so.7
#2 0x0000000807888b79 in abort () from /lib/libc.so.7
#3 0x00000000051a2580 in llvm::llvm_unreachable_internal (msg=0x5703d66 "Cannot emit physreg copy instruction", file=0x570399a "/share/dim/src/llvm/trunk/lib/Target/X86/X86InstrInfo.cpp", line=3967) at /share/dim/src/llvm/trunk/lib/Support/ErrorHandling.cpp:117
#4 0x0000000003dc3795 in llvm::X86InstrInfo::copyPhysReg (this=0x8084ab328, MBB=..., MI=..., DL=..., DestReg=133, SrcReg=173, KillSrc=false) at /share/dim/src/llvm/trunk/lib/Target/X86/X86InstrInfo.cpp:3967
#5 0x000000000454242c in (anonymous namespace)::ExpandPostRA::LowerCopy (this=0x808412cf0, MI=0x80828a6c0) at /share/dim/src/llvm/trunk/lib/CodeGen/ExpandPostRAPseudos.cpp:166
#6 0x00000000045419b1 in (anonymous namespace)::ExpandPostRA::runOnMachineFunction (this=0x808412cf0, MF=...) at /share/dim/src/llvm/trunk/lib/CodeGen/ExpandPostRAPseudos.cpp:215
#7 0x000000000431cdde in llvm::MachineFunctionPass::runOnFunction (this=0x808412cf0, F=...) at /share/dim/src/llvm/trunk/lib/CodeGen/MachineFunctionPass.cpp:43
#8 0x00000000050b923f in llvm::FPPassManager::runOnFunction (this=0x80822fa40, F=...) at /share/dim/src/llvm/trunk/lib/IR/LegacyPassManager.cpp:1521
#9 0x00000000050b9555 in llvm::FPPassManager::runOnModule (this=0x80822fa40, M=...) at /share/dim/src/llvm/trunk/lib/IR/LegacyPassManager.cpp:1542
#10 0x00000000050ba173 in (anonymous namespace)::MPPassManager::runOnModule (this=0x808332a00, M=...) at /share/dim/src/llvm/trunk/lib/IR/LegacyPassManager.cpp:1598
#11 0x00000000050b9816 in llvm::legacy::PassManagerImpl::run (this=0x808446300, M=...) at /share/dim/src/llvm/trunk/lib/IR/LegacyPassManager.cpp:1701
#12 0x00000000050bad81 in llvm::legacy::PassManager::run (this=0x80821e150, M=...) at /share/dim/src/llvm/trunk/lib/IR/LegacyPassManager.cpp:1732
#13 0x0000000000f075fb in (anonymous namespace)::EmitAssemblyHelper::EmitAssembly (this=0x7fffffff9298, Action=clang::Backend_EmitObj, OS=0x808218f40) at /share/dim/src/llvm/trunk/tools/clang/lib/CodeGen/BackendUtil.cpp:645
#14 0x0000000000f066f2 in clang::EmitBackendOutput (Diags=..., CGOpts=..., TOpts=..., LOpts=..., TDesc=..., M=0x80832ce00, Action=clang::Backend_EmitObj, OS=0x808218f40) at /share/dim/src/llvm/trunk/tools/clang/lib/CodeGen/BackendUtil.cpp:657
#15 0x0000000000e97580 in clang::BackendConsumer::HandleTranslationUnit (this=0x8082a60c0, C=...) at /share/dim/src/llvm/trunk/tools/clang/lib/CodeGen/CodeGenAction.cpp:184
#16 0x0000000001388f12 in clang::ParseAST (S=..., PrintStats=false, SkipFunctionBodies=false) at /share/dim/src/llvm/trunk/tools/clang/lib/Parse/ParseAST.cpp:168
#17 0x000000000098579b in clang::ASTFrontendAction::ExecuteAction (this=0x8082b10e0) at /share/dim/src/llvm/trunk/tools/clang/lib/Frontend/FrontendAction.cpp:538
#18 0x0000000000e95e99 in clang::CodeGenAction::ExecuteAction (this=0x8082b10e0) at /share/dim/src/llvm/trunk/tools/clang/lib/CodeGen/CodeGenAction.cpp:789
#19 0x0000000000984b00 in clang::FrontendAction::Execute (this=0x8082b10e0) at /share/dim/src/llvm/trunk/tools/clang/lib/Frontend/FrontendAction.cpp:439
#20 0x00000000008f5aa2 in clang::CompilerInstance::ExecuteAction (this=0x80829b000, Act=...) at /share/dim/src/llvm/trunk/tools/clang/lib/Frontend/CompilerInstance.cpp:838
#21 0x000000000088cfa2 in clang::ExecuteCompilerInvocation (Clang=0x80829b000) at /share/dim/src/llvm/trunk/tools/clang/lib/FrontendTool/ExecuteCompilerInvocation.cpp:222
#22 0x00000000008648a7 in cc1_main (Argv=..., Argv0=0x7fffffffe778 "/home/dim/obj/llvm-249477-trunk-freebsd11-amd64-aconf-dbg-1/Debug+Asserts+Checks/bin/clang", MainAddr=0x87de80 <GetExecutablePath(char const*, bool)>) at /share/dim/src/llvm/trunk/tools/clang/tools/driver/cc1_main.cpp:116
#23 0x0000000000881359 in ExecuteCC1Tool (argv=..., Tool=...) at /share/dim/src/llvm/trunk/tools/clang/tools/driver/driver.cpp:301
#24 0x000000000087eb46 in main (argc=10, argv=0x7fffffffe498) at /share/dim/src/llvm/trunk/tools/clang/tools/driver/driver.cpp:366

(gdb) frame 4
#4 0x0000000003dc3795 in llvm::X86InstrInfo::copyPhysReg (this=0x8084ab328, MBB=..., MI=..., DL=..., DestReg=133, SrcReg=173, KillSrc=false)
   at /share/dim/src/llvm/trunk/lib/Target/X86/X86InstrInfo.cpp:3967
3967 llvm_unreachable("Cannot emit physreg copy instruction");
(gdb) print RI
$1 = { = { = { = {Desc = 0x577da60 , NumRegs = 246, RAReg = 41, PCReg = 41, Classes = 0x67254e0 , NumClasses = 80, NumRegUnits = 131, RegUnitRoots = 0x577f170 , DiffLists = 0x577d490 , RegUnitMaskSequences = 0x577d5d0 , RegStrings = 0x577d640 "XMM10", RegClassStrings = 0x577f380 "RFP80", SubRegIndices = 0x577d600 , SubRegIdxRanges = 0x577d620 , NumSubRegIndices = 7, RegEncodingTable = 0x57826c0 , L2DwarfRegsSize = 146, EHL2DwarfRegsSize = 146, Dwarf2LRegsSize = 73, EHDwarf2LRegsSize = 73, L2DwarfRegs = 0x5780b00 , EHL2DwarfRegs = 0x57818e0 , Dwarf2LRegs = 0x5780220 , EHDwarf2LRegs = 0x5780690 , L2SEHRegs = {<llvm::DenseMapBase<llvm::DenseMap<unsigned int, int, llvm::DenseMapInfo, llvm::detail::DenseMapPair<unsigned int, int> >, unsigned int, int, llvm::DenseMapInfo, llvm::detail::DenseMapPair<unsigned int, int> >> = { = {Epoch = 245}, }, Buckets = 0x808430000, NumEntries = 245, NumTombstones = 0, NumBuckets = 512}}, _vptr$TargetRegisterInfo = 0x6883d80 <vtable for llvm::X86RegisterInfo+16>, InfoDesc = 0x56770d0 , SubRegIndexNames = 0x6723450 <_ZN4llvmL20SubRegIndexNameTableE>, SubRegIndexLaneMasks = 0x5677880 <_ZN4llvmL24SubRegIndexLaneMaskTableE>, RegClassBegin = 0x6883980 <_ZN4llvm12_GLOBAL__N_1L15RegisterClassesE>, RegClassEnd = 0x6883c00 , CoveringLanes = 4294967288}, }, Is64Bit = true, IsWin64 = false, SlotSize = 8, StackPtr = 44, FramePtr = 36, BasePtr = 37}
(gdb) print SrcReg
$2 = 173
(gdb) print DestReg
$3 = 133

173 is YMM15, apparently, while 133 is XMM7. So this seems to be caused by AVX?

1af946f3-92c6-4c1f-92d5-555366fd726b commented 9 years ago

The target CPU must apparently be sandybridge or higher, the error does not reproduce with 'plain' x86_64.

With AMD processors, bdver1 and higher shows the same problem. amdfam10 and lower is fine.

DimitryAndric commented 9 years ago

assigned to @adibiagio