Closed DimitryAndric closed 9 years ago
Hi Andrea, thanks for the patch! It fixes both our original test case from FreeBSD, my initial reduced test case, and your bugpoint reduced test case.
I will import this into FreeBSD pretty soon.
I am glad that this patch works for you. I am going to resolve this bug.
Cheers, Andrea
Hi Andrea, thanks for the patch! It fixes both our original test case from FreeBSD, my initial reduced test case, and your bugpoint reduced test case.
I will import this into FreeBSD pretty soon.
Fixed at revision 250085. http://llvm.org/viewvc/llvm-project?view=revision&revision=250085
Hey Dimitry.
I created http://reviews.llvm.org/D13660. Could you please check if the new patch still fixes the original problem?
Many thanks! :-) Andrea
Would it be a problem if I send a patch for review on monday (if it is not too late for you)? Here is 21:43 now, and my wife would kill me if I work on weekends :-)
There is no hurry, all the earlier detective work seems to have proven that this is a rather tricky issue, so it is better to think it through calmly and without pressure.
Besides, I would not want to be guilty of causing divorces. ;) Please enjoy your weekend!
I'll see if I can create a proper patch that addresses all the problematic cases. The current patch is only addressing one particular case and it requires more improvements. Would it be a problem if I send a patch for review on monday (if it is not too late for you)? Here is 21:43 now, and my wife would kill me if I work on weekends :-)
Here is a patch that addresses the problem with type legalized SETCC nodes with operands and destination of different types.
The patch is just a hack/work in progress and only addresses a very specific case.
Thanks a lot! I can confirm that this fixes both the original test case (from FreeBSD's ieee802_11_common.c), my .c reduced test case, and your .ll test case.
Maybe this should be put up into a Phabricator review?
prototype patch Here is a patch that addresses the problem with type legalized SETCC nodes with operands and destination of different types.
The patch is just a hack/work in progress and only addresses a very specific case.
I think I understand now what's going on.
The problem seems to be caused by a wrong lowering of SETCC nodes in the X86 backend. The reproducible exposed a problem which has probably been there since ages..
this is another (even smaller) reproducible:
define <8 x i16> @test(<8 x i32> %a) { entry: %0 = trunc <8 x i32> %a to <8 x i32> %1 = icmp eq <8 x i23> %0, zeroinitializer %2 = or <8 x i1> %1, <i1 true, i1 false, i1 true, i1 false, i1 true, i1 false, i1 true, i1 false> %3 = zext <8 x i1> %2 to <8 x i16> ret <8 x i16> %3 }
Here, type <8 x i23> is not a legal type for the target. Legal vector types for a corei7-avx would be <8 x i16> and <8 x i32>.
instruction: %1 = icmp eq <8 x i23> %0, zeroinitializer
is lowered in SelectionDAG as: v8i1 = setcc t5, t7, seteq:ch t5: v8i23 = truncate t2 t2: v8i32,ch = CopyFromReg t0, Register:v8i32 %vreg1 t7: v8i32 = build_vector of all zeroes.
The type legalizer would firstly promote the return type of the setcc node from <8 x i1> to <8 x i16>. It would then promote the setcc operand type from <8 x i23> to <8 x i32> since type <8 x i32> is illegal for the target.
So, we end up with a v8i16 = setcc t32, t25, seteq:ch
where t32 and t25 are now values of type v8i32.
There are two problems here: 1) It looks like function LowerVSETCC works under the assumption that both operand type and result type are the same. 2) Because of point 1. function LowerVSETCC ends up expanding the 'setcc' instruction into this:
t37: v8i16 = X86ISD::PCMPEQ t36, t25 t36: v8i32 = bitcast t35 ... t25: v8i32 = build_vector of all zeroes.
The PCMPEQ here is wrong because SSE/AVX PCMPEQ expects operands and return type to be of the same type. Interestingly, we end up matching instruction VPCMPEQWrr. So we obtain something like: t37: v8i16 = X86ISD::VPCMPEQWrr v8i32 t36, v8i32 t25
However, t36 and t25 are VR256 and not VR128.
This inconsistency eventually leads to the insertion of COPY instructions like these:
%vreg7
Which are not valid since source and destination are of different size, and this is not a sub register copy.
I have attached a bugpoint reduced reproducible.
Thanks!
I can use that reproducible to obtain the same llc crash ('Cannot emit physreg copy instruction'. UNREACHABLE executed at X86InstrInfo.cpp:3967!). ... So, it looks like the problem may be between those two revisions (232000, 233000]. To clarify, I don't think revision 228923 is doing anything wrong in this context.
Okay, so bisecting between those, using your bugpoint reduced test case, leads to r232879 ("Cache the Function dependent subtarget on the MachineFunction") by Eric Christopher.
This commit has the comment: "As preparation for removing the getSubtargetImpl() call from TargetMachine go ahead and flip the switch on caching the function dependent subtarget and remove the bare getSubtargetImpl call from the X86 port."
It also looks pretty innocuous, so again the question is what deeper problem is exposed by it? Eric, do you have any idea?
Minimal reproducible obtained with bugpoint I have attached a bugpoint reduced reproducible.
I can use that reproducible to obtain the same llc crash ('Cannot emit physreg copy instruction'. UNREACHABLE executed at X86InstrInfo.cpp:3967!).
Also, I am able to reproduce the crash even if I revert revision 228923.
To me, that makes sense because r228923 cannot have caused this problem. What probably happened is that r228923 might have uncovered a latent bug.
Revision 228923 teaches how to compute the cost of a truncate/zext based on the target information. The TTI cost model now queries TLI (method isTruncateFree and method isZExtFree) to see if a zext/trunc is free for the target.
A change in that cost model would affect the loop vectorizer (and other passes that use the cost model). However, it shouldn't affect llc. Also, r228923 doesn't modify isTruncateFree/isZExtFree; therefore it cannot be the cause of the llc crash (and the IR generated by opt is valid).
The minimal reproducible builds fine (i.e. no crash) using llc at revision r232000. As soon as I upgrade to r233000, llc starts crashing due to UNREACHABLE code executed.
So, it looks like the problem may be between those two revisions (232000, 233000]. To clarify, I don't think revision 228923 is doing anything wrong in this context.
I started looking at this. I will post my findings soon.
When always enabling the loop-rotation, Further bisection shows that this was introduced in r228923 ("[TTI] Teach the cost heuristic how to query TLI to check if a zext/trunc is 'free' for the target") by Andrea Di Biagio.
Andrea, do you have any idea how changes in the cost heuristics could lead to an error like this?
I've bisected, and this error started at r231820 ("Enable loop-rotate before loop-vectorize by default") by Michael Zolotukhin. However, the error must have been in there before, and the default enabling of loop rotation just exposed it...
Indeed, it looks related to AVX. Instead of specifying sandybridge as the target CPU, this also leads to the error:
clang -cc1 -triple x86_64-unknown-freebsd11.0 -emit-obj -target-cpu x86-64 -target-feature +avx -O2 -vectorize-loops testcase.c
The backtrace goes as follows:
Cannot emit physreg copy instruction UNREACHABLE executed at /share/dim/src/llvm/trunk/lib/Target/X86/X86InstrInfo.cpp:3967! [New Thread 808215000 (LWP 100175)]
Program received signal SIGABRT, Aborted. [Switching to Thread 808215000 (LWP 100175)] 0x0000000807888c0a in thr_kill () from /lib/libc.so.7 (gdb) bt
(gdb) frame 4
at /share/dim/src/llvm/trunk/lib/Target/X86/X86InstrInfo.cpp:3967
3967 llvm_unreachable("Cannot emit physreg copy instruction");
(gdb) print RI
$1 = {
173 is YMM15, apparently, while 133 is XMM7. So this seems to be caused by AVX?
The target CPU must apparently be sandybridge or higher, the error does not reproduce with 'plain' x86_64.
With AMD processors, bdver1 and higher shows the same problem. amdfam10 and lower is fine.
assigned to @adibiagio
Extended Description
This reproduced with trunk r248374. Minimized test case:
char x0; *x1; x2(x3) { return x3 == 0 || x3 == 11 || x3 == 22; } x4() { int x5, x6, x7; for (; x1 && x7 < x0; x7++) if (x2(x1[x7])) x5++; else x6++; return x5 && x6; }
Compile with:
clang -cc1 -triple x86_64-unknown-freebsd11.0 -emit-obj -target-cpu sandybridge -O2 -vectorize-loops testcase.c
Resulting in:
Cannot emit physreg copy instruction UNREACHABLE executed at /share/dim/src/llvm/trunk/lib/Target/X86/X86InstrInfo.cpp:3963! Abort trap
The target CPU must apparently be sandybridge or higher, the error does not reproduce with 'plain' x86_64.