Closed llvmbot closed 15 years ago
Sure, but please ask on llvmdev, thanks!
Great, thanks for fixing this!
I'd love to give it a try and add support for matching MMX and SSE instructions (I'm actually quite experienced with them). I'm just not very familiar with LLVM's code base, but since you say it's straightforward, I'm eager to learn more. Any advice on how to proceed? Thanks again.
Crash fixed here: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20090316/075388.html
Please include detailed information on how to reproduce the problem when you file a bug (e.g. saying to run instcombine on it). I don't think the X86 backend has full support for vector shift matching. This is "documented" in lib/Target/X86/README-UNIMPLEMENTED.txt. If you're interested, it should be straightforward to add.
Also, the output looks like this (when using lshr and a bunch of standard optimizations):
```asm
sub       esp,44h
movq      mm0,mmword ptr ds:[3BC0010h]
movq      mmword ptr [esp],mm0
mov       eax,dword ptr [esp+48h]
movq      mm1,mmword ptr [eax]
movq      mmword ptr [esp+8],mm1
movq      mmword ptr [esp+10h],mm0
movq      mmword ptr [esp+18h],mm1
movq      mmword ptr [esp+20h],mm0
movq      mmword ptr [esp+28h],mm1
movq      mmword ptr [esp+30h],mm0
movq      mmword ptr [esp+38h],mm1
mov       cl,byte ptr [esp+6]
mov       dx,word ptr [esp+0Eh]
shr       dx,cl
movd      mm0,edx
mov       cl,byte ptr [esp+12h]
mov       dx,word ptr [esp+1Ah]
shr       dx,cl
movd      mm1,edx
punpcklwd mm1,mm0
mov       cl,byte ptr [esp+24h]
mov       dx,word ptr [esp+2Ch]
shr       dx,cl
movd      mm0,edx
mov       cl,byte ptr [esp+30h]
mov       dx,word ptr [esp+38h]
shr       dx,cl
movd      mm2,edx
punpcklwd mm2,mm0
punpcklwd mm2,mm1
movq      mmword ptr [eax],mm2
add       esp,44h
ret
```
Instead of something more like this:
```asm
mov  eax, [esp+4]
movq mm0, [eax]
psrlw mm0, 5
movq [eax], mm0
ret
```
Sorry, that should be ashr instead of lshr. I've also attached a replacement for fibonacci.cpp that should reproduce the assert.
How do I reproduce this? Both of these work fine:

```
$ llvm-as < t.ll | opt -instcombine
$ llvm-as < t.ll | opt -std-compile-opts
```
Extended Description
The IR code below asserts in ValueTracking.cpp:566 (the first line of llvm::ComputeNumSignBits) when using an instruction combining optimization pass:
```llvm
define internal void @0(<4 x i16>*, <4 x i16>*) {
  %3 = alloca <4 x i16>*                                  ; <<4 x i16>**> [#uses=2]
  %4 = alloca <4 x i16>*                                  ; <<4 x i16>**> [#uses=2]
  store <4 x i16>* %0, <4 x i16>** %4
  store <4 x i16>* %1, <4 x i16>** %3
  %5 = load <4 x i16>** %3                                ; <<4 x i16>*> [#uses=1]
  %6 = load <4 x i16>* %5, align 1                        ; <<4 x i16>> [#uses=1]
  %7 = lshr <4 x i16> %6, <i16 5, i16 5, i16 5, i16 5>    ; <<4 x i16>> [#uses=1]
  %8 = load <4 x i16>** %4                                ; <<4 x i16>*> [#uses=1]
  store <4 x i16> %7, <4 x i16>* %8, align 1
  ret void
}
```
The assert is about a cast to an incompatible type; as far as I can tell the function doesn't expect a vector as the second argument of lshr.
Furthermore, this generates very inefficient scalar code with four x86 shr instructions, while it should be a perfect match for the MMX psrlw instruction. I suspect this doesn't work for any MMX/SSE vector shift.