llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.65k stars 11.84k forks source link

[x86, SSE] high elements of vectors are not ignored as expected #24897

Open rotateright opened 9 years ago

rotateright commented 9 years ago
Bugzilla Link 24523
Version trunk
OS All
CC @topperc,@RKSimon

Extended Description

Noticed in bug 24475: we're not eliminating meaningless ops on the higher elements of vectors.

$ cat isnan.ll

define float @my_sse_isnan(float %f1) {
  %v1 = insertelement <4 x float> undef, float %f1, i32 0

  ; this shouldn't affect anything; we only care about the low (scalar) element
  %v2 = insertelement <4 x float> %v1, float 0.0, i32 1  

  %cmpss = tail call <4 x float> @llvm.x86.sse.cmp.ss(<4 x float> %v2, <4 x float> %v2, i8 3)
  %ext = extractelement <4 x float> %cmpss, i32 0
  ret float %ext
}

$ ./llc -o - isnan.ll

...
  xorps %xmm1, %xmm1
  movss %xmm0, %xmm1            ## xmm1 = xmm0[0],xmm1[1,2,3]
  cmpunordss    %xmm1, %xmm1
  movaps    %xmm1, %xmm0
  retq

I was expecting that to be reduced to:

  cmpunordss %xmm0, %xmm0
  retq
RKSimon commented 4 years ago

4 years later...

Candidate patch: https://reviews.llvm.org/D76928

RKSimon commented 8 years ago

D17490 begins adding cmpss/cmpsd support to SimplifyDemandedVectorElts

RKSimon commented 8 years ago

I looked at adding this to the SSE scalar intrinsics handling in SimplifyDemandedVectorElts, for the ord/unord comparison cases we can add support quite easily - the other cases would need more care due to SSE's quirky NAN handling.

With a local patch, opt can generate this:

define float @my_sse_isnan(float %f1) {
  %1 = fcmp uno float %f1, 0.0
  %2 = sext i1 %1 to i32
  %3 = bitcast i32 %2 to float
  ret float %3
}

which results in the assembly:

    xorl    %eax, %eax
    ucomiss %xmm0, %xmm0
    movl    $-1, %ecx
    cmovnpl %eax, %ecx
    movd    %ecx, %xmm0
    retq

Which is another can of worms....

Part of the problem is the fact that the original test reuses the same variable in both arguments of the comparison, which stops SimplifyDemandedVectorElts doing almost anything - we could fix this part of the bug very easily for cases with different arguments.

I'm not sure which would be less work - adding basic support to SimplifyDemandedVectorElts for values reused in an instruction or better ucomiss/cmpss selection.