llvm / llvm-project

Finite only math doesn't strip out most of body of powr implementation #64870

Open arsenm opened 1 year ago

arsenm commented 1 year ago

With sufficient fast math flags passed to an implementation of OpenCL's powr, the edge-case handling for infinite values is not pruned out. The nofpclass(nan inf) attributes on the arguments and return value should provide enough information to delete everything except the first 4 instructions in this function:

define hidden noundef nofpclass(nan inf nzero nsub nnorm) float @test_powr(float noundef nofpclass(nan inf) %x, float noundef nofpclass(nan inf) %y) #0 {
entry:
  %i = tail call float @llvm.fabs.f32(float noundef %x)
  %i1 = tail call float @llvm.log2.f32(float noundef %i)
  %i2 = fmul float %i1, %y
  %i3 = tail call noundef nofpclass(ninf nzero nsub nnorm) float @llvm.exp2.f32(float noundef %i2)
  ; x == 0: result is +inf (0x7FF0000000000000) for y < 0, otherwise 0.0
  %i4 = fcmp olt float %y, 0.000000e+00
  %i5 = select i1 %i4, float 0x7FF0000000000000, float 0.000000e+00
  %i6 = fcmp oeq float %x, 0.000000e+00
  %i7 = select i1 %i6, float %i5, float %i3
  ; y == 0: result is NaN (0x7FF8000000000000) for x == 0, otherwise 1.0
  %i8 = fcmp oeq float %y, 0.000000e+00
  %i9 = select i1 %i6, float 0x7FF8000000000000, float 1.000000e+00
  %i10 = select i1 %i8, float %i9, float %i7
  ; x == 1: result is 1.0
  %i11 = fcmp oeq float %x, 1.000000e+00
  %i12 = select i1 %i11, float 1.000000e+00, float %i10
  ; x < 0: result is NaN
  %i13 = fcmp olt float %x, 0.000000e+00
  %i14 = select i1 %i13, float 0x7FF8000000000000, float %i12
  ret float %i14
}

declare float @llvm.fabs.f32(float) #1
declare float @llvm.log2.f32(float) #1
declare float @llvm.exp2.f32(float) #1
declare float @llvm.trunc.f32(float) #1
declare float @llvm.copysign.f32(float, float) #1

attributes #0 = { mustprogress nofree norecurse nosync nounwind willreturn memory(none) }
attributes #1 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }
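
As a sanity check on the claim that only the first 4 instructions are needed, here is a rough numeric model in Python. It is a sketch, not LLVM code: `core`, `powr_full`, and `mismatches` are made-up names, the arithmetic is done in doubles rather than f32, and Python's exceptions for log2(0) and overflow are mapped by hand to the IEEE results. It compares the full select chain against exp2(log2(|x|) * y) on finite inputs, skipping any result that the return attribute rules out (nan or inf).

```python
import math

INF = math.inf
NAN = math.nan


def core(x: float, y: float) -> float:
    """Model of the first four instructions: exp2(log2(|x|) * y)."""
    ax = abs(x)
    # math.log2(0.0) raises ValueError, whereas llvm.log2 returns -inf.
    log = -INF if ax == 0.0 else math.log2(ax)
    prod = log * y  # -inf * 0.0 evaluates to nan, as with an IEEE fmul
    if math.isnan(prod):
        return NAN
    if math.isinf(prod):
        return INF if prod > 0 else 0.0
    try:
        return 2.0 ** prod
    except OverflowError:  # Python raises here instead of returning inf
        return INF


def powr_full(x: float, y: float) -> float:
    """Model of the whole function body, select chain included."""
    i3 = core(x, y)
    i5 = INF if y < 0.0 else 0.0
    i7 = i5 if x == 0.0 else i3
    i9 = NAN if x == 0.0 else 1.0
    i10 = i9 if y == 0.0 else i7
    i12 = 1.0 if x == 1.0 else i10
    i14 = NAN if x < 0.0 else i12
    return i14


def mismatches(values):
    """Return (x, y) pairs where the two models differ on in-contract results."""
    bad = []
    for x in values:
        for y in values:
            expected = powr_full(x, y)
            # Skip results excluded by nofpclass(nan inf) on the return value.
            if math.isnan(expected) or math.isinf(expected):
                continue
            if core(x, y) != expected:
                bad.append((x, y))
    return bad


SAMPLES = [-2.0, -0.0, 0.0, 0.5, 1.0, 2.0, 3.5]
print(mismatches(SAMPLES))
```

On these samples the list comes back empty, which matches the expectation that the select chain only matters for results that are already excluded. This is a spot check on a handful of values, not a proof.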

I think this requires a "simplify demanded fpclass" type of handling, similar to SimplifyDemandedBits.
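
To illustrate the idea, here is a toy model in Python of what demanded-class simplification could do to the select chain. Everything in it is hypothetical (`FPClass`, `Const`, `Var`, `Select`, `simplify_demanded` are invented for this sketch and are not LLVM APIs), and it collapses the fp classes to four buckets. The rule is: if one arm of a select can only produce classes the user does not demand, the select can be replaced by the other arm, since taking the undemanded arm would yield poison anyway.

```python
import math
from dataclasses import dataclass
from enum import Flag
from typing import Union


class FPClass(Flag):
    NAN = 1
    INF = 2
    ZERO = 4
    OTHER = 8  # any finite nonzero value


ALL = FPClass.NAN | FPClass.INF | FPClass.ZERO | FPClass.OTHER


@dataclass(frozen=True)
class Const:
    value: float


@dataclass(frozen=True)
class Var:
    name: str
    known: FPClass = ALL  # classes this value may take


@dataclass(frozen=True)
class Select:
    cond: str  # condition kept opaque in this toy
    if_true: "Expr"
    if_false: "Expr"


Expr = Union[Const, Var, Select]


def possible_classes(e: Expr) -> FPClass:
    """Classes an expression may evaluate to."""
    if isinstance(e, Const):
        if math.isnan(e.value):
            return FPClass.NAN
        if math.isinf(e.value):
            return FPClass.INF
        return FPClass.ZERO if e.value == 0.0 else FPClass.OTHER
    if isinstance(e, Var):
        return e.known
    return possible_classes(e.if_true) | possible_classes(e.if_false)


def simplify_demanded(e: Expr, demanded: FPClass) -> Expr:
    """Drop select arms that can only produce undemanded classes."""
    if not isinstance(e, Select):
        return e
    t = simplify_demanded(e.if_true, demanded)
    f = simplify_demanded(e.if_false, demanded)
    t_ok = bool(possible_classes(t) & demanded)
    f_ok = bool(possible_classes(f) & demanded)
    if f_ok and not t_ok:
        return f
    if t_ok and not f_ok:
        return t
    return Select(e.cond, t, f)


def build_powr_tail() -> Expr:
    """The select chain %i5..%i14 from the function above."""
    i3 = Var("exp2")
    i5 = Select("y < 0", Const(math.inf), Const(0.0))
    i7 = Select("x == 0", i5, i3)
    i9 = Select("x == 0", Const(math.nan), Const(1.0))
    i10 = Select("y == 0", i9, i7)
    i12 = Select("x == 1", Const(1.0), i10)
    return Select("x < 0", Const(math.nan), i12)


# The return value is nofpclass(nan inf), so only ZERO and OTHER are demanded.
simplified = simplify_demanded(build_powr_tail(), FPClass.ZERO | FPClass.OTHER)
print(simplified)
```

In this toy, the inf and nan arms fold away, but the selects producing 0.0 and 1.0 survive: class information alone cannot remove them, and getting rid of those needs value-level reasoning about what exp2(log2(|x|) * y) returns for x == 0, x == 1, and y == 0.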

arsenm commented 1 year ago

https://reviews.llvm.org/D158648 half fixes it; there's still some 0/1 handling left over after the inf and nan cases are removed.