Closed: SahilPatidar closed this pull request 4 months ago.
@dtcxzyw, please check this out before making it ready for review.
I finished some dirty work. Does it look alright?
@jayfoad, how can I check for denormals being flushed?
See earlier comments. @arsenm said:
`F.getDenormalType(fltSemantics) == DenormalMode::getIEEE()` works, but is pretty ugly.
@jayfoad, I was looking for `F.getDenormalType` but couldn't find it. Instead, I tried using `F->getDenormalMode(LHSI->getType()->getScalarType()->getFltSemantics()) == DenormalMode::getIEEE()`, but I haven't seen any difference in the tests after making the change.
You would need to take a test where your optimization does apply and then make a copy of it with an attribute like this, and show that the optimization does not apply to it:
```llvm
define float @foo(...) "denormal-fp-math"="dynamic,dynamic" {
...
}
```
See the LangRef for the allowed values of the `denormal-fp-math` attribute. The default is `ieee,ieee`, and if you specify anything else it should block your optimization.
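For intuition on why any mode other than `ieee,ieee` has to block a fold like this, here is a hedged sketch; the fcmp-of-fsub shape is my assumption about the pattern being folded, not taken verbatim from the patch:

```llvm
; Illustrative only. If the target flushes denormal outputs, %sub can be
; flushed to +0.0 even when %x and %y differ only by a subnormal amount, so
;   fcmp olt (fsub %x, %y), 0.0   yields false, while the folded form
;   fcmp olt %x, %y               yields true.
; Under "denormal-fp-math"="ieee,ieee" the two forms agree.
define i1 @denormal_hazard(float %x, float %y) {
  %sub = fsub float %x, %y
  %cmp = fcmp olt float %sub, 0.000000e+00
  ret i1 %cmp
}
```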
@jayfoad, I tried setting the `denormal-fp-math` attribute to various options (`positive-zero`, `dynamic,dynamic`, and `preserve-sign`) as you suggested. It seems to be preventing the optimization, as you mentioned.
You need to add these attributes to some test functions.
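One of those negative tests could look something like this (a sketch; the function name and body are illustrative, and the fcmp-of-fsub shape is assumed from the discussion rather than copied from the patch):

```llvm
; Negative test sketch: with a non-IEEE denormal mode, the compare of the
; fsub result against zero must not be folded to a direct fcmp of %x and %y.
define i1 @no_fold_dynamic_denormals(float %x, float %y) "denormal-fp-math"="dynamic,dynamic" {
  %sub = fsub float %x, %y
  %cmp = fcmp olt float %sub, 0.000000e+00
  ret i1 %cmp
}
```

Variants with `"preserve-sign,preserve-sign"` and `"positive-zero,positive-zero"` would exercise the other modes you tried.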
@arsenm Is there any further comment?
LGTM. I see "This pull request is still a work in progress. Draft pull requests cannot be merged."
LGTM with some nits; it also has some conflicts. It should be converted to a non-draft PR.
@llvm/pr-subscribers-llvm-transforms
Author: None (SahilPatidar)
:white_check_mark: With the latest revision this PR passed the C/C++ code formatter.
Transforms/PhaseOrdering/X86/vector-reductions.ll fails precheck. It does have some fcmp + fsub patterns, so this is likely a real failure.
After updating the test, it is breaking `tail call fast <4 x float> @llvm.fabs.v4f32(<4 x float> [[TMP2]])`.
```llvm
define i32 @TestVectorsEqualFP(ptr noalias %Vec0, ptr noalias %Vec1, float %Tole
; CHECK-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[VEC0:%.*]], align 4
; CHECK-NEXT: [[TMP1:%.*]] = load <4 x float>, ptr [[VEC1:%.*]], align 4
; CHECK-NEXT: [[TMP2:%.*]] = fsub fast <4 x float> [[TMP0]], [[TMP1]]
-; CHECK-NEXT: [[TMP3:%.*]] = tail call fast <4 x float> @llvm.fabs.v4f32(<4 x float> [[TMP2]])
-; CHECK-NEXT: [[TMP4:%.*]] = tail call fast float @llvm.vector.reduce.fadd.v4f32(float -0.000000e+00, <4 x float> [[TMP3]])
-; CHECK-NEXT: [[CMP4:%.*]] = fcmp fast ole float [[TMP4]], [[TOLERANCE:%.*]]
+; CHECK-NEXT: [[TMP3:%.*]] = fcmp fast oge <4 x float> [[TMP0]], [[TMP1]]
+; CHECK-NEXT: [[TMP4:%.*]] = fneg fast <4 x float> [[TMP2]]
+; CHECK-NEXT: [[TMP5:%.*]] = select <4 x i1> [[TMP3]], <4 x float> [[TMP2]], <4 x float> [[TMP4]]
+; CHECK-NEXT: [[TMP6:%.*]] = tail call fast float @llvm.vector.reduce.fadd.v4f32(float -0.000000e+00, <4 x float> [[TMP5]])
+; CHECK-NEXT: [[CMP4:%.*]] = fcmp fast ole float [[TMP6]], [[TOLERANCE:%.*]]
; CHECK-NEXT: [[COND5:%.*]] = zext i1 [[CMP4]] to i32
; CHECK-NEXT: ret i32 [[COND5]]
;
```
This regression looks like it's because you are missing a hasOneUse check on the fsub. You should add some multi-use negative tests.
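A multi-use negative test might look like this (a sketch; names are illustrative, and it mirrors the fabs idiom visible in the regressed phase-ordering output, where the fsub feeds both the fcmp and a select arm):

```llvm
; %sub has uses besides the fcmp (the fneg and the select), so with a
; hasOneUse check the compare should be left alone, allowing the whole
; pattern to still be canonicalized to @llvm.fabs.f32.
define float @fsub_multi_use_negative(float %x, float %y) {
  %sub = fsub float %x, %y
  %cmp = fcmp olt float %sub, 0.000000e+00
  %neg = fneg float %sub
  %abs = select i1 %cmp, float %neg, float %sub
  ret float %abs
}
```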
@jayfoad
@SahilPatidar Could you please update alive2 link in the PR description?
Apologies for the delay; please take a look at the link.
Resolves #85245. Alive2: https://alive2.llvm.org/ce/z/F4rDwK