llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.09k stars 11.6k forks source link

[bug][inline] compile with optimize option -O1 will get wrong result #64479

Closed CoTinker closed 1 year ago

CoTinker commented 1 year ago

demo.c

#include "arm_neon.h"
#include <stdio.h>

__attribute__((noinline))
void test()
{
    float32_t a1[] = {-0.0, -0.0, 0.0, 0.0};
    float32_t b1[] = {0.0, 0.0, -0.0, -0.0};

    float32x4_t a = vld1q_f32 (a1);
    float32x4_t b = vld1q_f32 (b1);
    float32x4_t c = vmaxnmq_f32 (a, b);
    float32_t actual1[4];
    vst1q_f32 (actual1, c);
    for (int i = 0; i < 4; ++i) 
        printf("%f\n", actual1[i]);
}
int main ()
{
    test();
    return 0;
}

compile command

clang demo.c -static -fno-caret-diagnostics -march=armv8-a -mfloat-abi=softfp -o demo

right output

0.000000
0.000000
0.000000
0.000000

but with the optimize option -O1 -O2 -O3 etc. the output will be wrong:

-0.000000
-0.000000
0.000000
0.000000

I guess it's the noinline command __attribute__((noinline)) not work: add the compile option -mllvm -opt-bisect-limit=56 the output is right, but with the option -mllvm -opt-bisect-limit=57,the output is wrong. my clang version: 15.0.4

opt pass

BISECT: running pass (1) Annotation2MetadataPass on [module]
BISECT: running pass (2) ForceFunctionAttrsPass on [module]
BISECT: running pass (3) InferFunctionAttrsPass on [module]
BISECT: running pass (4) LowerExpectIntrinsicPass on test
BISECT: running pass (5) SimplifyCFGPass on test
BISECT: running pass (6) SROAPass on test
BISECT: running pass (7) EarlyCSEPass on test
BISECT: running pass (8) LowerExpectIntrinsicPass on vmaxnmq_f32
BISECT: running pass (9) SimplifyCFGPass on vmaxnmq_f32
BISECT: running pass (10) SROAPass on vmaxnmq_f32
BISECT: running pass (11) EarlyCSEPass on vmaxnmq_f32
BISECT: running pass (12) LowerExpectIntrinsicPass on main
BISECT: running pass (13) SimplifyCFGPass on main
BISECT: running pass (14) SROAPass on main
BISECT: running pass (15) EarlyCSEPass on main
BISECT: running pass (16) OpenMPOptPass on [module]
BISECT: running pass (17) IPSCCPPass on [module]
BISECT: running pass (18) CalledValuePropagationPass on [module]
BISECT: running pass (19) GlobalOptPass on [module]
BISECT: running pass (20) PromotePass on test
BISECT: running pass (21) PromotePass on vmaxnmq_f32
BISECT: running pass (22) PromotePass on main
BISECT: running pass (23) DeadArgumentEliminationPass on [module]
BISECT: running pass (24) InstCombinePass on test
BISECT: running pass (25) SimplifyCFGPass on test
BISECT: running pass (26) InstCombinePass on vmaxnmq_f32
BISECT: running pass (27) SimplifyCFGPass on vmaxnmq_f32
BISECT: running pass (28) InstCombinePass on main
BISECT: running pass (29) SimplifyCFGPass on main
BISECT: running pass (30) InvalidateAnalysisPass<llvm::AAManager> on test
BISECT: running pass (31) InvalidateAnalysisPass<llvm::AAManager> on vmaxnmq_f32
BISECT: running pass (32) InvalidateAnalysisPass<llvm::AAManager> on main
BISECT: running pass (33) InlinerPass on (vmaxnmq_f32)
BISECT: running pass (34) InlinerPass on (vmaxnmq_f32)
BISECT: running pass (35) PostOrderFunctionAttrsPass on (vmaxnmq_f32)
BISECT: running pass (36) SROAPass on vmaxnmq_f32
BISECT: running pass (37) EarlyCSEPass on vmaxnmq_f32
BISECT: running pass (38) SimplifyCFGPass on vmaxnmq_f32
BISECT: running pass (39) InstCombinePass on vmaxnmq_f32
BISECT: running pass (40) LibCallsShrinkWrapPass on vmaxnmq_f32
BISECT: running pass (41) SimplifyCFGPass on vmaxnmq_f32
BISECT: running pass (42) ReassociatePass on vmaxnmq_f32
BISECT: running pass (43) LoopSimplifyPass on vmaxnmq_f32
BISECT: running pass (44) LCSSAPass on vmaxnmq_f32
BISECT: running pass (45) SimplifyCFGPass on vmaxnmq_f32
BISECT: running pass (46) InstCombinePass on vmaxnmq_f32
BISECT: running pass (47) LoopSimplifyPass on vmaxnmq_f32
BISECT: running pass (48) LCSSAPass on vmaxnmq_f32
BISECT: running pass (49) SROAPass on vmaxnmq_f32
BISECT: running pass (50) MemCpyOptPass on vmaxnmq_f32
BISECT: running pass (51) SCCPPass on vmaxnmq_f32
BISECT: running pass (52) BDCEPass on vmaxnmq_f32
BISECT: running pass (53) InstCombinePass on vmaxnmq_f32
BISECT: running pass (54) ADCEPass on vmaxnmq_f32
BISECT: running pass (55) SimplifyCFGPass on vmaxnmq_f32
BISECT: running pass (56) InstCombinePass on vmaxnmq_f32
BISECT: NOT running pass (57) InlinerPass on (test)
BISECT: NOT running pass (58) InlinerPass on (test)
topperc commented 1 year ago

I think the issue is that AArch64ISelLowering.cpp converts the maxnm intrinsic to ISD::FMAXNUM

  case Intrinsic::aarch64_neon_fmaxnm:                                           
    return DAG.getNode(ISD::FMAXNUM, SDLoc(N), N->getValueType(0),               
                       N->getOperand(1), N->getOperand(2));                      
  case Intrinsic::aarch64_neon_fminnm:                                           
    return DAG.getNode(ISD::FMINNUM, SDLoc(N), N->getValueType(0),               
                       N->getOperand(1), N->getOperand(2));

ISD::FMAXNUM intrinsic is constant folded using this code from APFloat.h

inline APFloat maxnum(const APFloat &A, const APFloat &B) {                      
  if (A.isNaN())                                                                 
    return B;                                                                    
  if (B.isNaN())                                                                 
    return A;                                                                    
  return A < B ? B : A;                                                          
}

This does not order -0.0 to be less than 0.0.

The implementation of maxnum matches the documentation for llvm.maxnum here https://llvm.org/docs/LangRef.html#llvm-maxnum-intrinsic

Based on that I think it was incorrect for AArch64 to convert to ISD::FMAXNUM.

llvmbot commented 1 year ago

@llvm/issue-subscribers-backend-aarch64

CoTinker commented 1 year ago

I think the issue is that AArch64ISelLowering.cpp converts the maxnm intrinsic to ISD::FMAXNUM

  case Intrinsic::aarch64_neon_fmaxnm:                                           
    return DAG.getNode(ISD::FMAXNUM, SDLoc(N), N->getValueType(0),               
                       N->getOperand(1), N->getOperand(2));                      
  case Intrinsic::aarch64_neon_fminnm:                                           
    return DAG.getNode(ISD::FMINNUM, SDLoc(N), N->getValueType(0),               
                       N->getOperand(1), N->getOperand(2));

ISD::FMAXNUM intrinsic is constant folded using this code from APFloat.h

inline APFloat maxnum(const APFloat &A, const APFloat &B) {                      
  if (A.isNaN())                                                                 
    return B;                                                                    
  if (B.isNaN())                                                                 
    return A;                                                                    
  return A < B ? B : A;                                                          
}

This does not order -0.0 to be less than 0.0.

The implementation of maxnum matches the documentation for llvm.maxnum here https://llvm.org/docs/LangRef.html#llvm-maxnum-intrinsic

Based on that I think it was incorrect for AArch64 to convert to ISD::FMAXNUM. thank you very much!

CoTinker commented 1 year ago

I think the issue is that AArch64ISelLowering.cpp converts the maxnm intrinsic to ISD::FMAXNUM

  case Intrinsic::aarch64_neon_fmaxnm:                                           
    return DAG.getNode(ISD::FMAXNUM, SDLoc(N), N->getValueType(0),               
                       N->getOperand(1), N->getOperand(2));                      
  case Intrinsic::aarch64_neon_fminnm:                                           
    return DAG.getNode(ISD::FMINNUM, SDLoc(N), N->getValueType(0),               
                       N->getOperand(1), N->getOperand(2));

ISD::FMAXNUM intrinsic is constant folded using this code from APFloat.h

inline APFloat maxnum(const APFloat &A, const APFloat &B) {                      
  if (A.isNaN())                                                                 
    return B;                                                                    
  if (B.isNaN())                                                                 
    return A;                                                                    
  return A < B ? B : A;                                                          
}

This does not order -0.0 to be less than 0.0. The implementation of maxnum matches the documentation for llvm.maxnum here https://llvm.org/docs/LangRef.html#llvm-maxnum-intrinsic Based on that I think it was incorrect for AArch64 to convert to ISD::FMAXNUM. thank you very much!

but why maxnum return right result with option -O0

topperc commented 1 year ago

My best guess is that -O0 worked because it didn't get constant folded. So the ISD::MAXNUM was turned into a the maxnm instruction which did the right thing. It's legal to convert ISD::MAXNUM to the maxnm instruction but not the other way around.