Miscompilation on signed integer argument passing

tpearson-ssc commented 3 weeks ago

On all Clang versions after Clang 16, llc miscompiles a simple C++ test program testing integer argument passing:

#include <stdio.h>
#include <math.h>

int main() {
        volatile double result;

        volatile int exp = -53;
        volatile double base = 3635833917682146;

        result = ldexp(base, exp);
        printf("Result: %f\n", result);

        return 0;
}

The program should output 0.403659, but on Clang 17 and newer outputs inf.

On calling ldexp, r4 is used to pass exp. Unfortunately, only the lower 32 bits of exp are passed in r4, leading to an incorrect result. This is due to incorrect backend code generation:

  8bc:  60 00 3f c8     lfd     f1,96(r31)
- 8c0:  6e 00 9f e8     lwa     r4,108(r31)     <-- CORRECT (sign extends values for signed 64-bit integer parameter passing in r4)
+ 8c0:  6c 00 9f 80     lwz     r4,108(r31)     <-- WRONG (zeroes upper 32 bits of parameter in r4)
  8c4:  9d fd ff 4b     bl      660 <0000001b.plt_call.ldexp@@GLIBC_2.17>
  8c8:  18 00 41 e8     ld      r2,24(r1)
  8cc:  70 00 3f d8     stfd    f1,112(r31)

llvmbot commented 3 weeks ago

@llvm/issue-subscribers-backend-powerpc

Author: Timothy Pearson (tpearson-ssc)

On all Clang versions after Clang 16, `llc` miscompiles a simple C++ test program testing integer argument passing: ``` #include <stdio.h> #include <math.h> int main() { volatile double result; volatile int exp = -53; volatile double base = 3635833917682146; result = ldexp(base, exp); printf("Result: %f\n", result); return 0; } ``` The program should output `0.403659`, but on Clang 17 and newer outputs `inf`. On calling `ldexp`, `r4` is used to pass `exp`. Unfortunately, only the lower 32 bits of `exp` are passed in `r4`, leading to an incorrect result. This is due to incorrect backend code generation: ``` 8bc: 60 00 3f c8 lfd f1,96(r31) - 8c0: 6e 00 9f e8 lwa r4,108(r31) <-- CORRECT (sign extends values for signed 64-bit integer parameter passing in r4) + 8c0: 6c 00 9f 80 lwz r4,108(r31) <-- WRONG (zeroes upper 32 bits of parameter in r4) 8c4: 9d fd ff 4b bl 660 <0000001b.plt_call.ldexp@@GLIBC_2.17> 8c8: 18 00 41 e8 ld r2,24(r1) 8cc: 70 00 3f d8 stfd f1,112(r31) ```

tpearson-ssc commented 3 weeks ago

A copy of the working .bc/.ll file is attached. Compiling this file with llc-16 creates a working object file, while llc-17 outputs the miscompilation instead.

Note that the intermediate output appears to correctly call for sign extension on the affected function argument: %6 = tail call double @ldexp(double noundef %4, i32 noundef signext %5) #7, !dbg !157 output.zip

tpearson-ssc commented 3 weeks ago

Looks like this is a side effect of using -fno-math-errno on Clang 17 and up.

Godbolt: https://godbolt.org/z/1hcxabMWb

tpearson-ssc commented 3 weeks ago

I still maintain this is a backend llc problem, but for debugging purposes here is the difference in the ll IR between functional on trunk and non-functional:

@@ -23,15 +23,15 @@
   ret i32 0
 }

-; Function Attrs: nounwind
+; Function Attrs: nounwind willreturn memory(none)
 declare double @ldexp(double noundef, i32 noundef signext) #1

 declare signext i32 @printf(ptr noundef, ...) #2

 attributes #0 = { mustprogress noinline norecurse optnone uwtable "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="ppc64le" "target-features"="+altivec,+bpermd,+crbits,+crypto,+direct-move,+extdiv,+htm,+isa-v206-instructions,+isa-v207-instructions,+power8-vector,+vsx,-aix-small-local-exec-tls,-isa-v30-instructions,-power9-vector,-privileged,-quadword-atomics,-rop-protect,-spe" }
-attributes #1 = { nounwind "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="ppc64le" "target-features"="+altivec,+bpermd,+crbits,+crypto,+direct-move,+extdiv,+htm,+isa-v206-instructions,+isa-v207-instructions,+power8-vector,+vsx,-aix-small-local-exec-tls,-isa-v30-instructions,-power9-vector,-privileged,-quadword-atomics,-rop-protect,-spe" }
+attributes #1 = { nounwind willreturn memory(none) "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="ppc64le" "target-features"="+altivec,+bpermd,+crbits,+crypto,+direct-move,+extdiv,+htm,+isa-v206-instructions,+isa-v207-instructions,+power8-vector,+vsx,-aix-small-local-exec-tls,-isa-v30-instructions,-power9-vector,-privileged,-quadword-atomics,-rop-protect,-spe" }
 attributes #2 = { "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="ppc64le" "target-features"="+altivec,+bpermd,+crbits,+crypto,+direct-move,+extdiv,+htm,+isa-v206-instructions,+isa-v207-instructions,+power8-vector,+vsx,-aix-small-local-exec-tls,-isa-v30-instructions,-power9-vector,-privileged,-quadword-atomics,-rop-protect,-spe" }
-attributes #3 = { nounwind }
+attributes #3 = { nounwind willreturn memory(none) }

 !llvm.module.flags = !{!0, !1, !2, !3, !4, !5}
 !llvm.ident = !{!6}

Both versions compile correctly on Clang 16.

tpearson-ssc commented 3 weeks ago

A little more information, as I've been intrigued by the bug and it's causing critical issues on a security sensitive program (Chromium browser):

A comparison of the good / bad instruction selection dump shows a change from LWA to LWZ8, yet the reason for instruction selection remains the same:

@@ -18,11 +18,11 @@
   %13:g8rc = ORI8 killed %12:g8rc, 25540
   STD killed %13:g8rc, 0, %stack.3 :: (volatile store (s64) into %ir.4)
   %14:f8rc = LFD 0, %stack.3 :: (volatile dereferenceable load (s64) from %ir.4)
-  %15:g8rc = LWA 0, %stack.2 :: (volatile dereferenceable load (s32) from %ir.3)
+  %15:g8rc = LWZ8 0, %stack.2 :: (volatile dereferenceable load (s32) from %ir.3)
   ADJCALLSTACKDOWN 32, 0, implicit-def dead $r1, implicit $r1
   $f1 = COPY %14:f8rc
   $x4 = COPY %15:g8rc
-  BL8_NOP @ldexp, <regmask $zero $cr2 $cr3 $cr4 $f14 $f15 $f16 $f17 $f18 $f19 $f20 $f21 $f22 $f23 $f24 $f25 $f26 $f27 $f28 $f29 $f30 $f31 $r14 $r15 $r16 $r17 $r18 $r19 $r20 $r21 $r22 $r23 $r24 and 62 more...>, implicit-def dead $lr8, implicit $rm, implicit $f1, implicit $x4, implicit $x2, implicit-def $r1, implicit-def $f1
+  BL8_NOP &ldexp, <regmask $zero $cr2 $cr3 $cr4 $f14 $f15 $f16 $f17 $f18 $f19 $f20 $f21 $f22 $f23 $f24 $f25 $f26 $f27 $f28 $f29 $f30 $f31 $r14 $r15 $r16 $r17 $r18 $r19 $r20 $r21 $r22 $r23 $r24 and 62 more...>, implicit-def dead $lr8, implicit $rm, implicit $f1, implicit $x4, implicit $x2, implicit-def $r1, implicit-def $f1
   ADJCALLSTACKUP 32, 0, implicit-def dead $r1, implicit $r1
   %16:f8rc = COPY $f1
   STFD %16:f8rc, 0, %stack.1 :: (volatile store (s64) into %ir.2)

nikic commented 3 weeks ago

Reduced test case (https://llvm.godbolt.org/z/ah98b7sMW):

define double @test(ptr %a, ptr %b) {
  %base = load double, ptr %a
  %exp = load i32, ptr %b
  %call = call double @llvm.ldexp.f64.i32(double %base, i32 signext %exp)
  ret double %call
}

The relevant difference is that ldexp is now lowered to an fldexp node in SDAG, and then presumably the libcall legalization doesn't properly handle targets that need sign extension for i32 values.

tpearson-ssc commented 3 weeks ago

@nikic Thanks, that got me going and I was able to isolate what I think is the actual problem. It looks like the legalization code blindly assumes all integer arguments to floating point libcalls are unsigned, specifically at https://github.com/llvm/llvm-project/blob/main/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp#2142 .

As this affects all architectures, not just PowerPC, could we change the tags to match? It's architecture-dependent whether the incorrect zero extend vs. sign extend will translate to a miscompilation severe enough to actually cause a program effect, unfortunately ppc64el is one where you can't get away with this kind of mistake.

I put together a quick hack to test, and sure enough this fixes the problem for at least ldexp. I would imagine a more general solution is desirable; llvm must know somewhere what the library call arguments are defined as, and should be able to automatically handle setting the signed/unsigned flag.

index f5fbc01cd95e..74c610999bfe 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
@@ -2205,7 +2207,10 @@ void SelectionDAGLegalize::ExpandFPLibCall(SDNode* Node,
     Results.push_back(Tmp.first);
     Results.push_back(Tmp.second);
   } else {
-    SDValue Tmp = ExpandLibCall(LC, Node, false).first;
+    bool isSignedArgument = false;
+    if (LC == RTLIB::LDEXP_F64)
+      isSignedArgument = true;
+    SDValue Tmp = ExpandLibCall(LC, Node, isSignedArgument).first;
     Results.push_back(Tmp);
   }
 }

nikic commented 3 weeks ago

I put together a quick hack to test, and sure enough this fixes the problem for at least ldexp. I would imagine a more general solution is desirable; llvm must know somewhere what the library call arguments are defined as, and should be able to automatically handle setting the signed/unsigned flag.

I don't think that knowledge exists anywhere, at least not in a form accessible in the backend. I'd suggest to submit a PR with your patch and see what people think. Do you know any other FP libcalls that have signed arguments?

tpearson-ssc commented 3 weeks ago

I don't think that knowledge exists anywhere, at least not in a form accessible in the backend. I'd suggest to submit a PR with your patch and see what people think. Do you know any other FP libcalls that have signed arguments?

Good to know, that saves me looking for something that doesn't exist. :wink:

I took a look at the list in include/llvm/IR/RuntimeLibcalls.def, and the only other one would be frexp. I'll get a PR together shortly.

tpearson-ssc commented 3 weeks ago

Pull request created. What I still don't fully understand is why amd64 / aarch64 aren't affected -- I'm wondering if there's a corollary bug where clang will blindly and incorrectly sign-extend everything on those architectures (instead of correctly zero-extending unsigned integers), but since there is no floating point libcall that currently takes an unsigned 32-bit integer I have no way to test.

tpearson-ssc commented 3 weeks ago

Turns out this does in fact hit x86 as well, it's just harder to trigger: https://github.com/tpearson-ssc/llvm-project/commit/b0b6370f61c961cba0319475b87f36b251b0d0c7#diff-32fec8aa60f2e3238f8260d5b661f837e5b61a6af3f41949c2b82ce72f1c3f27 llvm/test/CodeGen/X86/fold-int-pow2-with-fmul-or-fdiv.ll

nikic commented 2 weeks ago

/cherry-pick 90c14748638f1e10e31173b145fdbb5c4529c922

llvmbot commented 2 weeks ago

/pull-request llvm/llvm-project#109920

llvm / llvm-project

Miscompilation on signed integer argument passing #108904