Closed tpearson-ssc closed 2 weeks ago
@llvm/issue-subscribers-backend-powerpc
Author: Timothy Pearson (tpearson-ssc)
A copy of the working .bc
/.ll
file is attached. Compiling this file with llc-16
creates a working object file, while llc-17
outputs the miscompilation instead.
Note that the intermediate output appears to correctly call for sign extension on the affected function argument:
%6 = tail call double @ldexp(double noundef %4, i32 noundef signext %5) #7, !dbg !157
output.zip
Looks like this is a side effect of using -fno-math-errno
on Clang 17 and up.
Godbolt: https://godbolt.org/z/1hcxabMWb
I still maintain this is a backend llc
problem, but for debugging purposes here is the difference in the ll
IR between functional on trunk
and non-functional:
@@ -23,15 +23,15 @@
ret i32 0
}
-; Function Attrs: nounwind
+; Function Attrs: nounwind willreturn memory(none)
declare double @ldexp(double noundef, i32 noundef signext) #1
declare signext i32 @printf(ptr noundef, ...) #2
attributes #0 = { mustprogress noinline norecurse optnone uwtable "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="ppc64le" "target-features"="+altivec,+bpermd,+crbits,+crypto,+direct-move,+extdiv,+htm,+isa-v206-instructions,+isa-v207-instructions,+power8-vector,+vsx,-aix-small-local-exec-tls,-isa-v30-instructions,-power9-vector,-privileged,-quadword-atomics,-rop-protect,-spe" }
-attributes #1 = { nounwind "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="ppc64le" "target-features"="+altivec,+bpermd,+crbits,+crypto,+direct-move,+extdiv,+htm,+isa-v206-instructions,+isa-v207-instructions,+power8-vector,+vsx,-aix-small-local-exec-tls,-isa-v30-instructions,-power9-vector,-privileged,-quadword-atomics,-rop-protect,-spe" }
+attributes #1 = { nounwind willreturn memory(none) "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="ppc64le" "target-features"="+altivec,+bpermd,+crbits,+crypto,+direct-move,+extdiv,+htm,+isa-v206-instructions,+isa-v207-instructions,+power8-vector,+vsx,-aix-small-local-exec-tls,-isa-v30-instructions,-power9-vector,-privileged,-quadword-atomics,-rop-protect,-spe" }
attributes #2 = { "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="ppc64le" "target-features"="+altivec,+bpermd,+crbits,+crypto,+direct-move,+extdiv,+htm,+isa-v206-instructions,+isa-v207-instructions,+power8-vector,+vsx,-aix-small-local-exec-tls,-isa-v30-instructions,-power9-vector,-privileged,-quadword-atomics,-rop-protect,-spe" }
-attributes #3 = { nounwind }
+attributes #3 = { nounwind willreturn memory(none) }
!llvm.module.flags = !{!0, !1, !2, !3, !4, !5}
!llvm.ident = !{!6}
Both versions compile correctly on Clang 16.
A little more information, as I've been intrigued by the bug and it's causing critical issues on a security sensitive program (Chromium browser):
A comparison of the good / bad instruction selection dump shows a change from LWA
to LWZ8
, yet the reason for instruction selection remains the same:
@@ -18,11 +18,11 @@
%13:g8rc = ORI8 killed %12:g8rc, 25540
STD killed %13:g8rc, 0, %stack.3 :: (volatile store (s64) into %ir.4)
%14:f8rc = LFD 0, %stack.3 :: (volatile dereferenceable load (s64) from %ir.4)
- %15:g8rc = LWA 0, %stack.2 :: (volatile dereferenceable load (s32) from %ir.3)
+ %15:g8rc = LWZ8 0, %stack.2 :: (volatile dereferenceable load (s32) from %ir.3)
ADJCALLSTACKDOWN 32, 0, implicit-def dead $r1, implicit $r1
$f1 = COPY %14:f8rc
$x4 = COPY %15:g8rc
- BL8_NOP @ldexp, <regmask $zero $cr2 $cr3 $cr4 $f14 $f15 $f16 $f17 $f18 $f19 $f20 $f21 $f22 $f23 $f24 $f25 $f26 $f27 $f28 $f29 $f30 $f31 $r14 $r15 $r16 $r17 $r18 $r19 $r20 $r21 $r22 $r23 $r24 and 62 more...>, implicit-def dead $lr8, implicit $rm, implicit $f1, implicit $x4, implicit $x2, implicit-def $r1, implicit-def $f1
+ BL8_NOP &ldexp, <regmask $zero $cr2 $cr3 $cr4 $f14 $f15 $f16 $f17 $f18 $f19 $f20 $f21 $f22 $f23 $f24 $f25 $f26 $f27 $f28 $f29 $f30 $f31 $r14 $r15 $r16 $r17 $r18 $r19 $r20 $r21 $r22 $r23 $r24 and 62 more...>, implicit-def dead $lr8, implicit $rm, implicit $f1, implicit $x4, implicit $x2, implicit-def $r1, implicit-def $f1
ADJCALLSTACKUP 32, 0, implicit-def dead $r1, implicit $r1
%16:f8rc = COPY $f1
STFD %16:f8rc, 0, %stack.1 :: (volatile store (s64) into %ir.2)
Reduced test case (https://llvm.godbolt.org/z/ah98b7sMW):
define double @test(ptr %a, ptr %b) {
%base = load double, ptr %a
%exp = load i32, ptr %b
%call = call double @llvm.ldexp.f64.i32(double %base, i32 signext %exp)
ret double %call
}
The relevant difference is that ldexp is now lowered to an fldexp node in SDAG, and then presumably the libcall legalization doesn't properly handle targets that need sign extension for i32 values.
@nikic Thanks, that got me going and I was able to isolate what I think is the actual problem. It looks like the legalization code blindly assumes all integer arguments to floating point libcalls are unsigned, specifically at https://github.com/llvm/llvm-project/blob/main/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp#2142 .
As this affects all architectures, not just PowerPC, could we change the tags to match? It's architecture-dependent whether the incorrect zero extend vs. sign extend will translate to a miscompilation severe enough to actually cause a program effect, unfortunately ppc64el
is one where you can't get away with this kind of mistake.
I put together a quick hack to test, and sure enough this fixes the problem for at least ldexp
. I would imagine a more general solution is desirable; llvm must know somewhere what the library call arguments are defined as, and should be able to automatically handle setting the signed/unsigned flag.
index f5fbc01cd95e..74c610999bfe 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
@@ -2205,7 +2207,10 @@ void SelectionDAGLegalize::ExpandFPLibCall(SDNode* Node,
Results.push_back(Tmp.first);
Results.push_back(Tmp.second);
} else {
- SDValue Tmp = ExpandLibCall(LC, Node, false).first;
+ bool isSignedArgument = false;
+ if (LC == RTLIB::LDEXP_F64)
+ isSignedArgument = true;
+ SDValue Tmp = ExpandLibCall(LC, Node, isSignedArgument).first;
Results.push_back(Tmp);
}
}
I put together a quick hack to test, and sure enough this fixes the problem for at least
ldexp
. I would imagine a more general solution is desirable; llvm must know somewhere what the library call arguments are defined as, and should be able to automatically handle setting the signed/unsigned flag.
I don't think that knowledge exists anywhere, at least not in a form accessible in the backend. I'd suggest to submit a PR with your patch and see what people think. Do you know any other FP libcalls that have signed arguments?
I don't think that knowledge exists anywhere, at least not in a form accessible in the backend. I'd suggest to submit a PR with your patch and see what people think. Do you know any other FP libcalls that have signed arguments?
Good to know, that saves me looking for something that doesn't exist. :wink:
I took a look at the list in include/llvm/IR/RuntimeLibcalls.def
, and the only other one would be frexp
. I'll get a PR together shortly.
Pull request created. What I still don't fully understand is why amd64
/ aarch64
aren't affected -- I'm wondering if there's a corollary bug where clang will blindly and incorrectly sign-extend everything on those architectures (instead of correctly zero-extending unsigned integers), but since there is no floating point libcall that currently takes an unsigned 32-bit integer I have no way to test.
Turns out this does in fact hit x86 as well, it's just harder to trigger: https://github.com/tpearson-ssc/llvm-project/commit/b0b6370f61c961cba0319475b87f36b251b0d0c7#diff-32fec8aa60f2e3238f8260d5b661f837e5b61a6af3f41949c2b82ce72f1c3f27 llvm/test/CodeGen/X86/fold-int-pow2-with-fmul-or-fdiv.ll
/cherry-pick 90c14748638f1e10e31173b145fdbb5c4529c922
/pull-request llvm/llvm-project#109920
On all Clang versions after Clang 16,
llc
miscompiles a simple C++ test program testing integer argument passing:The program should output
0.403659
, but on Clang 17 and newer outputsinf
.On calling
ldexp
,r4
is used to passexp
. Unfortunately, only the lower 32 bits ofexp
are passed inr4
, leading to an incorrect result. This is due to incorrect backend code generation: