rust-lang / compiler-builtins

Porting `compiler-rt` intrinsics to Rust
https://github.com/rust-lang/rust/issues/35437
Other
361 stars 208 forks source link

Infinite recursion in `__extendhfsf2` and `__truncsfhf2` on "no-f16-f128" platforms #651

Open liushuyu opened 2 months ago

liushuyu commented 2 months ago

I have encountered an infinite recursion when doing f16 arithmetics on "no-f16-f128" platforms due to questionable LLVM optimizations.

My initial investigation shows this might be due to LLVM being too clever and optimizing the soft-float absolute value conversion to a hard-float one:

define weak hidden noundef float @__extendhfsf2(half noundef %a) unnamed_addr #9 {
start:                                                                                                                                                                                       
%0 = tail call half @llvm.fabs.f16(half %a)                                                                                                                                                  
%_0.i7.i.i = bitcast half %0 to i16
%_0.i8.i.i = add nsw i16 %_0.i7.i.i, -1024
%_0.i5.i.i = icmp ult i16 %_0.i8.i.i, 30720
br i1 %_0.i5.i.i, label %bb8.i.i, label %bb14.i.i

; <snip...>
}

When lowering to the machine code, this causes the intrinsic function to call either itself or __truncsfhf2, and then __truncsfhf2 will call __extendhfsf2 again, forming an infinite recursion.

My current idea is to:

diff --git a/src/float/extend.rs b/src/float/extend.rs
index 5560489..10a0d61 100644
--- a/src/float/extend.rs
+++ b/src/float/extend.rs
@@ -32,7 +32,7 @@ where

     let sign_bits_delta = dst_sign_bits - src_sign_bits;
     let exp_bias_delta = dst_exp_bias - src_exp_bias;
-    let a_abs = a.repr() & src_abs_mask;
+    let a_abs = core::hint::black_box(a.repr()) & src_abs_mask;
     let mut abs_result = R::Int::ZERO;

     if a_abs.wrapping_sub(src_min_normal) < src_infinity.wrapping_sub(src_min_normal) {

... which will try to prevent LLVM from merging the absolute value masking into the @llvm.fabs.f16 LLVM intrinsic. However, this idea introduces two extra memory operations (storing f16 and reading i16).

I did not open a pull request because I hope someone could come up with a much better idea.

beetrees commented 2 months ago

Which target(s) have you encountered this on?

beetrees commented 2 months ago

More specifically, I suspect this might be being caused by llvm/llvm-project#97981 (on several architectures LLVM currently passes f16s as f32s).

liushuyu commented 2 months ago

Which target(s) have you encountered this on?

I tried this on powerpc64le-unknown-linux-gnu

tgross35 commented 2 months ago

I have encountered an infinite recursion when doing f16 arithmetics on "no-f16-f128" platforms due to questionable LLVM optimizations.

Did you discover this by trying to build compiler_builtins, or just by compiling a sample program in Rust? I don't think the latter is currently possible on ppc but just want to check.

liushuyu commented 2 months ago

I have encountered an infinite recursion when doing f16 arithmetics on "no-f16-f128" platforms due to questionable LLVM optimizations.

Did you discover this by trying to build compiler_builtins, or just by compiling a sample program in Rust? I don't think the latter is currently possible on ppc but just want to check.

I tried to build a sample program using the nightly Rust but with an updated version of compiler_builtins and LLVM 19

tgross35 commented 2 months ago

Hm, I thought these symbols weren't available from Rust's compiler_builtins because of here https://github.com/rust-lang/rust/blob/53676730146e38e4697b6204c2ee61a9fd6b7e51/library/alloc/Cargo.toml#L15-L16