google / highway

Performance-portable, length-agnostic SIMD with runtime dispatch
Apache License 2.0

Made fixes to RVV Ceil, Floor, and F->F DemoteTo #2164

Closed johnplatts closed 4 months ago

johnplatts commented 4 months ago

Resolves issue #2140.

Updated Ceil and Floor on RVV to use detail::CeilInt and detail::FloorInt (wrappers around the RVV __riscv_vfcvt_x_f_v_i*_rm intrinsics) with GCC 14 and later and Clang 17 and later. This fixes reordering bugs that occurred with Clang 16 and later when the floating-point rounding mode was changed using inline assembly.

Also changed the implementation of RVV Ceil and Floor for GCC 13 and earlier and Clang 16 and earlier so that it no longer changes the floating-point rounding mode using inline assembly. This fixes the RVV Ceil/Floor bugs, and the resulting RVV test failures, seen with Clang 16 and later.
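For readers unfamiliar with the idea, here is a minimal scalar sketch of one common way to compute Floor and Ceil without touching the rounding mode: truncate through an integer conversion, then correct by one where truncation went the wrong way. The names FloorNoModeChange and CeilNoModeChange are hypothetical, and this is an illustration under those assumptions, not the code in this PR.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical scalar helpers; they illustrate the technique only and are
// not Highway's RVV implementation.

// Floor without changing the floating-point rounding mode.
float FloorNoModeChange(float v) {
  // Floats with magnitude >= 2^23 are already integers. NaN fails the
  // comparison too, so it is returned unchanged.
  if (!(std::fabs(v) < 8388608.0f)) return v;
  // float -> int32 conversion always rounds toward zero in C++.
  const int32_t truncated = static_cast<int32_t>(v);
  float result = static_cast<float>(truncated);
  // For negative non-integers, truncation rounded up; step down by one.
  if (result > v) result -= 1.0f;
  // Preserve the sign of the input so that -0.0 stays -0.0.
  return std::copysign(result, v);
}

// Ceil without changing the floating-point rounding mode.
float CeilNoModeChange(float v) {
  if (!(std::fabs(v) < 8388608.0f)) return v;
  const int32_t truncated = static_cast<int32_t>(v);
  float result = static_cast<float>(truncated);
  // For positive non-integers, truncation rounded down; step up by one.
  if (result < v) result += 1.0f;
  // Inputs in (-1, 0) must yield -0.0, as std::ceil does.
  return std::copysign(result, v);
}
```

For example, FloorNoModeChange(-2.5f) yields -3.0f and CeilNoModeChange(2.5f) yields 3.0f. Because the rounding mode is never written, there is nothing for the compiler to reorder around.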

Also updated F64->F32 and F32->F16 DemoteTo on RVV to use the __riscv_vfncvt_f_f_w_f intrinsics instead of the __riscv_vfncvt_rod_f_f_w_f intrinsics, because the __riscv_vfncvt_rod_f_f_w_f intrinsics round using round-to-odd mode rather than round-to-nearest mode.
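To illustrate why round-to-odd is the wrong mode for a final demotion, the following scalar sketch emulates an F64->F32 round-to-odd conversion and compares it with the default round-to-nearest conversion. The helper name DemoteF64ToF32Odd is hypothetical, the sketch assumes the default round-to-nearest-even floating-point environment and finite inputs within float range, and it only models the rounding behavior, not the RVV intrinsics themselves.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical scalar model of round-to-odd demotion: truncate toward zero,
// then set the lowest significand bit if the conversion was inexact.
float DemoteF64ToF32Odd(double d) {
  // Assumption: input is finite and within float range.
  assert(std::isfinite(d) && std::fabs(d) <= FLT_MAX);
  const float nearest = static_cast<float>(d);  // round-to-nearest-even
  if (static_cast<double>(nearest) == d) return nearest;  // exact: unchanged
  // If round-to-nearest moved away from zero, step back one float toward
  // zero to obtain the truncated value.
  float truncated = nearest;
  if (std::fabs(static_cast<double>(nearest)) > std::fabs(d)) {
    truncated = std::nextafter(nearest, 0.0f);
  }
  // Inexact result: force the least significant significand bit to 1.
  uint32_t bits;
  std::memcpy(&bits, &truncated, sizeof(bits));
  bits |= 1u;
  std::memcpy(&truncated, &bits, sizeof(bits));
  return truncated;
}
```

With the input 1 + 2^-30, round-to-nearest gives 1.0f (error 2^-30), while the round-to-odd model gives 1 + 2^-23 (error of almost one full float ULP). That loss of accuracy is acceptable in an intermediate step but not in the value returned to the caller.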

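Also updated F64->F16 DemoteTo on RVV to first convert from F64 to F32 using detail::DemoteToF32WithRoundToOdd (which converts F64 values to F32 values using round-to-odd rounding), and then demote from F32 to F16 using round-to-nearest. Rounding to odd in the intermediate step avoids double-rounding errors in the final F16 result.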
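The double-rounding problem can be reproduced with a small scalar model that tracks only significand precision (24 bits for F32, 11 bits for F16) and ignores exponent range and subnormals. All function names below are hypothetical, and the sketch assumes the default round-to-nearest-even floating-point environment; it is meant to show the effect, not to mirror the RVV code.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Rounds x to `bits` significant bits using round-to-nearest-even.
// Assumes the default rounding mode, which std::nearbyint honors.
double RoundNearestEven(double x, int bits) {
  assert(bits > 0 && bits < 53);
  if (x == 0.0 || !std::isfinite(x)) return x;
  int exponent;
  const double fraction = std::frexp(x, &exponent);  // |fraction| in [0.5, 1)
  const double scaled = std::ldexp(fraction, bits);  // integer part has `bits` bits
  return std::ldexp(std::nearbyint(scaled), exponent - bits);
}

// Rounds x to `bits` significant bits using round-to-odd: truncate toward
// zero, then set the lowest kept bit if any discarded bit was nonzero.
double RoundToOdd(double x, int bits) {
  assert(bits > 0 && bits < 53);
  if (x == 0.0 || !std::isfinite(x)) return x;
  int exponent;
  const double fraction = std::frexp(x, &exponent);
  const double scaled = std::ldexp(fraction, bits);
  double kept = std::trunc(scaled);
  if (kept != scaled) {  // inexact
    const int64_t magnitude = static_cast<int64_t>(std::fabs(kept)) | 1;
    kept = std::copysign(static_cast<double>(magnitude), scaled);
  }
  return std::ldexp(kept, exponent - bits);
}

// F64 -> F16 in one correctly rounded step (reference result).
double DemoteDirect(double x) { return RoundNearestEven(x, 11); }

// F64 -> F32 -> F16 with round-to-nearest in both steps (double rounding).
double DemoteTwiceNearest(double x) {
  return RoundNearestEven(RoundNearestEven(x, 24), 11);
}

// F64 -> F32 with round-to-odd, then F32 -> F16 with round-to-nearest.
double DemoteViaRoundToOdd(double x) {
  return RoundNearestEven(RoundToOdd(x, 24), 11);
}
```

Take x = 1 + 2^-11 + 2^-30, which lies just above the midpoint between the 11-bit values 1 and 1 + 2^-10. DemoteDirect(x) returns 1 + 2^-10. DemoteTwiceNearest(x) returns 1.0, because the first step lands exactly on the midpoint and the second step then breaks the tie to even. DemoteViaRoundToOdd(x) returns 1 + 2^-10, because the odd low bit kept in the intermediate value records that x was above the midpoint.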