google / highway

Performance-portable, length-agnostic SIMD with runtime dispatch
Apache License 2.0

Added DemoteToNearestInt and F16/F64 NearestInt ops #2223

Closed by johnplatts 1 month ago

johnplatts commented 1 month ago

Added an F64-to-I32 DemoteToNearestInt op, since SSE2/SSSE3/SSE4/AVX2/AVX3/RVV all have instructions for round-to-nearest F64-to-I32 conversion.
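To make the intended semantics concrete, here is a minimal scalar sketch of a round-to-nearest F64-to-I32 conversion. It is not Highway code: the name DemoteToNearestIntScalar is hypothetical, and the clamping of out-of-range inputs and the result for NaN are assumptions made for illustration rather than behavior confirmed by this PR.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical scalar reference for one lane; not part of Highway.
// Rounds to the nearest integer using the current rounding mode, which is
// round-to-nearest, ties-to-even by default.
inline int32_t DemoteToNearestIntScalar(double v) {
  // Assumption: NaN maps to 0. The PR text does not specify this case.
  if (std::isnan(v)) return 0;

  const double rounded = std::nearbyint(v);

  // Assumption: out-of-range values are clamped to the int32_t limits.
  if (rounded >= 2147483647.0) return std::numeric_limits<int32_t>::max();
  if (rounded <= -2147483648.0) return std::numeric_limits<int32_t>::min();

  return static_cast<int32_t>(rounded);
}
```

Under the default rounding mode, ties go to the even neighbor, so 2.5 becomes 2 and 3.5 becomes 4. A vector op would apply the same conversion to every lane.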

Also added a NearestInt op for F64 vectors, available when HWY_HAVE_FLOAT64 is 1. AVX3 and AArch64 NEON have instructions for round-to-nearest F64-to-I64 conversion, and x86_64 can perform the same conversion as a scalar operation using the cvtsd2si instruction.
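As an illustration of the scalar path, the sketch below converts one F64 value to I64 with std::llrint, which compilers targeting x86_64 typically lower to cvtsd2si. The function name NearestIntF64Scalar, the clamping of out-of-range inputs, and the NaN result are assumptions for this example and are not taken from the PR.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical scalar reference for one F64 lane; not part of Highway.
inline int64_t NearestIntF64Scalar(double v) {
  // Assumption: NaN maps to 0. The PR text does not specify this case.
  if (std::isnan(v)) return 0;

  // Assumption: clamp inputs outside the int64_t range. 2^63 is the smallest
  // double that does not fit, and -2^63 is exactly the int64_t minimum.
  if (v >= 9223372036854775808.0) return std::numeric_limits<int64_t>::max();
  if (v <= -9223372036854775808.0) return std::numeric_limits<int64_t>::min();

  // std::llrint rounds using the current rounding mode, which is
  // round-to-nearest, ties-to-even by default.
  return static_cast<int64_t>(std::llrint(v));
}
```

The range checks come first because std::llrint leaves the result unspecified when the rounded value does not fit in the return type.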

Also added a NearestInt op for F16 vectors, available when HWY_HAVE_FLOAT16 is 1, since NEON_BF16 and AVX3_SPR have instructions for round-to-nearest F16-to-I16 conversion.

The F64-to-I32 DemoteToNearestInt op is also useful in the implementation of Exp2 in hwy/contrib/math/math-inl.h.
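One common way such a conversion shows up in an exp2 routine is range reduction: split x into its nearest integer n and a small remainder f, evaluate 2^f, then scale by 2^n. The scalar sketch below illustrates that general technique only. It is not the code in math-inl.h, the name Exp2ViaRangeReduction is hypothetical, and std::exp2 stands in for the polynomial approximation a real implementation would use on the reduced argument.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical scalar sketch of exp2 via range reduction; not Highway code.
// Assumes x is finite and its nearest integer fits in int32_t.
inline double Exp2ViaRangeReduction(double x) {
  // Round-to-nearest F64 -> I32 conversion (ties-to-even by default).
  const int32_t n = static_cast<int32_t>(std::lrint(x));

  // Remainder lies in [-0.5, 0.5], where a short polynomial would be accurate.
  const double f = x - static_cast<double>(n);

  // Stand-in for the polynomial evaluation of 2^f.
  const double two_to_f = std::exp2(f);

  // Scale by 2^n, which only adjusts the exponent.
  return std::ldexp(two_to_f, n);
}
```

Because n is already an I32, it can feed the exponent adjustment directly, which is why a direct F64-to-I32 rounding conversion is convenient here.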