google / highway

Performance-portable, length-agnostic SIMD with runtime dispatch
Apache License 2.0

Implementing Scalbn/ScaleF ops #2266

Open johnplatts opened 4 days ago

johnplatts commented 4 days ago

I have been working on implementations of the Scalbn/ScaleF ops.

The Scalbn(V v, VFromD<RebindToUnsigned<DFromV<V>>> exp) op is equivalent to std::scalbn(v[i], exp[i]).

The ScaleF(V v, V exp) op is equivalent to Scalbn(v, ConvertTo(RebindToUnsigned<DFromV<V>>(), Floor(exp))), which is what the AVX3 _mm_scalef_ps/_mm_scalef_pd intrinsics compute.
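To make the per-lane semantics concrete, here is a minimal scalar sketch of the two ops for float lanes. The names ScalbnRef, ScaleFRef and CheckScaleFExamples are hypothetical and are not part of Highway; the exponent is taken as a signed 32-bit integer here because std::scalbn takes an int, and NaN, infinity and out-of-range exponents are not modeled.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical per-lane reference for Scalbn: returns v * 2^exp.
// Relies on FLT_RADIX == 2, which holds for binary floating-point types.
inline float ScalbnRef(float v, int32_t exp) {
  return std::scalbn(v, static_cast<int>(exp));
}

// Hypothetical per-lane reference for ScaleF: returns v * 2^floor(exp).
// Assumes exp is finite and floor(exp) fits in int32_t.
inline float ScaleFRef(float v, float exp) {
  return ScalbnRef(v, static_cast<int32_t>(std::floor(exp)));
}

// A few exactly representable cases.
inline void CheckScaleFExamples() {
  assert(ScalbnRef(3.0f, 2) == 12.0f);     // 3 * 2^2
  assert(ScaleFRef(3.0f, 2.5f) == 12.0f);  // floor(2.5) = 2
  assert(ScaleFRef(3.0f, -0.5f) == 1.5f);  // floor(-0.5) = -1
}
```

Note that the fractional exponent is rounded toward negative infinity rather than toward zero, which is why a FloorInt-style conversion is the natural building block.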

I have added a FloorInt op in pull request #2265 that allows ScaleF to be implemented more efficiently on SSE2/SSSE3/AArch64 NEON.

Should the Scalbn/ScaleF ops be renamed to something else?

jan-wassenberg commented 2 days ago

Nice, it seems useful to be able to generate the scalef instructions. On the naming: I think scalbn is defined in terms of FLT_RADIX, but we only care about FLT_RADIX=2, right? If so, this is equivalent to ldexp, right? I don't love either of those names. Seems like MulByPow2 would be more clear?

johnplatts commented 2 days ago

> Nice, it seems useful to be able to generate the scalef instructions. On the naming: I think scalbn is defined in terms of FLT_RADIX, but we only care about FLT_RADIX=2, right? If so, this is equivalent to ldexp, right? I don't love either of those names. Seems like MulByPow2 would be more clear?

scalbn is equivalent to ldexp for the F16/F32/F64 floating-point types, since FLT_RADIX is 2 for all of them.
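As a quick scalar illustration of that equivalence, the sketch below compares std::scalbn and std::ldexp directly. The helper name ScalbnMatchesLdexp is hypothetical, and the static_assert states the radix-2 assumption explicitly; inputs are assumed not to be NaN, since NaN never compares equal to itself.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>

// scalbn scales by FLT_RADIX^exp, ldexp by 2^exp; they coincide when the radix is 2.
static_assert(FLT_RADIX == 2, "sketch assumes a binary floating-point format");

// Hypothetical helper: true if both functions return the same value for (v, exp).
// Not meaningful for NaN inputs, because NaN != NaN.
inline bool ScalbnMatchesLdexp(double v, int exp) {
  return std::scalbn(v, exp) == std::ldexp(v, exp);
}
```

On a platform with a non-binary radix the static_assert would fail to compile, which is the only case where the two names would differ.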

I agree that MulByPow2(V v, VI exp) would be more clear for the scalbn/ldexp wrapper and MulByFloorPow2(V v, V exp) would make more sense for the wrapper around AVX3 _mm_scalef_ps/pd/ph.