Use fma and fms instructions when available to speed up complex multiply

xtensor-stack / xsimd

C++ wrappers for SIMD intrinsics and parallelized, optimized mathematical functions (SSE, AVX, AVX512, NEON, SVE)

https://xsimd.readthedocs.io/

BSD 3-Clause "New" or "Revised" License

2.15k stars 253 forks

Closed serge-sans-paille closed 6 months ago

serge-sans-paille commented 6 months ago

This leverages the specific layout of an xsimd batch of complex values.
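
For readers unfamiliar with the idea, here is a minimal scalar sketch of a complex multiply written with fused operations. It assumes a split layout, with real and imaginary parts stored in separate arrays. The `SplitComplex` type and the `fms` and `complex_mul` helpers are hypothetical names for illustration, built on `std::fma` from the standard library; this is not the code from this PR and not xsimd's API.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a batch of complex values in split layout:
// all real parts in one array, all imaginary parts in another.
struct SplitComplex {
    std::vector<double> re;
    std::vector<double> im;
};

// Fused multiply-subtract: x * y - z, expressed through std::fma.
inline double fms(double x, double y, double z) {
    return std::fma(x, y, -z);
}

// (a + bi)(c + di) = (ac - bd) + (ad + bc)i
// Each output component costs one plain multiply plus one fused operation.
// Assumes a and b hold the same number of elements.
inline SplitComplex complex_mul(const SplitComplex& a, const SplitComplex& b) {
    const std::size_t n = a.re.size();
    SplitComplex out{std::vector<double>(n), std::vector<double>(n)};
    for (std::size_t i = 0; i < n; ++i) {
        out.re[i] = fms(a.re[i], b.re[i], a.im[i] * b.im[i]);      // ac - bd
        out.im[i] = std::fma(a.re[i], b.im[i], a.im[i] * b.re[i]); // ad + bc
    }
    return out;
}
```

With the real and imaginary parts kept apart, each line of the loop body maps onto whole-register operations with no shuffling, which is presumably why the layout matters here.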