sherief opened this issue 2 years ago
Hi,
If I remember correctly, I looked at hadd but apparently didn't end up using it. I don't think there was a clear benefit from hadd for the use case I studied (https://github.com/guillaumeblanc/ozz-animation/blob/master/include/ozz/base/maths/internal/simd_math_sse-inl.h#L65), because hadd only adds adjacent pairs of components (https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=hadd&ig_expand=3846), so a full 4-component sum still needs two of them.
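To make the "only adds pairs" point concrete, here is a minimal sketch of a 4-lane horizontal sum built on `_mm_hadd_ps`. It assumes an x86-64 target and GCC or Clang (the `target("sse3")` attribute enables SSE3 for this one function without a global `-msse3`); `hsum_hadd` is a hypothetical name, not an ozz function.

```c
#include <pmmintrin.h> /* SSE3: _mm_hadd_ps (also pulls in the SSE/SSE2 headers) */

/* Hypothetical helper: sum the four lanes of v with SSE3 hadd.
   GCC/Clang-specific: target("sse3") enables SSE3 for this function only. */
__attribute__((target("sse3")))
float hsum_hadd(__m128 v) {
    /* v = (x, y, z, w) -> t = (x+y, z+w, x+y, z+w): hadd only adds adjacent pairs */
    __m128 t = _mm_hadd_ps(v, v);
    /* Second pass: every lane now holds (x+y) + (z+w) */
    t = _mm_hadd_ps(t, t);
    /* Extract lane 0 as a scalar float */
    return _mm_cvtss_f32(t);
}
```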
Judging by this thread: https://stackoverflow.com/questions/6996764/fastest-way-to-do-horizontal-sse-vector-sum-or-other-reduction, a shuffle-based reduction might do better.
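A sketch of the shuffle-based reduction, along the lines of the SSE1 approach discussed in that thread. It assumes an x86/x86-64 target; `hsum_shuffle` is a hypothetical name. It needs only SSE1 instructions, so no extra compiler flags are required.

```c
#include <xmmintrin.h> /* SSE: _mm_shuffle_ps, _mm_movehl_ps, _mm_add_ss */

/* Hypothetical helper: sum the four lanes of v using only SSE1 shuffles and adds. */
float hsum_shuffle(__m128 v) {
    /* v = (x, y, z, w) -> shuf = (y, x, w, z): swap lanes within each pair */
    __m128 shuf = _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 3, 0, 1));
    /* sums = (x+y, x+y, z+w, z+w) */
    __m128 sums = _mm_add_ps(v, shuf);
    /* Move the high pair of sums into the low lanes: shuf = (z+w, z+w, w, z) */
    shuf = _mm_movehl_ps(shuf, sums);
    /* Lane 0 = (x+y) + (z+w); the other lanes are ignored */
    sums = _mm_add_ss(sums, shuf);
    return _mm_cvtss_f32(sums);
}
```

This is two shuffles and two adds. On many x86 cores `haddps` itself decodes into multiple shuffle and add micro-ops, so a pair of hadds is usually not cheaper than this sequence, which is consistent with hadd showing no clear benefit.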
If you find a better approach using hadd or any other post-SSE2 intrinsic, a PR would certainly be very welcome!
Note, though, that the dot-product intrinsic (SSE4.1 `_mm_dp_ps`) is already used when available.
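A minimal sketch of what "used when available" can look like with compile-time dispatch: `_mm_dp_ps` when the compiler targets SSE4.1, otherwise a multiply followed by a shuffle reduction. It assumes an x86/x86-64 target; `dot4` is a hypothetical name and this is not ozz's actual code.

```c
#include <xmmintrin.h> /* SSE baseline */
#if defined(__SSE4_1__)
#include <smmintrin.h> /* SSE4.1: _mm_dp_ps */
#endif

/* Hypothetical helper: 4-lane dot product of a and b. */
float dot4(__m128 a, __m128 b) {
#if defined(__SSE4_1__)
    /* Mask 0xF1: high nibble 0xF multiplies all four lanes,
       low nibble 0x1 writes the sum to lane 0 only (other lanes are zeroed) */
    return _mm_cvtss_f32(_mm_dp_ps(a, b, 0xF1));
#else
    /* m = (a0*b0, a1*b1, a2*b2, a3*b3) */
    __m128 m = _mm_mul_ps(a, b);
    /* Swap within pairs and add: t = (m0+m1, m0+m1, m2+m3, m2+m3) */
    __m128 t = _mm_add_ps(m, _mm_shuffle_ps(m, m, _MM_SHUFFLE(2, 3, 0, 1)));
    /* Add the high pair onto lane 0: (m0+m1) + (m2+m3) */
    t = _mm_add_ss(t, _mm_movehl_ps(t, t));
    return _mm_cvtss_f32(t);
#endif
}
```

Building with `-msse4.1` selects the `_mm_dp_ps` path; a default x86-64 build takes the SSE fallback. Both return the same value for exactly representable inputs like the small integers below.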
Cheers, Guillaume
Hi, any news on that front?
I'm compiling Ozz Animation for AVX, and I noticed in simd_math_sse-inl.h that only OZZ_SHUFFLE_PS1() is specialized for AVX. I thought there might be opportunities to use post-SSE2 intrinsics for things like hadd, but I was wondering whether there's a reason they weren't already used, or whether it was just a lack of time to implement them. I don't mind implementing them and submitting a PR, but if you've already looked into it and decided for one reason or another that they don't help performance or don't fit in, I can save the time and effort.
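For context, a single-vector shuffle macro with an AVX specialization might look like the sketch below. `SHUFFLE_PS1` and `splat_z` are hypothetical names and this is not ozz's actual definition; it assumes an x86/x86-64 target. With AVX, `_mm_permute_ps` (vpermilps) takes a single source; without it, SSE `_mm_shuffle_ps` is given the same vector twice.

```c
#include <immintrin.h> /* SSE through AVX intrinsics */

/* Hypothetical macro: permute the lanes of one vector with an immediate mask.
   Note: the SSE fallback evaluates v twice, so pass a plain variable. */
#if defined(__AVX__)
#define SHUFFLE_PS1(v, m) _mm_permute_ps((v), (m))
#else
#define SHUFFLE_PS1(v, m) _mm_shuffle_ps((v), (v), (m))
#endif

/* Example: broadcast lane 2 of v to all four lanes. */
__m128 splat_z(__m128 v) {
    return SHUFFLE_PS1(v, _MM_SHUFFLE(2, 2, 2, 2));
}
```

Both paths produce the same lane pattern for a 128-bit vector; the AVX form only changes which instruction the compiler is asked to emit.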