guillaumeblanc / ozz-animation

Open source C++ skeletal animation library and toolset
http://guillaumeblanc.github.io/ozz-animation/

AVX optimization opportunities #135

Open sherief opened 2 years ago

sherief commented 2 years ago

I'm compiling ozz-animation for AVX, and I noticed in simd_math_sse-inl.h that only OZZ_SHUFFLE_PS1() is specialized for AVX. I thought there might be opportunities to use intrinsics newer than SSE2 for things like hadd, but I was wondering whether there's a reason they aren't already used, or whether it was just a lack of time to implement them. I don't mind implementing them and submitting a PR, but if you've already looked into it and decided that they don't help with performance or don't fit in, then I can save the time and effort.

guillaumeblanc commented 2 years ago

Hi,

If I remember correctly, I looked at hadd but apparently didn't end up using it. I think there was no clear benefit to using hadd for the use case I studied (https://github.com/guillaumeblanc/ozz-animation/blob/master/include/ozz/base/maths/internal/simd_math_sse-inl.h#L65), because hadd only adds pairs of adjacent components, so summing all 4 components takes two of them: https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=hadd&ig_expand=3846.

Looking at this thread (https://stackoverflow.com/questions/6996764/fastest-way-to-do-horizontal-sse-vector-sum-or-other-reduction), there might be something better to do with shuffles.
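To make the comparison concrete, here is a minimal sketch of the two approaches to a 4-component horizontal sum. The function names `HorizontalSumShuffle` and `HorizontalSumHadd` are hypothetical and are not taken from ozz-animation; the hadd variant is only compiled when the compiler defines `__SSE3__` (for example with `-msse3` or `-mavx` on GCC/Clang).

```cpp
#include <xmmintrin.h>  // SSE: _mm_shuffle_ps, _mm_add_ps, _mm_movehl_ps
#if defined(__SSE3__)
#include <pmmintrin.h>  // SSE3: _mm_hadd_ps
#endif

// Hypothetical helper (not ozz-animation code): sums the 4 lanes of v
// using only SSE1 shuffles and adds.
inline float HorizontalSumShuffle(__m128 v) {
  // v = (x, y, z, w). Swap neighbours within each pair: (y, x, w, z).
  const __m128 swapped = _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 3, 0, 1));
  // Pairwise sums: (x+y, x+y, z+w, z+w).
  const __m128 sums = _mm_add_ps(v, swapped);
  // Bring the upper pair sum down to lane 0: (z+w, z+w, ...).
  const __m128 high = _mm_movehl_ps(swapped, sums);
  // Lane 0 now holds (x+y) + (z+w).
  return _mm_cvtss_f32(_mm_add_ss(sums, high));
}

#if defined(__SSE3__)
// Hypothetical helper: same result with hadd. Each hadd only adds adjacent
// pairs, so two of them are needed to reduce 4 components.
inline float HorizontalSumHadd(__m128 v) {
  const __m128 pairs = _mm_hadd_ps(v, v);          // (x+y, z+w, x+y, z+w)
  const __m128 total = _mm_hadd_ps(pairs, pairs);  // x+y+z+w in every lane
  return _mm_cvtss_f32(total);
}
#endif
```

Both variants return the same value; which one is faster depends on the target CPU, so any change along these lines would need to be benchmarked rather than assumed.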

If you find something better to do with hadd, or any other SSE2+ intrinsic, a PR would be very welcome!

Note, though, that the dot product intrinsic is used when available.
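As an illustration of what "used when available" can look like, the sketch below selects the SSE4.1 `_mm_dp_ps` intrinsic at compile time and otherwise falls back to a multiply followed by a shuffle-based reduction. The function name `Dot4` and the exact guard are assumptions made for this example, not the library's actual code.

```cpp
#include <xmmintrin.h>  // SSE baseline
#if defined(__SSE4_1__)
#include <smmintrin.h>  // SSE4.1: _mm_dp_ps
#endif

// Hypothetical helper (not ozz-animation code): 4-component dot product.
inline float Dot4(__m128 a, __m128 b) {
#if defined(__SSE4_1__)
  // Mask 0xF1: multiply and sum all 4 lanes (high nibble 0xF),
  // write the result to lane 0 only (low nibble 0x1).
  return _mm_cvtss_f32(_mm_dp_ps(a, b, 0xF1));
#else
  // Fallback: component-wise multiply, then reduce with shuffles.
  const __m128 prod = _mm_mul_ps(a, b);  // (p0, p1, p2, p3)
  // Add the upper half onto the lower half: lane 0 = p0+p2, lane 1 = p1+p3.
  const __m128 half = _mm_add_ps(prod, _mm_movehl_ps(prod, prod));
  // Broadcast lane 1 (p1+p3) and add it to lane 0.
  const __m128 lane1 = _mm_shuffle_ps(half, half, _MM_SHUFFLE(1, 1, 1, 1));
  return _mm_cvtss_f32(_mm_add_ss(half, lane1));
#endif
}
```

With GCC or Clang, `__SSE4_1__` is defined when building with `-msse4.1` or `-mavx`, so an AVX build would take the `_mm_dp_ps` path in this sketch.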

Cheers, Guillaume

guillaumeblanc commented 8 months ago

Hi, any news on that front?