Open aktau opened 10 years ago
Thanks for this suggestion and the links. I have not been working on this project in a long time, but I'll reconsider this when I get back to some SIMD 3d stuff.
I guess some kind of benchmarking could be useful: there are other functions besides normalization that have multiple implementations, and I can't tell which ones are fast and which are slow.
While working a bit on my toy language and searching for SIMD tips, I encountered this article: http://webcache.googleusercontent.com/search?q=cache:cMDSJGbFY-MJ:www.liranuna.com/sse-intrinsics-optimizations-in-popular-compilers/+&cd=3&hl=en&ct=clnk&gl=be
In which it is stated:
It seems to be some sort of inlined variant of the `SSE2` code of `vdot` into `vunit` with higher accuracy (no `rsqrt`). Just leaving it here for the future, to verify. It would be interesting to compare. I'm reasonably sure the `divps` would adversely affect the performance of the article's code, but I'd be interested to find out.
Of course, the very best perf could be obtained by doing multiple vectors at once and transforming to SoA form (either on the fly or not): https://software.intel.com/en-us/articles/3d-vector-normalization-using-256-bit-intel-advanced-vector-extensions-intel-avx
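To make the comparison concrete, here is a rough sketch of the two single-vector variants being discussed: one that divides by an exact square root (`sqrtps` + `divps`) and one that multiplies by the `rsqrtps` approximation. The names `dot4`, `unit_exact` and `unit_rsqrt` are made up for this comment and are not the project's actual `vdot`/`vunit`. The sketch assumes the vector sits in one `__m128` with the unused `w` lane set to 0, and it does not guard against zero-length input.

```c
#include <xmmintrin.h> /* SSE intrinsics */

/* Hypothetical helper: dot product of a and b, broadcast to all four lanes. */
static inline __m128 dot4(__m128 a, __m128 b) {
    __m128 m = _mm_mul_ps(a, b);
    /* Add neighbouring pairs: (m0+m1, m0+m1, m2+m3, m2+m3). */
    __m128 s = _mm_add_ps(m, _mm_shuffle_ps(m, m, _MM_SHUFFLE(2, 3, 0, 1)));
    /* Add the two halves so every lane holds the full sum. */
    return _mm_add_ps(s, _mm_shuffle_ps(s, s, _MM_SHUFFLE(1, 0, 3, 2)));
}

/* Hypothetical "accurate" variant: sqrtps followed by divps. */
static inline __m128 unit_exact(__m128 v) {
    return _mm_div_ps(v, _mm_sqrt_ps(dot4(v, v)));
}

/* Hypothetical "fast" variant: rsqrtps is only an approximation
 * (roughly 12 bits of precision), so the result is slightly off unit length. */
static inline __m128 unit_rsqrt(__m128 v) {
    return _mm_mul_ps(v, _mm_rsqrt_ps(dot4(v, v)));
}
```

Timing both functions over a large array of vectors would show whether the `divps` cost matters in practice; a Newton-Raphson step after `rsqrtps` is a common middle ground if the raw approximation is too coarse.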