aurora327 closed 11 months ago
Hi @yurymalkov, can you please review the code?
Hi @aurora327,
Thank you for the PR! I am slow to respond currently due to sickness, sorry.
I wonder, how much improvement did you see in the tests with the better implementation?
Hi @yurymalkov, the performance gain depends on the sizes of the vector1 and vector2 parameters passed in. Building with my own dataset on a 4th Generation Intel® Xeon® with 4 cores bound gave a 2% to 10% end-to-end improvement. I hope you feel better soon :)
Thanks again for the PR! I've also checked the query performance: the improvement is up to 15% for 16-dim. I also wonder whether aligned vs. unaligned memory makes a difference on current architectures?
No differences were observed, since the AVX512 instructions used handle both aligned and unaligned data well. That said, an aligned buffer is still highly recommended where possible.
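To make the aligned/unaligned distinction concrete, here is a minimal sketch. The helper names `is_aligned_64` and `sum16_floats` are hypothetical and not part of this PR; the scalar branch is only a fallback so the snippet also compiles without AVX512 enabled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

// Hypothetical helper: true if p sits on a 64-byte boundary,
// i.e. the width of one 512-bit register.
static bool is_aligned_64(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 64 == 0;
}

// Hypothetical helper: sums the 16 floats starting at p.
static float sum16_floats(const float* p) {
    assert(p != nullptr);
#if defined(__AVX512F__)
    // _mm512_load_ps requires a 64-byte aligned address;
    // _mm512_loadu_ps accepts any address.
    __m512 v = is_aligned_64(p) ? _mm512_load_ps(p) : _mm512_loadu_ps(p);
    return _mm512_reduce_add_ps(v);
#else
    // Scalar fallback when AVX512F is not enabled at compile time.
    float sum = 0.0f;
    for (std::size_t i = 0; i < 16; ++i) sum += p[i];
    return sum;
#endif
}
```

A buffer declared with `alignas(64)` satisfies the aligned path, while a pointer offset by one float from it takes the unaligned path. Both return the same result; only the load instruction differs.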
InnerProductSIMD16ExtAVX512 functions are implemented using more efficient AVX512 instructions.
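For readers unfamiliar with the approach, below is a rough sketch of what an AVX512 inner product over 16-float blocks can look like. It is not the code from this PR: the function name `inner_product_16` is hypothetical, and the use of fused multiply-add plus a horizontal reduction is an assumption about what "more efficient" could mean here. The scalar branch is only a fallback for builds without AVX512.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

// Hypothetical helper, not the PR's implementation: returns sum(a[i] * b[i]).
// Assumes qty is a multiple of 16 (one 512-bit register holds 16 floats).
static float inner_product_16(const float* a, const float* b, std::size_t qty) {
    assert(qty % 16 == 0);
#if defined(__AVX512F__)
    __m512 sum = _mm512_setzero_ps();
    for (std::size_t i = 0; i < qty; i += 16) {
        // Unaligned loads work for any input address.
        __m512 va = _mm512_loadu_ps(a + i);
        __m512 vb = _mm512_loadu_ps(b + i);
        // Fused multiply-add: sum = va * vb + sum, one instruction per block.
        sum = _mm512_fmadd_ps(va, vb, sum);
    }
    // Horizontal add of the 16 lanes into a single float.
    return _mm512_reduce_add_ps(sum);
#else
    // Scalar fallback when AVX512F is not enabled at compile time.
    float sum = 0.0f;
    for (std::size_t i = 0; i < qty; ++i) sum += a[i] * b[i];
    return sum;
#endif
}
```

Compiling with `-mavx512f` (GCC or Clang) selects the vector path. Note that the vector path accumulates in a different order than a scalar loop, so results can differ in the last bits for general inputs.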