Efficient AVX512 implementation in 'InnerProductSIMD16ExtAVX512' Function

nmslib / hnswlib

Header-only C++/python library for fast approximate nearest neighbors

https://github.com/nmslib/hnswlib

Apache License 2.0


Efficient AVX512 implementation in 'InnerProductSIMD16ExtAVX512' Function #475

Closed: aurora327 closed this 11 months ago

aurora327 commented 1 year ago

This PR reimplements the InnerProductSIMD16ExtAVX512 function using more efficient AVX512 instructions.
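For context, here is a minimal sketch of how an AVX512 inner product over 16-float blocks might look. It is not the code from this PR: the function name `inner_product_simd16_sketch` and its signature are hypothetical, and the fused multiply-add plus horizontal reduction is only one plausible approach. The scalar branch is an assumption added so the sketch also compiles without AVX512 support.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

// Hypothetical helper, not the function from the PR.
// Returns sum(a[i] * b[i]) for i in [0, qty); qty must be a multiple of 16.
static float inner_product_simd16_sketch(const float *a, const float *b,
                                         std::size_t qty) {
    assert(qty % 16 == 0);
#if defined(__AVX512F__)
    __m512 acc = _mm512_setzero_ps();          // 16 partial sums, all zero
    for (std::size_t i = 0; i < qty; i += 16) {
        __m512 va = _mm512_loadu_ps(a + i);    // unaligned load of 16 floats
        __m512 vb = _mm512_loadu_ps(b + i);
        acc = _mm512_fmadd_ps(va, vb, acc);    // acc += va * vb (fused)
    }
    return _mm512_reduce_add_ps(acc);          // horizontal sum of the 16 lanes
#else
    // Scalar fallback so the sketch still builds without AVX512F enabled.
    float sum = 0.0f;
    for (std::size_t i = 0; i < qty; ++i) sum += a[i] * b[i];
    return sum;
#endif
}
```

With GCC or Clang, the AVX512 branch is selected when compiling with `-mavx512f` (or `-march=native` on a machine that supports it); otherwise the scalar loop is used.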

aurora327 commented 1 year ago

Hi @yurymalkov, could you please review the code?

yurymalkov commented 1 year ago

Hi @aurora327,

Thank you for the PR! I am slow to respond currently due to sickness, sorry.

I wonder, how much improvement did you see in the tests with the better implementation?

aurora327 commented 1 year ago

Hi @yurymalkov, the performance gain varies with the size of the two vectors passed in. With my own dataset, a build on a 4th Generation Intel® Xeon® with 4 cores bound showed a 2% to 10% end-to-end improvement. I hope you feel better soon :)

yurymalkov commented 11 months ago

Thanks again for the PR! I've also checked the query performance: the improvement is up to 15% for 16-dimensional vectors. I also wonder whether aligned versus unaligned memory makes a difference on current architectures?

aurora327 commented 11 months ago

No difference was observed, since the AVX512 instruction used handles both aligned and unaligned data well. That said, an aligned buffer is still highly recommended where possible.
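To make the aligned/unaligned distinction concrete, here is a small sketch under stated assumptions. `_mm512_load_ps` requires a 64-byte-aligned address, while `_mm512_loadu_ps` accepts any address. The names `is_aligned_64`, `sum16_sketch` and `AlignedBuffer` are hypothetical and do not come from hnswlib or this PR; the scalar branch is only there so the sketch builds without AVX512.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

// Hypothetical helper: true if p lies on a 64-byte boundary.
static bool is_aligned_64(const void *p) {
    return reinterpret_cast<std::uintptr_t>(p) % 64 == 0;
}

// Hypothetical helper: sums the 16 floats starting at p.
static float sum16_sketch(const float *p) {
#if defined(__AVX512F__)
    // The aligned load is only valid on a 64-byte boundary;
    // the unaligned load works for any address.
    __m512 v = is_aligned_64(p) ? _mm512_load_ps(p) : _mm512_loadu_ps(p);
    return _mm512_reduce_add_ps(v);
#else
    // Scalar fallback so the sketch still builds without AVX512F enabled.
    float sum = 0.0f;
    for (std::size_t i = 0; i < 16; ++i) sum += p[i];
    return sum;
#endif
}

// 64-byte-aligned storage; data + 1 is deliberately misaligned.
struct alignas(64) AlignedBuffer {
    float data[32];
};
```

Both paths return the same value; the only difference is which load is legal for a given address. Declaring buffers with `alignas(64)`, as in `AlignedBuffer`, is one way to get an aligned buffer.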