I have added full SSE2, AVX2, and AVX512 support to improve performance when the distance type is Hamming.
The loops are unrolled with an UNROLL macro that takes the bit width, so each instruction set gets the number of iterations it requires.
The code is designed for data whose dimension is a multiple of 16.
AVX2 falls back to SSE2 to process the remainder, and AVX512 falls back to AVX2 and then SSE2.
The SSE2 path does no remainder processing, because it is assumed that no remainder is left at that point.
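The tiered remainder handling described above can be sketched roughly as follows. This is a portable illustration, not the code in this PR: it uses plain 64-bit words instead of SSE2/AVX2/AVX512 intrinsics, the names `hamming_block`, `hamming_tiered`, and `hamming_reference` are hypothetical, and it assumes the dimension is counted in bytes. The 64-, 32-, and 16-byte blocks stand in for the AVX512, AVX2, and SSE2 register widths.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Generic byte-by-byte Hamming distance, used as the reference.
inline std::size_t hamming_reference(const std::uint8_t* a,
                                     const std::uint8_t* b, std::size_t n) {
    std::size_t d = 0;
    for (std::size_t i = 0; i < n; ++i) {
        d += std::bitset<8>(static_cast<unsigned>(a[i] ^ b[i])).count();
    }
    return d;
}

// XOR + popcount over one fixed-size block, processed as 64-bit words.
// Bytes = 64, 32, 16 stands in for one AVX512, AVX2, SSE2 register.
template <std::size_t Bytes>
inline std::size_t hamming_block(const std::uint8_t* a, const std::uint8_t* b) {
    static_assert(Bytes % 8 == 0, "block must be a whole number of 64-bit words");
    std::size_t d = 0;
    for (std::size_t i = 0; i < Bytes; i += 8) {
        std::uint64_t x, y;
        std::memcpy(&x, a + i, 8);  // memcpy avoids unaligned-access issues
        std::memcpy(&y, b + i, 8);
        d += std::bitset<64>(x ^ y).count();
    }
    return d;
}

// Widest blocks first, then narrower blocks for the remainder.
// Assumption: n is a multiple of 16, so nothing is left after the 16-byte step.
inline std::size_t hamming_tiered(const std::uint8_t* a,
                                  const std::uint8_t* b, std::size_t n) {
    assert(n % 16 == 0);
    std::size_t d = 0;
    std::size_t i = 0;
    for (; i + 64 <= n; i += 64) d += hamming_block<64>(a + i, b + i);  // "AVX512" tier
    if (i + 32 <= n) { d += hamming_block<32>(a + i, b + i); i += 32; } // "AVX2" tier
    if (i + 16 <= n) { d += hamming_block<16>(a + i, b + i); i += 16; } // "SSE2" tier
    return d;
}
```

With a 112-byte input, for example, the sketch runs one 64-byte block, one 32-byte block, and one 16-byte block, and its result should match `hamming_reference` on the same data.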
I separately wrote the test code below to compare the newly implemented functions against the existing functions and a generic Hamming function.
The benchmark reports an error of 0 for both AVX2 and SSE2, so there should be no major problems with calculation accuracy or speed, as the results below show.
However, I have not been able to test AVX512 because I do not currently have a test environment for it.
Test Code is below:
Test Result: