A distributed approximate nearest neighbor (ANN) search library that provides high-quality vector index build, search, and distributed online serving toolkits for large-scale vector search scenarios.
MIT License
random test failures on older CPUs without SSE/AVX/AVX2/AVX512 #316
Describe the bug
There are strange random test failures on older CPUs that don't have SSE/AVX/AVX2/AVX512.
To Reproduce
Steps to reproduce the behavior:
Build SPTAG on a machine without AVX/AVX2/AVX512.
Run SPTAGTests
See error
Expected behavior
The tests should pass on CPUs that lack these instruction sets.
Analysis
I think that because of the -mavx2 -mavx -msse -msse2 -mavx512f -mavx512bw -mavx512dq options in the DistanceUtils target_compile_options, the compiler generates newer instructions in the DistanceUtils library that older CPUs cannot execute. Removing the options the CPU does not support, and deleting the functions that use the instructions those options enable, fixes the issue. So the cause is definitely the options being enabled.
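To confirm which of these instruction sets a test machine actually has, a small check like the one below can help. It is only a sketch: it relies on the GCC/Clang builtin `__builtin_cpu_supports`, which is available on x86 targets, and the `CpuFeatures` struct and `detect_cpu_features` function are hypothetical names that are not part of SPTAG.

```cpp
#include <cstdio>

// Hypothetical helper type; not part of SPTAG.
struct CpuFeatures {
    bool sse2 = false;
    bool avx = false;
    bool avx2 = false;
    bool avx512f = false;
};

// Query the host CPU at runtime. On non-x86 targets or other compilers,
// every field stays false because the builtin is not available there.
CpuFeatures detect_cpu_features() {
    CpuFeatures f;
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    __builtin_cpu_init();  // harmless if the runtime already initialized it
    f.sse2 = __builtin_cpu_supports("sse2") != 0;
    f.avx = __builtin_cpu_supports("avx") != 0;
    f.avx2 = __builtin_cpu_supports("avx2") != 0;
    f.avx512f = __builtin_cpu_supports("avx512f") != 0;
#endif
    return f;
}

// Print one line per feature so the output can be compared with the
// -m... flags passed to the compiler.
void print_cpu_features(const CpuFeatures& f) {
    std::printf("sse2:    %s\n", f.sse2 ? "yes" : "no");
    std::printf("avx:     %s\n", f.avx ? "yes" : "no");
    std::printf("avx2:    %s\n", f.avx2 ? "yes" : "no");
    std::printf("avx512f: %s\n", f.avx512f ? "yes" : "no");
}
```

Calling `print_cpu_features(detect_cpu_features())` from a `main` function on the failing machine shows which of the enabled flags the CPU cannot honor.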
Suggestions
On Linux you can use GCC function multi-versioning, which makes the compiler emit several versions of a function and automatically dispatch to the right one after checking the CPU at runtime.
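A minimal sketch of what function multi-versioning could look like, assuming GCC on x86-64 Linux (the `target_clones` attribute needs ifunc support from the C library). The function `l2_distance_squared` and the macro `SKETCH_TARGET_CLONES` are hypothetical stand-ins for illustration, not SPTAG's actual DistanceUtils code. On other compilers or platforms the macro expands to nothing and a single baseline version is built.

```cpp
#include <cassert>
#include <cstddef>

// Enable cloning only where GCC's target_clones is expected to work.
#if defined(__x86_64__) && defined(__linux__) && defined(__GNUC__) && !defined(__clang__)
#define SKETCH_TARGET_CLONES \
    __attribute__((target_clones("avx512f", "avx2", "default")))
#else
#define SKETCH_TARGET_CLONES
#endif

// Squared L2 distance between two float vectors of length n.
// With target_clones, GCC compiles one copy per listed target plus a
// baseline "default" copy, and a resolver picks the best supported copy
// when the program is loaded, so no illegal instruction is executed on
// CPUs without AVX2 or AVX-512.
SKETCH_TARGET_CLONES
float l2_distance_squared(const float* a, const float* b, std::size_t n) {
    assert(a != nullptr && b != nullptr);
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        const float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}
```

With this approach the translation unit no longer needs global -mavx2 or -mavx512f flags, because each clone is compiled for its own target. Hand-written intrinsics would still need their own per-function `target` attributes and an explicit runtime check.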
Screenshots
Some examples of the random failures:
[1] Start invoking BuildTrees.
[1] BKTKmeansK: 3, BKTLeafSize: 6, Samples: 100, BKTLambdaFactor:-1.000000 TreeNumber: 1, ThreadNum: 2.
unknown location(0): fatal error: in "SSDServingTest/TestHeadUInt8L2DEFAULT": memory access violation at address: 0x00000000: no mapping at fault address
./Test/src/SSDServingTest.cpp(444): last checkpoint: "TestHeadUInt8L2DEFAULT" test entry
*** 1 failure is detected in the test module "Main"
[1] Parallel TpTree Partition done
[1] Build TPTree time (s): 4
[1] Processing Tree 0 0%
unknown location(0): fatal error: in "AlgoTest/KDTTest": signal: illegal operand; address of failing instruction: 0x559a5eccc130
./Test/src/AlgoTest.cpp(22): last checkpoint
*** 1 failure is detected in the test module "Main"