microsoft / SPTAG

A distributed approximate nearest neighborhood search (ANN) library which provides a high quality vector index build, search and distributed online serving toolkits for large scale vector search scenario.
MIT License
4.83k stars 580 forks

fix bkt bug: shuffle bug when clustering #348

Closed Yuming-Xu closed 2 years ago

Yuming-Xu commented 2 years ago

In some situations, shuffle does not guarantee that indices[pos[k] + counts[k] - 1] == clusterIdx[k]. For example, assume there are 100 vectors and we cluster them into 2 clusters of sizes 70 and 30, and it just so happens that the 30 vectors of the second cluster are all in indices[0~69]. The shuffle function will then not swap the clusterIdx of the second cluster into place, because newCounts[1] has already dropped to zero during the first loop iteration.
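The failure mode can be reproduced on a tiny input. The following is a minimal sketch under stated assumptions, not SPTAG's actual code and not necessarily the fix applied in this PR: `ShuffleSketch`, `CentersAtSegmentEnds`, `DemoInvariantHolds`, `remaining`, `centerId` and the `skipDrainedClusters` flag are hypothetical names, with `remaining` playing the role of newCounts and `centerId` the role of clusterIdx. It uses 5 vectors in clusters of sizes 3 and 2, with both vectors of the second cluster starting inside the first cluster's segment.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for the shuffle described above.
// indices[i]  : vector id stored at slot i
// label[i]    : cluster assigned to indices[i]
// counts[k]   : size of cluster k (must agree with label)
// centerId[k] : id that should end up in the last slot of cluster k's segment
void ShuffleSketch(std::vector<int>& indices, std::vector<int>& label,
                   const std::vector<int>& counts,
                   const std::vector<int>& centerId,
                   bool skipDrainedClusters) {
    const int K = static_cast<int>(counts.size());
    std::vector<int> pos(K, 0);
    std::vector<int> remaining(counts);  // unplaced elements per cluster
    for (int k = 1; k < K; k++) pos[k] = pos[k - 1] + counts[k - 1];

    for (int k = 0; k < K; k++) {
        // Assumed buggy behaviour: a cluster whose counter was already drained
        // by earlier iterations is skipped, including its center placement.
        if (skipDrainedClusters && remaining[k] == 0) continue;

        const int i = pos[k];
        while (remaining[k] > 0) {
            // Send the element at slot i to the last free slot of its cluster.
            const int c = label[i];
            const int target = pos[c] + remaining[c] - 1;
            remaining[c]--;
            std::swap(indices[i], indices[target]);
            std::swap(label[i], label[target]);
        }

        if (counts[k] == 0) continue;
        // Move the center id to the last slot of segment k. Every label in
        // this segment already equals k, so only indices needs swapping.
        const int last = pos[k] + counts[k] - 1;
        auto segBegin = indices.begin() + pos[k];
        auto segEnd = indices.begin() + last + 1;
        auto it = std::find(segBegin, segEnd, centerId[k]);
        if (it != segEnd) std::iter_swap(it, indices.begin() + last);
    }
}

// Checks indices[pos[k] + counts[k] - 1] == centerId[k] for every non-empty cluster.
bool CentersAtSegmentEnds(const std::vector<int>& indices,
                          const std::vector<int>& counts,
                          const std::vector<int>& centerId) {
    int start = 0;
    for (int k = 0; k < static_cast<int>(counts.size()); k++) {
        if (counts[k] > 0 && indices[start + counts[k] - 1] != centerId[k]) return false;
        start += counts[k];
    }
    return true;
}

// Both members of cluster 1 (ids 10 and 11) start inside cluster 0's segment.
bool DemoInvariantHolds(bool skipDrainedClusters) {
    std::vector<int> indices = {10, 11, 12, 13, 14};
    std::vector<int> label = {1, 1, 0, 0, 0};
    const std::vector<int> counts = {3, 2};
    const std::vector<int> centerId = {13, 11};
    ShuffleSketch(indices, label, counts, centerId, skipDrainedClusters);
    return CentersAtSegmentEnds(indices, counts, centerId);
}
```

In this sketch, `DemoInvariantHolds(true)` returns false because the second cluster's counter reaches zero while the first cluster is being processed, so its center is never moved to the end of its segment. `DemoInvariantHolds(false)` returns true because the center placement runs for every cluster regardless of the counter.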