A distributed approximate nearest neighbor (ANN) search library that provides high-quality vector index build, search, and distributed online serving toolkits for large-scale vector search scenarios.
In some situations, shuffle does not guarantee that indices[pos[k] + counts[k] - 1] == clusterIdx[k].
Assume there are 100 vectors and we cluster them into 2 clusters of sizes 70 and 30, and it just so happens that the 30 vectors of the second cluster are all in indices[0~69]. The shuffle function will then not swap the clusterIdx of the second cluster into place, because newCounts[1] has already reached zero in the first loop.
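The scenario above can be reproduced with a small model. This is a minimal sketch and not the library's code: it assumes shuffle is a counting-style in-place partition that skips a cluster once its remaining counter is zero. The function signature, the `label` array, the `skip_when_drained` flag, and the center ids 50 and 5 are all hypothetical and chosen only for illustration.

```python
def shuffle(indices, label, counts, cluster_idx, skip_when_drained=True):
    """Group `indices` by cluster in place and put each center last in its block.

    Hypothetical model of the behaviour described above. `label[i]` is the
    cluster of the vector currently stored at position i.
    """
    num_clusters = len(counts)
    new_counts = list(counts)        # unfilled slots left in each cluster block
    pos = [0] * num_clusters         # start offset of each cluster block
    for k in range(1, num_clusters):
        pos[k] = pos[k - 1] + counts[k - 1]

    for k in range(num_clusters):
        # Assumed guard: skip when the remaining counter is zero. The
        # alternative (an assumption, not a confirmed fix) skips only
        # clusters that are truly empty.
        skip = (new_counts[k] == 0) if skip_when_drained else (counts[k] == 0)
        if skip:
            continue
        i = pos[k]
        while new_counts[k] > 0:
            c = label[i]
            swap_id = pos[c] + new_counts[c] - 1   # last unfilled slot of block c
            new_counts[c] -= 1
            indices[i], indices[swap_id] = indices[swap_id], indices[i]
            label[i], label[swap_id] = label[swap_id], label[i]
        # Move the cluster center to the last slot of its block.
        last = pos[k] + counts[k] - 1
        j = indices.index(cluster_idx[k], pos[k], last + 1)
        indices[j], indices[last] = indices[last], indices[j]
    return pos


def run_example(skip_when_drained):
    """100 vectors, clusters of size 70 and 30, second cluster in indices[0~69]."""
    counts = [70, 30]
    indices = list(range(100))
    label = [1] * 30 + [0] * 70      # ids 0..29 belong to the second cluster
    cluster_idx = [50, 5]            # hypothetical center ids
    pos = shuffle(indices, label, counts, cluster_idx, skip_when_drained)
    return [indices[pos[k] + counts[k] - 1] == cluster_idx[k] for k in range(2)]


print(run_example(True))    # [True, False]
print(run_example(False))   # [True, True]
```

In this model, the first pass moves every vector of the second cluster into its block and drains newCounts[1] to zero, so the second pass is skipped and the center never reaches the last slot. Checking counts[k] instead of newCounts[k] is one possible guard that restores the invariant here.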