RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing high performance applications.
Is your feature request related to a problem? Please describe.
During IVF-Flat search a query vector is compared to all the vectors from n_probes clusters, and we have n_queries * n_probes query-probe pairs. For large batch search, when n_queries * n_probes > n_clusters then there will be clusters that are compared to more than one query vector.
The execution time of IVF-Flat is determined by the time to load the clusters from memory. Currently the query-probe pairs are sorted according to query index. To improve memory load time, we can sort the query-probe pairs according to the probe id (cluster label).
Is your feature request related to a problem? Please describe. During IVF-Flat search a query vector is compared to all the vectors from
n_probes
clusters, and we haven_queries * n_probes
query-probe pairs. For large batch search, whenn_queries * n_probes > n_clusters
then there will be clusters that are compared to more than one query vector.The execution time of IVF-Flat is determined by the time to load the clusters from memory. Currently the query-probe pairs are sorted according to query index. To improve memory load time, we can sort the query-probe pairs according to the probe id (cluster label).
Describe the solution you'd like
Sort the query-probe pairs during fine search for better cache reuse. This is already implemented for IVF-PQ, and the same can be applied for IVF-Flat as well: https://github.com/rapidsai/raft/blob/734298013b02b43d57275b342ab39d1dfd102543/cpp/include/raft/neighbors/detail/ivf_pq_search.cuh#L529-L569
Additional context In IVF-Flat search we typically have 0.1-1% of the clusters searched, therefore this optimization is expected to help with batch size that is correspondingly large (hundreds or thousends of query vectors). We have a helper utility to calculate the expected number of times a cluster is loaded. This can be used to decide whether to sort the input data or not. https://github.com/rapidsai/raft/blob/734298013b02b43d57275b342ab39d1dfd102543/cpp/include/raft/neighbors/detail/ivf_pq_search.cuh#L396