rapidsai / cuvs

cuVS - a library for vector search and clustering on the GPU
https://rapids.ai
Apache License 2.0

[FEA] Batched indexes #189

Open yongchanghao opened 2 months ago

yongchanghao commented 2 months ago

I have a use case where there are multiple indexes on one device, and the queries are batched as well. For example, with N indexes, the query matrix has shape (N, Q, D) and the expected result has shape (N, Q, K).

The brute-force algorithm is pretty easy to implement with numpy/cupy/torch, but is there a plan to support this for IVF or other algorithms? Also, is there a guide for parallelizing this process using CPU threads?
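For reference, here is a minimal cupy sketch of the batched brute-force path I mean. The function name and the exact squared-L2 formulation are just illustrative, not a cuVS API:

```python
import cupy as cp

def batched_bruteforce_knn(datasets, queries, k):
    """Batched exact k-NN over N independent indexes on one GPU.

    datasets: (N, M, D) float32 -- one dataset per index
    queries:  (N, Q, D) float32
    returns:  (N, Q, k) neighbor indices and (N, Q, k) squared L2 distances
    """
    # ||q - x||^2 = ||q||^2 - 2 q.x + ||x||^2, batched over the leading axis.
    q_sq = cp.sum(queries ** 2, axis=-1, keepdims=True)             # (N, Q, 1)
    x_sq = cp.sum(datasets ** 2, axis=-1)[:, cp.newaxis, :]         # (N, 1, M)
    dists = q_sq - 2.0 * queries @ datasets.transpose(0, 2, 1) + x_sq  # (N, Q, M)

    # Top-k: partial partition to find the k smallest, then sort those k.
    idx = cp.argpartition(dists, k, axis=-1)[..., :k]               # unsorted top-k
    part = cp.take_along_axis(dists, idx, axis=-1)
    order = cp.argsort(part, axis=-1)
    return (cp.take_along_axis(idx, order, axis=-1),
            cp.take_along_axis(part, order, axis=-1))
```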

cjnolet commented 1 month ago

@yongchanghao your use case is interesting. We have not considered this exact case, but I suspect multi-threading the search (using a different stream for each thread) would let you do this and improve performance over querying the indexes one at a time.

We don't have a guide specifically for parallelizing with threads, but we do have this getting started guide, which provides a starting point for navigating the various CUDA APIs we use in cuVS. Assuming you use multiple threads here, it's important that each thread uses its own instance of the underlying raft::device_resources.
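As a rough illustration of that pattern (in Python with cupy rather than the C++ API; `index.search` is a hypothetical stand-in for whatever per-index search call you use — the point is only the thread-plus-stream scoping):

```python
import cupy as cp
from concurrent.futures import ThreadPoolExecutor

def search_one(i, index, queries, k):
    """Run one index's search on its own non-blocking stream."""
    stream = cp.cuda.Stream(non_blocking=True)
    with stream:
        # Hypothetical per-index search call; substitute your own.
        neighbors, distances = index.search(queries[i], k)
    stream.synchronize()
    return neighbors, distances

def search_all(indexes, queries, k, max_workers=4):
    """Fan the N searches out across host threads so they overlap on the GPU."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(search_one, i, idx, queries, k)
                   for i, idx in enumerate(indexes)]
        return [f.result() for f in futures]
```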

By default, I believe we also enable default_stream_per_thread so the calls to the different indexes should naturally overlap on the GPU (cc @divyegala to correct me if I'm wrong here).
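For what it's worth, cupy exposes CUDA's per-thread default stream directly, which is one way to see this behavior from Python (a sketch, assuming cupy):

```python
import cupy as cp

# Work issued on the per-thread default stream from different host threads
# can overlap on the GPU without explicitly creating streams.
with cp.cuda.Stream.ptds:
    result = cp.linalg.norm(cp.random.random((1024, 1024)))
```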

The final consideration you'll want to make is that the allocations/deallocations of temporary memory buffers within the algorithms can cause the whole GPU device to synchronize to the host threads each time. To get around this issue, we use memory pools to allocate a huge chunk of memory up front. We use RMM for the memory pools (also mentioned in the getting started guide) and the memory pool should also be thread-safe.