rapidsai / cuvs

cuVS - a library for vector search and clustering on the GPU
https://rapids.ai
Apache License 2.0

[FEA] Batched indexes #189

Open yongchanghao opened 2 months ago

yongchanghao commented 2 months ago

I have a use case where there are multiple indexes on one device, and the queries are batched as well. For example, with N indexes, the query matrix has shape (N, Q, D) and the expected result has shape (N, Q, K).

The brute-force algorithm is pretty easy to implement with numpy/cupy/torch, but is there a plan to support this for IVF or other algorithms? Also, is there a guide for parallelizing this process using CPU threads?
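For reference, here is a minimal cupy sketch of the batched brute-force path I mean. The function name and the exact squared-L2 formulation are just illustrative, not a cuVS API:

```python
import cupy as cp

def batched_bruteforce_knn(datasets, queries, k):
    """Batched exact k-NN over N independent indexes on one GPU.

    datasets: (N, M, D) float32 -- one dataset per index
    queries:  (N, Q, D) float32
    returns:  (N, Q, k) neighbor indices and (N, Q, k) squared L2 distances
    """
    # ||q - x||^2 = ||q||^2 - 2 q.x + ||x||^2, batched over the leading axis.
    q_sq = cp.sum(queries ** 2, axis=-1, keepdims=True)             # (N, Q, 1)
    x_sq = cp.sum(datasets ** 2, axis=-1)[:, cp.newaxis, :]         # (N, 1, M)
    dists = q_sq - 2.0 * queries @ datasets.transpose(0, 2, 1) + x_sq  # (N, Q, M)

    # Top-k: partial partition to find the k smallest, then sort those k.
    idx = cp.argpartition(dists, k, axis=-1)[..., :k]               # unsorted top-k
    part = cp.take_along_axis(dists, idx, axis=-1)
    order = cp.argsort(part, axis=-1)
    return (cp.take_along_axis(idx, order, axis=-1),
            cp.take_along_axis(part, order, axis=-1))
```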

cjnolet commented 1 month ago

@yongchanghao your use case is interesting. We have not considered this exact case, but I suspect multi-threading the search (using a different stream for each thread) would let you do this and improve performance over querying the indexes one at a time.

We don't have a guide specifically for parallelizing with threads, but we do have this getting started guide, which provides a starting point for navigating the various CUDA APIs we use in cuVS. Assuming you use multiple threads here, it's important that each thread uses its own instance of the underlying raft::device_resources.
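As a rough illustration of that pattern (in Python with cupy rather than the C++ API; `index.search` is a hypothetical stand-in for whatever per-index search call you use — the point is only the thread-plus-stream scoping):

```python
import cupy as cp
from concurrent.futures import ThreadPoolExecutor

def search_one(i, index, queries, k):
    """Run one index's search on its own non-blocking stream."""
    stream = cp.cuda.Stream(non_blocking=True)
    with stream:
        # Hypothetical per-index search call; substitute your own.
        neighbors, distances = index.search(queries[i], k)
    stream.synchronize()
    return neighbors, distances

def search_all(indexes, queries, k, max_workers=4):
    """Fan the N searches out across host threads so they overlap on the GPU."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(search_one, i, idx, queries, k)
                   for i, idx in enumerate(indexes)]
        return [f.result() for f in futures]
```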

By default, I believe we also enable default_stream_per_thread so the calls to the different indexes should naturally overlap on the GPU (cc @divyegala to correct me if I'm wrong here).
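For what it's worth, cupy exposes CUDA's per-thread default stream directly, which is one way to see this behavior from Python (a sketch, assuming cupy):

```python
import cupy as cp

# Work issued on the per-thread default stream from different host threads
# can overlap on the GPU without explicitly creating streams.
with cp.cuda.Stream.ptds:
    result = cp.linalg.norm(cp.random.random((1024, 1024)))
```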

The final consideration you'll want to make is that the allocations/deallocations of temporary memory buffers within the algorithms can cause the whole GPU device to synchronize to the host threads each time. To get around this issue, we use memory pools to allocate a huge chunk of memory up front. We use RMM for the memory pools (also mentioned in the getting started guide) and the memory pool should also be thread-safe.