RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing high performance applications.
Introduce a global pool of host/device buffers for result neighbors and distances to keep the allocations across benchmark cases.
This slightly reduces the overheads of the benchmark setup, but, most importantly, reduces the number of cuda driver calls that may block the context. The latter is relevant for benchmarking persistent kernels - to let them run across multiple benchmark cases.
Introduce a global pool of host/device buffers for result neighbors and distances to keep the allocations across benchmark cases. This slightly reduces the overheads of the benchmark setup, but, most importantly, reduces the number of cuda driver calls that may block the context. The latter is relevant for benchmarking persistent kernels - to let them run across multiple benchmark cases.