RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing high performance applications.
Introduce a new virtual member uses_stream() for the AnnGPU class. Overriding it allows an algorithm to inform the benchmark whether stream synchronization is needed between benchmark iterations.
This is relevant for a potential persistent kernel, where the CPU threads use an independent mechanism to synchronize with the GPU and get the results from it.
This differs from simply not implementing AnnGPU for an algorithm: the algorithm can decide at runtime, depending on its input parameters, whether the synchronization is needed, while still providing the get_sync_stream() functionality.
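A minimal sketch of how this could look, under stated assumptions: `stream_t` is a hypothetical stand-in for the CUDA stream type so the example compiles without the CUDA toolkit, `PersistentKernelAlgo` and `needs_stream_sync()` are illustrative names that are not part of this PR, and the default return value of `uses_stream()` is assumed to be `true`. The actual declarations in the benchmark code may differ.

```cpp
// Hypothetical stand-in for the CUDA stream type (assumption, keeps the sketch toolkit-free).
using stream_t = void*;

// Sketch of the GPU interface described above; not the actual declaration.
class AnnGPU {
 public:
  virtual ~AnnGPU() = default;

  // Stream the benchmark can synchronize on between iterations.
  virtual stream_t get_sync_stream() const noexcept = 0;

  // Assumed default: work is ordered on the stream, so the benchmark must sync.
  virtual bool uses_stream() const noexcept { return true; }
};

// Hypothetical algorithm that picks its execution mode from a runtime parameter.
class PersistentKernelAlgo : public AnnGPU {
 public:
  explicit PersistentKernelAlgo(bool persistent) : persistent_(persistent) {}

  // Still provided, even when the benchmark does not need to sync on it.
  stream_t get_sync_stream() const noexcept override { return stream_; }

  // With a persistent kernel the CPU threads collect results through their own
  // mechanism, so the benchmark can skip the stream synchronization.
  bool uses_stream() const noexcept override { return !persistent_; }

 private:
  bool persistent_;
  stream_t stream_ = nullptr;  // placeholder; a real algorithm would own a stream
};

// Hypothetical benchmark-side check: sync only for GPU algorithms that ask for it.
// A null pointer models an algorithm that does not implement AnnGPU at all.
inline bool needs_stream_sync(const AnnGPU* gpu) noexcept {
  return gpu != nullptr && gpu->uses_stream();
}
```

In this sketch the benchmark loop would call `needs_stream_sync()` once per algorithm instance and synchronize on `get_sync_stream()` only when it returns `true`.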