RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing high performance applications.
// Initialize RAFT resources and a stream pool with one stream per thread
raft::resources res;
raft::resource::sync_stream(res);
std::shared_ptr<rmm::cuda_stream_pool> stream_pool =
    std::make_shared<rmm::cuda_stream_pool>(min_thread);
raft::device_resources resources(rmm::cuda_stream_per_thread, stream_pool);

// Create the graph on the GPU using cuGraph
#pragma omp parallel num_threads(min_thread)
{
    #pragma omp for
    for (std::size_t i = 0; i < v_edgeWeight.size(); i++)
    {
        int threadNo = omp_get_thread_num();
        auto stream_view = resources.get_stream_from_stream_pool(threadNo);
        raft::handle_t local_handle(stream_view);
        // Pass local_handle and stream_view to a function that launches cugraph::BFS
    }
}
Will this setup ensure that each BFS operation is executed in parallel on the GPU? Are there any potential pitfalls or performance issues I should be aware of with this approach?
I appreciate any insights or suggestions you might have. Thank you!