rapidsai / cuml

cuML - RAPIDS Machine Learning Library
https://docs.rapids.ai/api/cuml/stable/
Apache License 2.0

[BUG] cuml.cluster.HDBSCAN.fit_predict (GPU accelerated) is slower than hdbscan.HDBSCAN.fit_predict (CPU only)! #6117

Closed: sava-1729 closed this issue 1 month ago

sava-1729 commented 1 month ago

Describe the bug
The hdbscan library's fit_predict is faster than cuML's GPU-accelerated HDBSCAN.fit_predict. How do I get GPU acceleration?

Steps/Code to reproduce bug
Try running this code (requirements: cupy, numba, hdbscan):

from time import perf_counter_ns

import cupy as cp
import numba as nb
import numpy as np
from cuml.cluster import HDBSCAN as HDBSCAN_GPU
from hdbscan import HDBSCAN as HDBSCAN_CPU

class Test:
    def __init__(self) -> None:
        # CPU (hdbscan) and GPU (cuML) models under comparison
        self.model = HDBSCAN_CPU(min_samples=10, min_cluster_size=10)
        self.model_cuml = HDBSCAN_GPU(min_samples=20, min_cluster_size=10)
        # Accumulated wall-clock times (ms) and iteration count
        self.total = 0
        self.total_cuml = 0
        self.counter = 0

    def test(self, num_points=4000, use_cupy=True, use_xy_only=False):
        # Random 3D point cloud in [0, 100)
        arr = np.random.random((num_points, 3)) * 100
        if use_xy_only:
            arr = arr[:, :2]
        if use_cupy:
            # Move the data to the GPU; both models receive the same
            # (device) array when use_cupy=True
            arr = cp.asarray(arr)
            arr = nb.cuda.to_device(arr)
        # Time the CPU model
        t0 = perf_counter_ns()
        y_hat = self.model.fit_predict(arr)
        elapsed = (perf_counter_ns() - t0) // 1_000_000  # ns -> ms
        self.total += elapsed
        print("------------------------------ CPU %d ms -----------------------------" % elapsed, flush=True)
        # Time the GPU model on the same data
        t0 = perf_counter_ns()
        y_hat = self.model_cuml.fit_predict(arr)
        elapsed = (perf_counter_ns() - t0) // 1_000_000  # ns -> ms
        self.total_cuml += elapsed
        print("------------------------------ GPU %d ms -----------------------------" % elapsed, flush=True)
        self.counter += 1

# Run the benchmark 100 times and report the average per-call time.
tester = Test()

for i in range(100):
    tester.test()

print("Average time %f ms over %d iterations on CPU." % (tester.total / tester.counter, tester.counter), flush=True)
print("Average time %f ms over %d iterations on GPU." % (tester.total_cuml / tester.counter, tester.counter), flush=True)

With the default point cloud size (4,000 points), I get the following output:

Average time 83.730000 ms over 100 iterations on CPU.
Average time 127.030000 ms over 100 iterations on GPU.
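
Note that the GPU average likely includes some one-time setup cost: the first call to cuML's fit_predict has to create the CUDA context and load kernels. Below is an illustrative sketch of a warm-up pass, assuming the Test class above, that keeps that first call out of the timed loop (hypothetical, not a configuration I measured):

# Hypothetical warm-up: run one untimed fit on each model so that
# one-time initialization does not count toward the averages.
warmup = np.random.random((1000, 3)) * 100
tester = Test()
tester.model.fit_predict(warmup)                   # CPU warm-up
tester.model_cuml.fit_predict(cp.asarray(warmup))  # GPU warm-up (triggers CUDA init)

for i in range(100):
    tester.test()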

Expected behavior
I would expect cuML's GPU-accelerated clustering to be much faster than the CPU-based one.

Environment details (please complete the following information):

divyegala commented 1 month ago

@sava-1729 the dataset sample size is too small and the timings too short (a few hundred milliseconds) to see any significant speedup. cuML HDBSCAN, and cuML algorithms in general, start showing speedups as the dataset grows to realistically sized workloads. You can try 40,000, 400,000, or 4,000,000 samples and let us know if you still do not see any speedups. For now, I will close the issue.
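
A minimal sketch of that suggestion, reusing the Test class from the report above and only increasing num_points (sizes taken from the comment; exact timings will depend on the GPU and are not claimed here):

# Hypothetical scaling run: the same benchmark at larger dataset sizes.
# 4,000,000 samples can be added, but the CPU run will take much longer.
for num_points in (40_000, 400_000):
    tester = Test()
    for i in range(10):  # fewer repetitions, since each fit takes longer
        tester.test(num_points=num_points)
    print("n=%d: CPU avg %.1f ms, GPU avg %.1f ms" % (
        num_points,
        tester.total / tester.counter,
        tester.total_cuml / tester.counter), flush=True)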