l3lackcurtains / fast-cuda-gpu-dbscan

:star2: CUDA-DClust+: Fast DBSCAN algorithm implemented on CUDA. Based on the research paper.
9 stars 3 forks source link

🌟 CUDA-DClust+: Fast DBSCAN algorithm implemented on CUDA

CUDA-DClust+ is a fast DBSCAN algorithm that leverages many of the algorithm designs in CUDA-DClust and parallels DBSCAN algorithms in the literature.

The algorithm takes as input the dataset D, ϵ, and minpts, and outputs a list of points and their corresponding cluster or whether it has been assigned a noise label. The algorithm constructs an index on the dataset (D) based on the number of partitions (r) at each level of the tree. Then, it performs the cluster expansion routine on the GPU and merges the clusters on the CPU.

Research Paper

Key contributions

Outcome

DBSCAN comparison paper by Mustafa et al. showed that GDBSCAN outperforms CUDA-DClust with speedup up to 18x. Using our optimizations, we show that CUDA-DClust+ achieves up to ~23x speedup compared with GDBSCAN, demonstrating that the original CUDA-DClust design works well when optimized for newer GPU architectures.

Future Works

Future work includes investigating the dense box algorithm that can be used to further reduce the number of distance calculations in high-density regions.

Organization

.
├── cpu-dbscan              # CPU DBSCAN algorithm
├── multicore-cpu           # Parallel Multi-threaded CPU DBSCAN algorithm
├── dbscan-cpu-indexing     # CUDA DBSCAN algorithm with CPU indexing
├── gdbscan                 # G-DBSCAN algorithm
├── cuda-dclust             # CUDA-DClust algorithm (clone)
├── comparision-plots       # Comparision plots with Jupyter Notebook code
|── test-results            # Scripts to test the cluster results
|── test-datasets           # 2D and 3D datasets for test
|── main.cu                 # Main CUDA-DClust+ algorithm
|── exp.cu                  # CUDA-DClust+ algorithm used in an experiment
|── Makefile                # Makefile for CUDA-DClust+ algorithm
└── paper.pdf               # Research paper

Datasets

Datasets used in the experiment are available online. https://rcdata.nau.edu/gowanlock_lab/datasets/CUDA_DCLUST_datasets/CUDA_DCLUST_HiPC2021.zip

Read 'readme.txt' inside archive file before using the datasets.

Usage

CUDA-DClust+

Change the DBSCAN configuration in 'common.h' file file and run

make && ./main.exe ./test-datasets/2d.txt

Experimental Results

--

2D experiment

2d Experiment

3D experiment

3d Experiment