l3lackcurtains / fast-cuda-gpu-dbscan

:star2: CUDA-DClust+: Fast DBSCAN algorithm implemented on CUDA. Based on the research paper.

How to use fast-cuda-gpu-dbscan #1

Open tmd78 opened 1 year ago

tmd78 commented 1 year ago

Hello. I'm conducting research under Dr. Gowanlock and I'm working with fast-cuda-gpu-dbscan. I'm having trouble getting the application to find clusters.

My scenario:

Are there any requirements I'm missing? Any help is greatly appreciated.

Note: I've confirmed the expected number of clusters using sklearn's dbscan.

common.h

#ifndef PARAMS_H
#define PARAMS_H

using namespace std;

#define RANGE 2
#define UNPROCESSED -1
#define NOISE -2

#define DIMENSION 2
#define TREE_LEVELS (DIMENSION + 1)

#define THREAD_BLOCKS 256
#define THREAD_COUNT 256

#define MAX_SEEDS 128
#define EXTRA_COLLISION_SIZE 512

#define DATASET_COUNT 10000

#define MINPTS 4
#define EPS 1.5

#define PARTITION_SIZE 70
#define POINTS_SEARCHED 9

...

#endif

output.txt

l3lackcurtains commented 1 year ago

Did you get it working? Sorry, I'm late in seeing this.

tmd78 commented 1 year ago

Hi! I haven't gotten it to work yet. I'm inspecting what happens at each step of the process to see if I can find the problem.

I'm working with a small dataset:

1,1
2,2
3,3
10,10
11,11
12,12
20,20
21,21
22,22

The algorithm creates an index with 13 bins. Here is the information each bin holds:

bin 1 level: 0 upperBounds: 8.000000
bin 2 level: 0 upperBounds: 15.000000
bin 3 level: 0 upperBounds: 29.000000
bin 4 level: 1 upperBounds: 8.000000
bin 5 level: 1 upperBounds: 15.000000
bin 6 level: 1 upperBounds: 29.000000
bin 7 level: 1 upperBounds: 8.000000
bin 8 level: 1 upperBounds: 15.000000
bin 9 level: 1 upperBounds: 29.000000
bin 10 level: 1 upperBounds: 8.000000
bin 11 level: 1 upperBounds: 15.000000
bin 12 level: 1 upperBounds: 29.000000

The algorithm assigns the points to bins like this:

dataKey: 4 dataValue: 0
dataKey: 4 dataValue: 1
dataKey: 4 dataValue: 2
dataKey: 8 dataValue: 3
dataKey: 8 dataValue: 4
dataKey: 8 dataValue: 5
dataKey: 12 dataValue: 6
dataKey: 12 dataValue: 7
dataKey: 16 dataValue: 8

Does this look correct?

tmd78 commented 1 year ago

Oh, here's the common.h I got the above results with: common.h.txt

tmd78 commented 1 year ago

Would you be able to provide me your email address?

l3lackcurtains commented 1 year ago

I quickly checked it. The issue seems to be in the indexing. No distance calculations have been performed at all.

dataKey should be within the range of 4-13. I didn't test the extreme condition (max,max). I need to adjust it.

Also, my email is in the research paper.
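To illustrate the dataKey range discussed above, here is a minimal sketch of one key layout that is consistent with the printed index. It is an assumption, not the repository's code: it takes 3 bins per dimension with upper bounds 8, 15 and 29, numbers the leaf bins 4 through 12 in row-major order, and clamps to the last bin so the (max,max) corner cannot produce a key past the end. The names `binIndex` and `leafKey` are hypothetical.

```cpp
#include <array>
#include <cassert>

// Hypothetical layout inferred from the printed index: 3 bins per dimension,
// level-0 bins numbered 1-3 and leaf (level-1) bins numbered 4-12.
constexpr int kBinsPerDim = 3;
constexpr int kFirstLeafKey = 4;
constexpr std::array<double, kBinsPerDim> kUpperBounds = {8.0, 15.0, 29.0};

// Return the first bin whose upper bound exceeds v, clamped to the last bin
// so that a value at or beyond the largest bound cannot run past the index.
int binIndex(double v) {
    for (int i = 0; i < kBinsPerDim; ++i) {
        if (v < kUpperBounds[i]) return i;
    }
    return kBinsPerDim - 1;  // clamp: the (max, max) corner stays in range
}

// Map a 2D point to its leaf key, row-major over the per-dimension bins.
int leafKey(double x, double y) {
    int key = kFirstLeafKey + binIndex(x) * kBinsPerDim + binIndex(y);
    assert(key >= kFirstLeafKey &&
           key < kFirstLeafKey + kBinsPerDim * kBinsPerDim);
    return key;
}
```

Under these assumptions `leafKey(1, 1)` is 4, `leafKey(10, 10)` is 8 and `leafKey(22, 22)` is 12, so the last point lands in the same bin as (20,20) and (21,21) rather than at key 16.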

tmd78 commented 1 year ago

I was able to get the points assigned to bins correctly. The next problem I see is the bins' dataBegin and dataEnd properties all being set to zero. Also, the dataKey values are never used by DBSCAN, so we never use the mapping of points to bins?

l3lackcurtains commented 1 year ago

I see. I fixed the problem. It was a memory issue.

dataKey is not used in DBSCAN because the data are mapped through each index bucket's range, dataBegin and dataEnd, which are indexes into the dataset.
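As a host-side illustration of how a bucket's dataBegin and dataEnd can be recovered from sorted keys, the sketch below uses `std::equal_range` from the C++ standard library. It assumes the points have already been sorted by dataKey and that dataEnd is exclusive; neither is confirmed by the repository, and `bucketRange` is a hypothetical helper name.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical host-side helper: given dataKey values sorted in ascending
// order, return the half-open range [dataBegin, dataEnd) of positions that
// hold `key`. An absent key yields an empty range (dataBegin == dataEnd).
std::pair<int, int> bucketRange(const std::vector<int>& sortedKeys, int key) {
    assert(std::is_sorted(sortedKeys.begin(), sortedKeys.end()));
    auto range = std::equal_range(sortedKeys.begin(), sortedKeys.end(), key);
    int dataBegin = static_cast<int>(range.first - sortedKeys.begin());
    int dataEnd = static_cast<int>(range.second - sortedKeys.begin());
    return {dataBegin, dataEnd};
}
```

With illustrative keys `{4, 4, 4, 8, 8, 8, 12, 12, 12}`, `bucketRange(keys, 8)` returns `{3, 6}`, so the bucket covers dataset positions 3, 4 and 5. A bucket whose range comes back as `{0, 0}` while its key is present in the array would point at the lookup rather than at the key assignment.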

tmd78 commented 1 year ago

Hi Madhav. I'm getting this cluster output for the data I provided above:

-2
7
13
7
7
7
13
13
13

These are my common.h settings:

#define RANGE 2
#define UNPROCESSED -1
#define NOISE -2

#define DIMENSION 2
#define TREE_LEVELS (DIMENSION + 1)

#define THREAD_BLOCKS 3
#define THREAD_COUNT 9

#define MAX_SEEDS 128
#define EXTRA_COLLISION_SIZE 512

#define DATASET_COUNT 9

#define MINPTS 3
#define EPS 1.5

#define PARTITION_SIZE 3
#define POINTS_SEARCHED 3

The cluster numbers don't make any sense. I'm trying to figure out how to get correct results. Do you think the issue is my common.h settings or that there are more bugs that need to be addressed?

Note: I'm running this on CUDA 11.2.
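For comparison with the output above, a brute-force CPU reference can show which grouping to expect. The sketch below is not the CUDA-DClust+ algorithm; it is a plain O(n^2) DBSCAN that assumes sklearn's convention of counting a point as its own neighbor and treating distance <= EPS as in range. The function names are hypothetical; the UNPROCESSED and NOISE values mirror common.h.

```cpp
#include <cassert>
#include <queue>
#include <utility>
#include <vector>

constexpr int kUnprocessed = -1;  // mirrors UNPROCESSED in common.h
constexpr int kNoise = -2;        // mirrors NOISE in common.h

using Point = std::pair<double, double>;

// Indices of all points within eps of pts[i], including i itself
// (assumption: the same neighbor-counting convention as sklearn).
std::vector<int> neighbors(const std::vector<Point>& pts, int i, double eps) {
    std::vector<int> out;
    for (int j = 0; j < static_cast<int>(pts.size()); ++j) {
        double dx = pts[i].first - pts[j].first;
        double dy = pts[i].second - pts[j].second;
        if (dx * dx + dy * dy <= eps * eps) out.push_back(j);
    }
    return out;
}

// Brute-force DBSCAN; cluster ids are assigned 0, 1, 2, ... in the order
// the clusters are discovered.
std::vector<int> dbscanReference(const std::vector<Point>& pts, double eps,
                                 int minPts) {
    assert(eps > 0.0 && minPts > 0);
    std::vector<int> label(pts.size(), kUnprocessed);
    int cluster = 0;
    for (int i = 0; i < static_cast<int>(pts.size()); ++i) {
        if (label[i] != kUnprocessed) continue;
        std::vector<int> seeds = neighbors(pts, i, eps);
        if (static_cast<int>(seeds.size()) < minPts) {
            label[i] = kNoise;  // may be relabeled later as a border point
            continue;
        }
        label[i] = cluster;
        std::queue<int> frontier;
        for (int s : seeds) frontier.push(s);
        while (!frontier.empty()) {
            int q = frontier.front();
            frontier.pop();
            if (label[q] == kNoise) label[q] = cluster;  // border point
            if (label[q] != kUnprocessed) continue;
            label[q] = cluster;
            std::vector<int> qn = neighbors(pts, q, eps);
            if (static_cast<int>(qn.size()) >= minPts) {
                for (int n : qn) frontier.push(n);  // q is a core point
            }
        }
        ++cluster;
    }
    return label;
}
```

For the nine points listed earlier with EPS 1.5 and MINPTS 3, this reference returns 0 0 0 1 1 1 2 2 2: three clusters of three points each, with no noise. The ids themselves are arbitrary, so only the grouping is meaningful when comparing against the GPU output. With MINPTS 4 every point is labeled NOISE, because no point has four neighbors within 1.5.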

l3lackcurtains commented 1 year ago

The common file looks good, except that POINTS_SEARCHED should be 9 for 2D and 27 for 3D, since it is the number of cells around a cell that are visited during the range search.

The experiment was executed on CUDA 11.3.

Thrust's equal_range function seems to return 0 even when the parameters are correct. This might be because of a Thrust upgrade. I'm not sure which Thrust version I used specifically. Nowadays it seems to ship with the CUDA installation; I installed it manually.
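The values 9 and 27 follow from visiting 3 cells along each axis, so the count is 3 raised to DIMENSION. A small compile-time sketch of that arithmetic follows, assuming C++14 or later and that the search covers a cell plus one neighbor on each side per axis; `pointsSearched` is a hypothetical helper, not something defined in common.h.

```cpp
// Number of grid cells covered by a range search that visits a cell and its
// immediate neighbors along every axis: 3 cells per axis, so 3^dimension.
constexpr int pointsSearched(int dimension) {
    int cells = 1;
    for (int d = 0; d < dimension; ++d) cells *= 3;
    return cells;
}

static_assert(pointsSearched(2) == 9, "2D: 3 x 3 cells");
static_assert(pointsSearched(3) == 27, "3D: 3 x 3 x 3 cells");
```

A check like this could sit next to the `#define POINTS_SEARCHED` line to catch a mismatch with DIMENSION at compile time.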