The data used for DBSCAN is 2-dimensional, like this:
-0.09168783354624013,-0.04115862153510882
-14.461813471635896,3.013673467505883
-9.719941137529991,-1.6227065043042066
0.0,0.0
........
The data set has 122,638 rows, and after 40 minutes the DBSCAN run is still going.
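For context, here is a minimal sketch of how the points could be parsed into the array passed to DBSCAN (the file name and parsing code are hypothetical, assuming one comma-separated point per line as in the sample above):

import scala.io.Source

// Hypothetical loader: one "x,y" pair per line, as in the sample data
val dbscanArray: Array[Array[Double]] =
  Source.fromFile("points.csv").getLines()
    .map(_.split(",").map(_.trim.toDouble))
    .toArray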
I use DBSCAN like this (Scala on Spark):
import smile.clustering.DBSCAN
import smile.neighbor.KDTree

val kdtree: KDTree[Array[Double]] = new KDTree[Array[Double]](dbscanArray, dbscanArray) // k-d tree for range queries
val dbscanResultKdtree = DBSCAN.fit(dbscanArray, kdtree, 10, 20) // minPts = 10, radius = 20
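For what it's worth, one way to see how the run time grows would be to time the same call on a random subsample first (a hypothetical check, not something from the run above):

// Time DBSCAN on a 10k-point random subsample with the same parameters
val sample = scala.util.Random.shuffle(dbscanArray.toSeq).take(10000).toArray
val sampleTree = new KDTree[Array[Double]](sample, sample)
val t0 = System.nanoTime()
DBSCAN.fit(sample, sampleTree, 10, 20)
println(s"10k points took ${(System.nanoTime() - t0) / 1e9} s")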
The package comes from the Maven repo, pulled in as a <dependency> entry in pom.xml.
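Presumably this is the standard SMILE artifact; a typical entry would look like the following (the version here is a placeholder, not necessarily the one in use):

<dependency>
    <!-- SMILE core library; replace the version with the one actually used -->
    <groupId>com.github.haifengl</groupId>
    <artifactId>smile-core</artifactId>
    <version>2.x</version>
</dependency>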
Is there something wrong? Actually, I want to cluster 500 million rows of data; is that workable with DBSCAN in SMILE?