haifengl / smile

Statistical Machine Intelligence & Learning Engine
https://haifengl.github.io
Other

Why is DBSCAN running so slow? #720

Closed simonshiwt closed 2 years ago

simonshiwt commented 2 years ago

The data used for DBSCAN is 2-dimensional, like this:

-0.09168783354624013,-0.04115862153510882
-14.461813471635896,3.013673467505883
-9.719941137529991,-1.6227065043042066
0.0,0.0
........

The data has 122638 rows. After 40 minutes, DBSCAN is still running. I use DBSCAN like this (Scala on Spark):

    val kdtree: KDTree[Array[Double]] = new KDTree[Array[Double]](dbscanArray, dbscanArray)
    val dbscanResultKdtree = DBSCAN.fit(dbscanArray, kdtree, 10, 20)

The package is from the Maven repository:

<dependency>
        <groupId>com.github.haifengl</groupId>
        <artifactId>smile-scala_2.12</artifactId>
        <version>2.6.0</version>
</dependency>

Is there something wrong? I actually want to cluster 500 million rows of data. Is that workable with DBSCAN in SMILE?

haifengl commented 2 years ago

Try a smaller radius.
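For context on why the radius matters: assuming the last argument in the call above (20) is the radius, and assuming the full data set looks like the sample rows (coordinates within roughly 15 units of the origin), each range query would likely return a large share of all points, so the total work grows toward quadratic no matter which index is used. The following is a minimal sketch of one common way to pick a smaller radius, the sorted k-distance heuristic, written in plain Scala with no SMILE calls. The object and method names (`KDistance`, `kDistances`, `neighborFraction`) are hypothetical helpers for illustration, not part of SMILE.

```scala
object KDistance {
  /** Euclidean distance between two points of equal dimension. */
  def dist(a: Array[Double], b: Array[Double]): Double =
    math.sqrt(a.indices.map(i => (a(i) - b(i)) * (a(i) - b(i))).sum)

  /** Distance from each sample point to its k-th nearest other point,
    * sorted ascending. Brute force, so intended for a small random
    * sample (a few thousand rows), not the full data set.
    */
  def kDistances(sample: Array[Array[Double]], k: Int): Array[Double] = {
    require(k >= 1 && k < sample.length, "need more than k sample points")
    val out = sample.indices.map { i =>
      val d = sample.indices
        .filter(_ != i)
        .map(j => dist(sample(i), sample(j)))
        .toArray
      java.util.Arrays.sort(d)
      d(k - 1) // k-th nearest neighbour distance for point i
    }.toArray
    java.util.Arrays.sort(out)
    out
  }

  /** Average fraction of the other sample points that fall within
    * `radius` of a point. Values near 1.0 mean almost every range
    * query returns almost the whole data set.
    */
  def neighborFraction(sample: Array[Array[Double]], radius: Double): Double = {
    require(sample.length > 1, "need at least two sample points")
    val n = sample.length
    val perPoint = sample.indices.map { i =>
      sample.indices.count { j =>
        j != i && dist(sample(i), sample(j)) <= radius
      }.toDouble / (n - 1)
    }
    perPoint.sum / n
  }
}
```

A possible way to use it: draw a random sample of the rows, call `kDistances(sample, 10)` with k set to the intended minPts, and look for the point where the sorted distances start to rise sharply; a radius near that value is a reasonable starting guess. Calling `neighborFraction(sample, 20.0)` on the same sample gives a rough check of whether the current radius is covering most of the data.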

simonshiwt commented 2 years ago

OK, thank you.