haifengl / smile

Statistical Machine Intelligence & Learning Engine
https://haifengl.github.io

DBSCAN result changes when I use the predict API #719

Closed simonshiwt closed 2 years ago

simonshiwt commented 2 years ago

When I use DBSCAN like this:

    val dbscanResult = dbscan(data1, 10, 15)

the result is:

    Cluster size of 1000 data points:
    Cluster  1     36 ( 3.6%)
    Cluster  2     29 ( 2.9%)
    Cluster  3     36 ( 3.6%)
    Cluster  4     12 ( 1.2%)
    Cluster  5     20 ( 2.0%)
    Cluster  6     14 ( 1.4%)
    Cluster  7     22 ( 2.2%)
    Cluster  8     27 ( 2.7%)
    Cluster  9     21 ( 2.1%)
    Cluster 10     24 ( 2.4%)
    Cluster 11     14 ( 1.4%)
    Cluster 12     12 ( 1.2%)
    Cluster 13     17 ( 1.7%)
    Cluster 14     16 ( 1.6%)
    Cluster 15     11 ( 1.1%)
    Cluster 16     14 ( 1.4%)
    Cluster 17     22 ( 2.2%)
    Cluster 18     11 ( 1.1%)
    Cluster 19      3 ( 0.3%)
    Cluster 20     11 ( 1.1%)
    Outliers      628 (62.8%)

Later, when I call dbscanResult.predict on the same data, data1, like this:

    data1.foreach(x => println(dbscanResult.predict(x)))

the result is different:

    ClusterId        count
    0                13
    1                9
    2                14
    3                2
    4                5
    5                5
    6                4
    7                6
    8                9
    9                9
    10               3
    11               4
    12               4
    13               3
    14               1
    15               3
    16               7
    17               1
    19               1
    2,147,483,647    897

Why does this happen? Is it correct? As I understand it, the two results should be the same.

haifengl commented 2 years ago

The predict() method doesn't follow the exact logic. I will look into it. Thanks.
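For readers wondering how a predict step can disagree with the labels assigned during clustering, here is a small self-contained sketch on 1-D data. It is not taken from Smile's source. `BorderPointDemo`, `fit`, `predictStrict`, and `predictLenient` are hypothetical names, and the "strict" rule (the query point must itself have at least minPts neighbors) is only an assumed example of a rule that deviates from the clustering logic. Under that assumption, border points that joined a cluster during fitting are reported as outliers by predict, which would be consistent with the larger outlier count reported above.

```scala
import scala.collection.mutable

// Hypothetical toy, not Smile code: textbook DBSCAN on 1-D points plus two predict rules.
object BorderPointDemo {
  val Outlier: Int = Int.MaxValue

  // Indices of all points within `radius` of `x` (a point counts as its own neighbor).
  def neighbors(data: Array[Double], x: Double, radius: Double): Seq[Int] =
    data.indices.filter(i => math.abs(data(i) - x) <= radius)

  // Fit: core points expand a cluster; border points reachable from a core point join it.
  def fit(data: Array[Double], minPts: Int, radius: Double): Array[Int] = {
    val labels = Array.fill(data.length)(Outlier)
    val visited = Array.fill(data.length)(false)
    var cluster = 0
    for (i <- data.indices) {
      if (!visited(i)) {
        visited(i) = true
        val seeds = neighbors(data, data(i), radius)
        if (seeds.length >= minPts) {
          labels(i) = cluster
          val queue = mutable.Queue.empty[Int]
          queue ++= seeds
          while (queue.nonEmpty) {
            val j = queue.dequeue()
            if (labels(j) == Outlier) labels(j) = cluster // core or border point joins
            if (!visited(j)) {
              visited(j) = true
              val nj = neighbors(data, data(j), radius)
              if (nj.length >= minPts) queue ++= nj // only core points expand further
            }
          }
          cluster += 1
        }
      }
    }
    labels
  }

  // Assumed "strict" rule: a query with fewer than minPts neighbors is an outlier,
  // otherwise it takes the most common label among its neighbors.
  def predictStrict(data: Array[Double], labels: Array[Int],
                    minPts: Int, radius: Double, x: Double): Int = {
    val nn = neighbors(data, x, radius)
    if (nn.length < minPts) Outlier
    else nn.map(i => labels(i)).groupBy(identity).maxBy(_._2.length)._1
  }

  // Assumed "lenient" rule: a query within `radius` of any core point takes that point's label.
  def predictLenient(data: Array[Double], labels: Array[Int],
                     minPts: Int, radius: Double, x: Double): Int =
    neighbors(data, x, radius)
      .find(i => neighbors(data, data(i), radius).length >= minPts)
      .map(i => labels(i))
      .getOrElse(Outlier)
}
```

With `data = Array(0.0, 1.0, 2.0, 3.0, 10.0)`, `minPts = 3`, and `radius = 1.0`, `fit` puts the first four points in cluster 0 and marks 10.0 as an outlier. `predictStrict` on the same points keeps only 1.0 and 2.0 in cluster 0 and returns the outlier marker for the two border points 0.0 and 3.0, while `predictLenient` reproduces the fit labels.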

simonshiwt commented 2 years ago

> The predict() method doesn't follow the exact logic. I will look into it. Thanks.

Thank you for your efforts.

haifengl commented 2 years ago

The fix is in the master branch. Please give it a try. It may not produce exactly the same labels, but they should be very close.
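One way to check how close the two labelings are is to compare them point by point. The sketch below is plain Scala with no Smile dependency. `LabelAgreement`, `agreement`, and `disagreements` are hypothetical helper names, and treating 2,147,483,647 (`Int.MaxValue`) as the outlier marker is an assumption based on the predict output reported earlier in this thread.

```scala
// Hypothetical helpers for comparing clustering labels with predict() labels.
object LabelAgreement {
  // Assumed outlier marker: 2,147,483,647 == Int.MaxValue, as seen in the predict output.
  val Outlier: Int = Int.MaxValue

  /** Fraction of points whose clustering label and predicted label are identical. */
  def agreement(fitLabels: Array[Int], predictedLabels: Array[Int]): Double = {
    require(fitLabels.length == predictedLabels.length, "label arrays must have equal length")
    if (fitLabels.isEmpty) 1.0
    else fitLabels.zip(predictedLabels).count { case (a, b) => a == b }.toDouble / fitLabels.length
  }

  /** For each (clustering label, predicted label) pair that disagrees, how often it occurs. */
  def disagreements(fitLabels: Array[Int], predictedLabels: Array[Int]): Map[(Int, Int), Int] = {
    require(fitLabels.length == predictedLabels.length, "label arrays must have equal length")
    fitLabels.zip(predictedLabels)
      .filter { case (a, b) => a != b }
      .groupBy(identity)
      .map { case (pair, hits) => pair -> hits.length }
  }
}
```

For example, with clustering labels `Array(0, 0, 1, 1, Outlier)` and predicted labels `Array(0, Outlier, 1, 1, Outlier)`, `agreement` is 4 out of 5 and `disagreements` contains the single entry `(0, Outlier) -> 1`. In the setting of this issue, the predicted labels would come from `data1.map(x => dbscanResult.predict(x))`, and the clustering labels from the fitted model; how the model exposes them depends on the Smile version.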

simonshiwt commented 2 years ago

Thanks a lot.