haifengl / smile

Statistical Machine Intelligence & Learning Engine
https://haifengl.github.io

How to use a Spark DataFrame with DBSCAN in Spark and Scala? #718

Closed. simonshiwt closed this issue 2 years ago

simonshiwt commented 2 years ago

I find that DBSCAN can handle data of type Array[Array[Double]], but that does not scale to billions of data points. So I want to know how to use a DataFrame (or RDD) with DBSCAN in Spark and Scala, so that I can process the data in parallel. Thank you!

haifengl commented 2 years ago

The speed is determined by the nearest neighbor search. A DataFrame won't make it faster.
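
To illustrate why the neighbor search dominates, here is a minimal sketch in plain Scala. It is not Smile's implementation, and the names `RangeSearchCost`, `distance`, `neighbors`, and `neighborCounts` are hypothetical, chosen only for this example. DBSCAN needs one radius query per point; with a brute-force scan each query touches all n points, so the total work grows roughly as n², regardless of whether the points are stored in an array or a DataFrame.

```scala
object RangeSearchCost {
  // Euclidean distance between two points of equal dimension.
  def distance(a: Array[Double], b: Array[Double]): Double = {
    var sum = 0.0
    var i = 0
    while (i < a.length) {
      val d = a(i) - b(i)
      sum += d * d
      i += 1
    }
    math.sqrt(sum)
  }

  // Brute-force radius query (hypothetical helper): scans every point,
  // so a single query costs n distance computations.
  def neighbors(data: Array[Array[Double]], q: Int, radius: Double): Seq[Int] =
    data.indices.filter(j => j != q && distance(data(q), data(j)) <= radius)

  // DBSCAN issues one radius query per point; this returns the number of
  // neighbors found for each point, doing about n * n distance computations.
  def neighborCounts(data: Array[Array[Double]], radius: Double): Array[Int] =
    data.indices.map(i => neighbors(data, i, radius).size).toArray

  def main(args: Array[String]): Unit = {
    val data = Array(
      Array(0.0, 0.0), Array(0.1, 0.0), Array(0.0, 0.1), // dense group
      Array(5.0, 5.0), Array(5.1, 5.0),                  // small group
      Array(9.0, 0.0))                                   // isolated point
    println(neighborCounts(data, radius = 0.5).mkString(", ")) // prints 2, 2, 2, 1, 1, 0
  }
}
```

A spatial index such as a k-d tree can lower the per-query cost in low dimensions, but the number of queries stays at one per point, which is why changing the container type alone does not help.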

simonshiwt commented 2 years ago

Got it. Thank you for your contribution!