Closed simonshiwt closed 2 years ago
I find that this DBSCAN implementation can handle input of type Array[Array[Double]], but that cannot scale to billions of points. I want to know how to use a DataFrame (or RDD) with DBSCAN in Spark and Scala, so that I can process the data in parallel. Thank you!
The speed is determined by the nearest-neighbor search, not by the input container, so a DataFrame won't make it faster.
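For reference, here is a minimal sketch of how a DataFrame could be reshaped into the Array[Array[Double]] form this implementation accepts. Note the caveat: `collect()` pulls all rows to the driver, which is exactly why this approach does not scale to billions of points, as discussed above. The `dbscan.fit(points)` call at the end is a hypothetical placeholder; check the library's actual API.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("dbscan-prep").getOrCreate()
import spark.implicits._

// Toy DataFrame of 2-D points (stand-in for real data).
val df = Seq((1.0, 2.0), (1.1, 2.1), (9.0, 9.0)).toDF("x", "y")

// Collect into the Array[Array[Double]] shape the DBSCAN
// implementation accepts. collect() materializes everything on the
// driver, so this only works for data that fits in driver memory.
val points: Array[Array[Double]] =
  df.select("x", "y")
    .as[(Double, Double)]
    .collect()
    .map { case (x, y) => Array(x, y) }

// points can now be passed to the implementation, e.g.
// dbscan.fit(points)   // hypothetical call; consult the library docs
```

A truly distributed DBSCAN would instead need a partitioned nearest-neighbor index (e.g. a spatial partitioning scheme with a merge step across partition borders), which is a different algorithm rather than a change of input type.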
Got it. Thank you for your contribution!