irvingc / dbscan-on-spark

An implementation of DBSCAN running on top of Apache Spark
Apache License 2.0
183 stars 58 forks

Only two dimensional #3

Open bnoreus opened 8 years ago

bnoreus commented 8 years ago

It looks as if this code is built only for two dimensions. What's up with that? Please add some kind of note to make it obvious.

HappyShadowWalker commented 7 years ago

Does this implementation support high-dimensional data?

dodgy99 commented 7 years ago

Looking at the code, it is still unclear whether the algorithm supports more than two dimensions. Could someone please clarify?

irvingc commented 7 years ago

Hi, sorry for the delay. The code only supports two dimensions. Both the partitioning code and the distance function assume that the data has only two dimensions.
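
To illustrate what removing the two-dimensional assumption from a distance function might involve, here is a minimal sketch. It is written in Python rather than the project's own language, and the names `euclidean_distance` and `is_neighbor` are hypothetical; nothing here is taken from this repository.

```python
import math
from typing import Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points with any number of dimensions."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of dimensions")
    # Sum the squared differences over every coordinate, not just x and y.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def is_neighbor(a: Sequence[float], b: Sequence[float], eps: float) -> bool:
    """DBSCAN-style neighborhood test: b lies within eps of a."""
    return euclidean_distance(a, b) <= eps
```

The same function then works for 2-D and 3-D inputs alike, for example `euclidean_distance((0, 0), (3, 4))` and `euclidean_distance((1, 2, 3), (4, 6, 3))`.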

dodgy99 commented 7 years ago

Has smietana's commit above made the distance calculation multi-dimensional? If so, is it possible to alter the partitioning code as well?

irvingc commented 7 years ago

From a quick look at the code, it seems that way. I think it should be possible to alter the partitioning code as well, but that is a bit more involved. I do not know of a generic approach that would work for n dimensions, but if you have a specific use case, it may be easier to come up with a custom partitioner for your needs. My memory is fuzzy, but I think one of the challenges was that the data became too sparse in higher dimensions. If you have more details, I could provide some ideas on how to solve it.
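
The sparsity concern can be made concrete with a naive n-dimensional grid. The sketch below is an assumption for illustration only, not this project's partitioner: it places each point in a hypercube cell with side length `eps`, so any point within `eps` of another must sit in the same cell or an adjacent one. It is written in Python, and the names `cell_of`, `build_grid`, and `adjacent_cells` are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product


def cell_of(point, eps):
    """Index of the eps-sized hypercube cell that contains the point."""
    return tuple(math.floor(x / eps) for x in point)


def build_grid(points, eps):
    """Group points by the cell they fall into."""
    grid = defaultdict(list)
    for p in points:
        grid[cell_of(p, eps)].append(p)
    return grid


def adjacent_cells(cell):
    """The cell itself plus every cell touching it: 3**d cells in d dimensions."""
    offsets = product((-1, 0, 1), repeat=len(cell))
    return [tuple(c + o for c, o in zip(cell, off)) for off in offsets]
```

In two dimensions each cell has 9 cells to check (itself included); in ten dimensions that number is 3^10 = 59,049, and most of those cells are likely to be empty for realistic data sizes. That growth may be part of why a generic n-dimensional partitioner is hard to get right and why a partitioner tailored to a specific dataset can be easier.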

Nirvana-2021 commented 2 years ago

Hi, could you give any suggestions for processing multi-dimensional vectors, and where can I adjust the code?