josephius / star-clustering

A clustering algorithm that automatically determines the number of clusters and works without hyperparameter fine-tuning.
Apache License 2.0
213 stars 21 forks

What is the paper attached to this algorithm? #3

Open egy1st opened 4 years ago

egy1st commented 4 years ago

What is the paper attached to this algorithm?

wyvern92 commented 4 years ago

I am also interested in the reference.

shy1 commented 4 years ago

As far as I know, no official paper has been published. I believe the best current reference for, and discussion of, the algorithm is this thread: https://www.reddit.com/r/MachineLearning/comments/gsu3zm/p_star_clustering_a_clustering_algorithm_that/

josephius commented 4 years ago

Correct, this particular algorithm does not have a paper yet, though there may be similar algorithms in the literature.

I'm currently looking into writing up a paper, possibly with co-authors from the Machine Learning Reddit thread who can help with the theoretical justification. Seth's contributions already look strong enough to include him as a co-author as well, if he wants to be a part of the paper.

AnotherSamWilson commented 4 years ago

Can you describe the general idea of the algorithm here?

RahulBhalley commented 4 years ago

@josephius did you discover this technique? If not, how can you, or anybody else other than the inventor, decide to write a paper describing it?

josephius commented 4 years ago

@AnotherSamWilson The general idea of the algorithm is as follows. First, connect the closest points until their mean mass exceeds a limit, which is the mean distance multiplied by a constant of proportionality, and take the resulting graphs as clusters. Next, disconnect points whose mass falls below a certain threshold (or, in one variant, also above a cutoff). Finally, connect each disconnected point to the nearest remaining cluster.
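
For anyone who wants something concrete, the steps described above might be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this repository. The function name `sketch_star_clustering`, the definition of a point's "mass" as the total length of the edges attached to it, the threshold of mean mass minus one standard deviation, and the default `constant=1.0` are all placeholders chosen for the example. The "above a cutoff" variant is not covered.

```python
import itertools
import math
import statistics


def sketch_star_clustering(points, constant=1.0):
    """Loose sketch of the described steps; returns one label per point."""
    n = len(points)
    if n < 2:
        return list(range(n))

    # All pairwise edges as (distance, i, j), sorted shortest first.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(n), 2)
    )
    mean_distance = sum(d for d, _, _ in edges) / len(edges)
    limit = constant * mean_distance

    # Union-find parent array to track which points are connected.
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Step 1: connect the closest pairs until the mean mass exceeds the limit.
    # ASSUMPTION: a point's "mass" is the total length of its attached edges.
    mass = [0.0] * n
    for d, i, j in edges:
        if sum(mass) / n > limit:
            break
        mass[i] += d
        mass[j] += d
        parent[find(i)] = find(j)

    # Each connected component is a provisional cluster.
    # Labels are arbitrary integers (the component's root index).
    labels = [find(i) for i in range(n)]

    # Step 2: disconnect low-mass points.
    # ASSUMPTION: threshold = mean mass minus one standard deviation.
    threshold = statistics.fmean(mass) - statistics.pstdev(mass)
    kept = [i for i in range(n) if mass[i] >= threshold]
    dropped = [i for i in range(n) if mass[i] < threshold]

    # Step 3: attach each disconnected point to the cluster of the
    # nearest point that was not disconnected.
    if kept:
        for i in dropped:
            nearest = min(kept, key=lambda k: math.dist(points[i], points[k]))
            labels[i] = labels[nearest]
    return labels


# Toy usage: two small, well-separated groups of 2-D points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = sketch_star_clustering(points, constant=0.2)
print(labels)
```

On this toy data the first three points share one label and the last three share another. The small `constant=0.2` is specific to the sketch: with only six points, a larger value keeps adding edges until the two groups merge, so it says nothing about how the real implementation behaves.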

@rahulbhalley I did discover this technique and so would be considered the inventor, though @shy1 has made some interesting modifications to it as well. I decided to publish this repo before publishing a paper because I initially doubted that the technique had a strong enough theoretical justification to publish. However, many of the people I presented it to on the Machine Learning Reddit seem to think otherwise, so I am looking at how to go about publishing it properly.

RahulBhalley commented 4 years ago

That's nice.