motiwari / BanditPAM

BanditPAM C++ implementation and Python package
MIT License

Bug Report: Slower than k-means on `n=10,000` moon dataset #244

Open · motiwari opened this issue 1 year ago

motiwari commented 1 year ago

Original comment: https://news.ycombinator.com/reply?id=35464068&goto=item%3Fid%3D35445312%2335464068

Hi Mo, thanks for this work; it looks interesting. I had the chance to play with it a little and wanted to compare it with KMeans, using the sklearn `KMeans` implementation.

Beyond that, I ran some examples (mostly the ones already available). More interestingly, I generated isotropic Gaussian blobs for clustering (using `make_blobs`) and compared the two methods on them. With `n_samples=1000`, BanditPAM was slightly better on a couple of the metrics I used, and also much faster. When I increased it to `n_samples=10000`, however, BanditPAM became much slower than KMeans; see [1] for the results and [2] for the code. Is there a particular reason for that?

[1] https://imgur.com/a/VibpgNz

[2] https://paste.elashri.xyz/aXCE
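For anyone trying to reproduce the comparison described above, here is a minimal timing sketch. It is not the code from [2]: the number of clusters, `n_features`, the random seed, and the KMeans settings are assumptions. It uses sklearn's `make_blobs` and `KMeans`, and the BanditPAM call assumes the Python package's `KMedoids(n_medoids=..., algorithm="BanditPAM")` constructor with `fit(X, "L2")`; that part is skipped if `banditpam` is not installed. Only the `fit` call is timed, not data generation.

```python
import time

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

try:
    # BanditPAM's Python package; the benchmark skips it if not installed.
    from banditpam import KMedoids
except ImportError:
    KMedoids = None


def time_fit(fit, X):
    """Return the wall-clock seconds taken by fit(X)."""
    start = time.perf_counter()
    fit(X)
    return time.perf_counter() - start


def compare(n_samples, n_clusters=3, seed=0):
    """Time KMeans (and BanditPAM, if available) on isotropic Gaussian blobs.

    n_clusters, n_features=2 and seed are illustrative assumptions,
    not the settings used in the original report.
    """
    X, _ = make_blobs(n_samples=n_samples, centers=n_clusters,
                      n_features=2, random_state=seed)
    timings = {
        "KMeans": time_fit(
            lambda data: KMeans(n_clusters=n_clusters, n_init=10,
                                random_state=seed).fit(data),
            X,
        ),
    }
    if KMedoids is not None:
        # Assumed interface: L2 (Euclidean) loss passed as a string to fit().
        timings["BanditPAM"] = time_fit(
            lambda data: KMedoids(n_medoids=n_clusters,
                                  algorithm="BanditPAM").fit(data, "L2"),
            X,
        )
    return timings
```

Calling `compare(1_000)` and then `compare(10_000)` returns a dict of fit times per method for each dataset size, which is the kind of scaling comparison the report describes. Single runs are noisy, so repeating each measurement a few times gives a fairer picture.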