scikit-learn-contrib / scikit-learn-extra

scikit-learn contrib estimators
https://scikit-learn-extra.readthedocs.io
BSD 3-Clause "New" or "Revised" License
185 stars 42 forks source link

N clusters medoids fix #129

Closed stefanhannie closed 2 years ago

stefanhannie commented 2 years ago

I noticed a bug when using init == "random" for KMedoids. You'll at times not get the requested number of clusters. The issue is that the initial medoids are being randomly sampled with replacement. This sets the flag to not sample with replacement.

I've also added a test to ensure n_clusters is correct for all initializations / method combinations.


Here's a repro if you're curious:

kmedoids = sk.cluster.KMedoids(10, init="random", random_state=123)
kmedoids.fit(np.array([(x, x) for x in range(15)]))
assert len(np.unique(out.medoid_indices_)) == 10
stefanhannie commented 2 years ago

Looks good to me, thanks for the PR. The checks should be fixed after merging with PR #131

You bet. Thanks for the review! I've rebased, so things should be good to merge once the tests finish.