scikit-learn-contrib / scikit-learn-extra

scikit-learn contrib estimators
https://scikit-learn-extra.readthedocs.io
BSD 3-Clause "New" or "Revised" License
185 stars 42 forks source link

Random Initialization replacing values #154

Closed SebastiaanJamaer closed 1 year ago

SebastiaanJamaer commented 1 year ago

Dear,

Thank you for your nice implementation of the KMediods algorithm. I think I might have discovered a small issue in the code, however I very well could be wrong:

In the random initializaion of the medoids of the KMediods class, the following line is used:

medoids = random_state_.choice(len(D), n_clusters)

without the flag of replacing values set to False. This makes that sometimes the same indices are chosen multiple times and the KMedoids fit starts with an empty cluster. If lucky and enough iterations, the cluster does get filled during the fit, however, this is not guaranteed. So to me it would seem that having the flag replace=False seems necessary.

Of course, it could be that this is a conscious choice and I just miss the point why. In this case I apologize for implying a bug.

With kind regards

TimotheeMathieu commented 1 year ago

Yes, thank you for the bug report and no worries, bug reports are also how open-source project grow.

This bug has been fixed in PR #129 of the main branch but this is not in the released version of scikit-learn-extra yet. For now, you can just install scikit-learn from the git repo instead of pypi:

pip install git+https://github.com/scikit-learn-contrib/scikit-learn-extra

I will try to do a new release soon to have this included in the released version.

SebastiaanJamaer commented 1 year ago

Thank you for your response. I will install through the git repo instead then.