scikit-learn-contrib / scikit-learn-extra

scikit-learn contrib estimators
https://scikit-learn-extra.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Add example where KMedoids is better than existing scikit-learn clustering algorithms #22

Open znd4 opened 5 years ago

znd4 commented 5 years ago

From @rth in #12:

A few more comments @zdog234, otherwise (after a light review) LGTM.

We adopted black for code style recently. Please run black sklearn_extra/ examples/ to fix the linter CI.

I would rather we merged this and opened follow up issues than keep this PR open until everything is perfect.

Maybe @jeremiedbb who worked on KMeans lately would also have some comments.

Later it would be nice to add an example on some dataset where KMedoids is better than existing scikit-learn clustering algorithms, as discussed in scikit-learn/scikit-learn#11099 (comment).

kno10 commented 4 years ago

The current code implements an inferior algorithm, so for now I'd suggest comparing the results of non-Python implementations (R, ELKI, pip install kmedoids) if you want to study result quality.

TimotheeMathieu commented 4 years ago

k-medoids can be better than k-means when robustness matters. For example, see this figure, where k-medoids gives a really good result while k-means puts the outliers in a class of their own. The data consist of 3 Gaussian blobs and an "outlier" group situated far away from these blobs, and I don't know many clustering algorithms that would exhibit this kind of robustness. In fact, k-medoids is a little more stable on this example than the algorithm I wrote specifically to be robust (the second figure). This example could be added to the doc I think.
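One ingredient of this robustness can be sketched on a single cluster: a mean-based center (what k-means uses) gets dragged toward far-away points, while a medoid (what k-medoids uses) must be an actual data point and stays inside the bulk of the data. The snippet below is only an illustration with made-up toy data and hypothetical helper names (mean_center, medoid_center); it is not the code behind the figure, it uses only the Python standard library, and it does not run either clustering algorithm.

```python
import math


def mean_center(points):
    """Centroid: the coordinate-wise mean, as used for k-means centers."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def medoid_center(points):
    """Medoid: the data point with the smallest total distance to all points."""
    return min(points, key=lambda c: sum(math.dist(c, p) for p in points))


# Toy data (assumed for illustration): a tight 7x7 grid "blob" around the
# origin plus a small group of outliers far away from it.
blob = [(x * 0.1, y * 0.1) for x in range(-3, 4) for y in range(-3, 4)]
outliers = [(50.0, 50.0), (51.0, 49.0), (49.0, 51.0)]
points = blob + outliers

centroid = mean_center(points)
medoid = medoid_center(points)

# The centroid is pulled out of the blob toward the outliers,
# while the medoid remains one of the blob's own points.
print("centroid:", centroid)
print("medoid:  ", medoid)
```

With three outliers among 52 points, the centroid lands several units away from the blob, whereas the medoid is still a blob point. A full comparison on several blobs would need the actual estimators and is not attempted here.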

rth commented 4 years ago

This example could be added to the doc I think.

That would be great! Do you already have the code for that example, @TimotheeMathieu?

TimotheeMathieu commented 4 years ago

Yes, in fact it is an example I came up with for PR #42; you can find it here. I just added k-medoids with default parameters and got the result displayed. Maybe it would be interesting to change the doc page I made to include k-medoids, because k-medoids is in fact robust. I will try to make a PR for this if that's all right with you.

rth commented 4 years ago

That would be great, thank you!

rth commented 4 years ago

In general, if you see other things to improve in this repo, don't hesitate to submit PRs; we are actively looking for maintainers :)