Thanks for reporting this. Yes, this is a bug.
CLARA uses sub-sampling, and the `medoid_indices_` it returns are indices into the sub-sample, not into the whole dataset. I will make a PR to correct this.
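To illustrate the mismatch, here is a minimal pure-Python sketch of translating sub-sample positions back to positions in the full dataset. It is not the library's code or the eventual fix: the helper `map_subsample_medoids`, the sample size of 50, and the dataset size of 1000 are all hypothetical. The local indices reuse the values from the report, which are small enough to be consistent with positions inside a sub-sample.

```python
import random


def map_subsample_medoids(sample_indices, local_medoid_indices):
    """Translate medoid positions inside a sub-sample into full-dataset positions.

    sample_indices[i] is the full-dataset row that became row i of the sub-sample.
    (Hypothetical helper for illustration, not part of sklearn_extra.)
    """
    return [sample_indices[i] for i in local_medoid_indices]


rng = random.Random(0)
n_samples = 1000  # assumed size of the full dataset

# Assumed sub-sample: 50 distinct row positions drawn from the full dataset.
sample_indices = sorted(rng.sample(range(n_samples), 50))

# Medoid positions *within the sub-sample* (values taken from the report).
local_medoid_indices = [3, 7, 8, 42, 43]

# Positions of the same medoids in the full dataset.
global_medoid_indices = map_subsample_medoids(sample_indices, local_medoid_indices)
print(global_medoid_indices)
```

Indexing the full dataset's labels with the local indices instead of the mapped ones picks unrelated rows, which may well all fall in the same cluster.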
For KMedoids, `medoid_indices_` works fine: every medoid is assigned to a separate cluster. With CLARA, however, all medoids end up assigned to a single cluster. In the example below, see `z`: every medoid is assigned to cluster 1 (3, 7, 8, 42, 43 are the medoid indices I get).

To reproduce the issue, I use the following input saved as CSV:
https://gist.github.com/netj/8836201
with code:
sklearn_extra version 0.2.0, pandas version 1.3.1, numpy version 1.22.1

The problem also occurs on a much larger dataset (90k rows, which I can't share), for which KMedoids is too slow.
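For anyone checking whether they are affected, the symptom can be tested with a small helper like the sketch below. It assumes that `labels` holds one cluster label per row of the full dataset and that `medoid_indices` holds row positions. The function name and the toy data are hypothetical and not taken from the report.

```python
def medoids_in_distinct_clusters(labels, medoid_indices):
    """Return True if every medoid row carries a different cluster label."""
    medoid_labels = [labels[i] for i in medoid_indices]
    return len(set(medoid_labels)) == len(medoid_labels)


# Toy data: six rows in three clusters.
labels = [0, 0, 1, 1, 2, 2]

# One medoid per cluster: the expected, healthy situation.
healthy = medoids_in_distinct_clusters(labels, [0, 2, 4])

# Two "medoids" landing in the same cluster: the symptom reported here.
broken = medoids_in_distinct_clusters(labels, [0, 1])
```

With a fitted estimator, the same check would be applied to its `labels_` and `medoid_indices_` attributes.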