nicodv / kmodes

Python implementations of the k-modes and k-prototypes clustering algorithms, for clustering categorical data
MIT License

Distance to the nearest cluster #146

Closed: tinkuborah closed this issue 4 years ago

tinkuborah commented 4 years ago

Thanks for developing this module. It's been a great help. Is it possible to get the distance from every sample to all the clusters? Logically, the nearest cluster is the one to which the sample will eventually be assigned. I have a requirement where, when a user enters a sample, the model should print how similar the sample is to every cluster instead of assigning it a label. This could be the distance from the sample to each cluster.

nicodv commented 4 years ago

A trained kmodes instance has the .cluster_centroids_ attribute. Combine it with the dissimilarity function of your choice and you can get the distance from a point to every centroid quite easily: dissim(centroids, curpoint)
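
For anyone landing here later, a minimal sketch of that suggestion, using simple matching dissimilarity (the number of attributes on which two records differ). The helper `matching_distances` and the example data are hypothetical and written in plain Python so the snippet runs without a fitted model. With a real model you would pass `km.cluster_centroids_` as `centroids`. kmodes also ships its own dissimilarity functions (I believe `matching_dissim` in `kmodes.util.dissim` is the default for KModes), which could be used in place of the helper.

```python
def matching_distances(centroids, point):
    """Return, for each centroid, the number of attributes that differ from point.

    Hypothetical helper for illustration; it is not part of kmodes.
    """
    return [
        sum(c != p for c, p in zip(centroid, point))
        for centroid in centroids
    ]


# Stand-in for km.cluster_centroids_ from a fitted KModes model (assumed data).
centroids = [
    ["red", "small", "round"],
    ["blue", "large", "square"],
    ["red", "large", "round"],
]

# The sample a user enters.
new_sample = ["red", "large", "round"]

distances = matching_distances(centroids, new_sample)

# The nearest cluster is the one with the smallest dissimilarity.
nearest = min(range(len(distances)), key=distances.__getitem__)

print(distances)  # [1, 2, 0]
print(nearest)    # 2
```

Reporting the full `distances` list rather than only `nearest` gives the "how similar is this sample to every cluster" output asked for above. Lower values mean more similar.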