nicodv / kmodes

Python implementations of the k-modes and k-prototypes clustering algorithms, for clustering categorical data
MIT License
1.23k stars 416 forks

Make _labels_cost public #155

Closed larroy closed 3 years ago

larroy commented 3 years ago

I would like to expose the function that calculates the label cost. I was using 0.10.2 and calling _labels_cost to get the distance from the cluster centers (similar to what the "distortions" member does with scikit KMeans). The goal is to automate setting the number of clusters with something similar to the elbow method. This change broke my code, and I think being able to use the same distance metric that was used for the computation is helpful. My proposal is to make this function public so the distance metric is exposed with the same gamma.
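For readers unfamiliar with the cost being discussed, here is a minimal NumPy sketch of a k-prototypes-style label cost: squared Euclidean distance on the numeric columns plus gamma times the number of categorical mismatches. It is not the kmodes implementation. The function name mixed_labels_cost, its arguments, and the toy data are hypothetical and chosen only for illustration.

```python
import numpy as np


def mixed_labels_cost(Xnum, Xcat, num_centroids, cat_centroids, gamma):
    """Illustrative only: assign each point to its nearest centroid under a
    k-prototypes-style dissimilarity and return (labels, total cost)."""
    Xnum = np.asarray(Xnum, dtype=float)
    Xcat = np.asarray(Xcat)
    num_centroids = np.asarray(num_centroids, dtype=float)
    cat_centroids = np.asarray(cat_centroids)

    # Squared Euclidean distance on numeric columns, shape (n_points, n_clusters).
    num_d = ((Xnum[:, None, :] - num_centroids[None, :, :]) ** 2).sum(axis=2)
    # Simple matching dissimilarity: number of categorical attributes that differ.
    cat_d = (Xcat[:, None, :] != cat_centroids[None, :, :]).sum(axis=2)

    # gamma weights the categorical part against the numeric part.
    dist = num_d + gamma * cat_d
    labels = dist.argmin(axis=1)
    cost = dist[np.arange(len(labels)), labels].sum()
    return labels, float(cost)


# Toy data (hypothetical): two numeric and two categorical attributes.
Xnum = [[0.0, 0.0], [1.0, 1.0], [10.0, 10.0]]
Xcat = [["a", "x"], ["a", "y"], ["b", "y"]]
num_centroids = [[0.0, 0.0], [10.0, 10.0]]
cat_centroids = [["a", "x"], ["b", "y"]]

labels, cost = mixed_labels_cost(Xnum, Xcat, num_centroids, cat_centroids, gamma=0.5)
print(labels, cost)
```

For an elbow-style search, one would fit a model for each candidate number of clusters, evaluate a cost like this with the same gamma used during fitting, and look for the point where the cost stops dropping sharply.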

nicodv commented 3 years ago

Not sure I get what you're requesting, @larroy. You can freely import and call the function using from kmodes.kprototypes import _labels_cost. Which change that broke your code are you referring to?

larroy commented 3 years ago

I think from 0.10 to 0.11 there was a change in how categorical and continuous values were stored as nested arrays in the cluster centers. _labels_cost is a "protected" function in terms of PEP 8, so it would be good to have it renamed to labels_cost so it's safe to export.

nicodv commented 3 years ago

It's a good thought, as the function is commonly used after fitting a model.

Implemented in https://github.com/nicodv/kmodes/pull/156, among other things.