nicodv / kmodes

Python implementations of the k-modes and k-prototypes clustering algorithms, for clustering categorical data
MIT License

Create equal-sized clusters within kmodes #195

Closed Jhellewaard closed 1 year ago

Jhellewaard commented 1 year ago

When using kmodes in Python, the algorithm chooses the best partition of the data for a given number of clusters, regardless of the size of each cluster. For example, some clusters can have around 10,000 data points while others have around 300.

For my project, I'd like to find an equal-sized clustering of all my data, where the size of each cluster is constrained.

I've found an algorithm for constraining cluster size with a k-means model, but since I'm working with categorical data, I need a solution for k-modes (or k-prototypes).

Is there any solution or workaround for getting equal-sized clusters?

nicodv commented 1 year ago

Interesting, @Jhellewaard, but this is quite a specific ask that I will very likely not put on the kmodes roadmap. Happy to take contributions, as always. :)
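
For readers looking for a workaround outside the library, one possible approach is to post-process the cluster modes with a size-capped assignment. The sketch below is not part of kmodes: `hamming` and `balanced_assign` are hypothetical helper names, and the modes are assumed to come from somewhere else, for example the `cluster_centroids_` attribute of a fitted `KModes` model. It greedily assigns each record to its nearest mode by simple matching (Hamming) distance while capping every cluster at `ceil(n / k)` members. This is a heuristic, not an optimal solution, and it does not re-estimate the modes after reassignment.

```python
import math


def hamming(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def balanced_assign(points, modes):
    """Greedily assign points to modes so no cluster exceeds ceil(n / k).

    Hypothetical helper, not a kmodes API. Returns one cluster index per point.
    """
    n, k = len(points), len(modes)
    capacity = math.ceil(n / k)
    # Every (distance, point index, cluster index) pair, cheapest first.
    pairs = sorted(
        (hamming(p, m), i, j)
        for i, p in enumerate(points)
        for j, m in enumerate(modes)
    )
    labels = [None] * n
    sizes = [0] * k
    for _, i, j in pairs:
        # Take the cheapest pairing whose point is still free
        # and whose cluster still has room.
        if labels[i] is None and sizes[j] < capacity:
            labels[i] = j
            sizes[j] += 1
    return labels


# Toy data: four records with two categorical attributes, and two modes.
points = [("a", "x"), ("a", "x"), ("a", "y"), ("b", "y")]
modes = [("a", "x"), ("b", "y")]
labels = balanced_assign(points, modes)
print(labels)
```

In this toy example the third record is equally close to both modes, but the first cluster is already full, so it lands in the second one. If the greedy result is not good enough, the same capacity idea could be handed to an exact assignment solver instead, at a higher computational cost.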