When using kmodes in Python, the algorithm chooses the best partition of the data for a given number of clusters, regardless of how many points end up in each cluster. For example, some clusters can contain around 10,000 data points while others contain around 300.
For my project, I'd like to find an equal-sized clustering of all my data, where the size of each cluster is constrained.
I've found an algorithm for constraining cluster size with a k-means model, but since I'm working with categorical data, I need a solution for k-modes (or k-prototypes).
Is there a solution or workaround for getting equal-sized clusters?
Interesting, @Jhellewaard, but this is quite a specific ask that I will very likely not put on the kmodes roadmap. Happy to take contributions, as always. :)
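For anyone looking for a workaround in the meantime, here is a minimal sketch of one possible post-processing step. It is not part of kmodes and not a true size-constrained k-modes: it takes a set of modes that already exist and greedily reassigns points to them under a capacity of ceil(n / k), using the simple matching (Hamming) dissimilarity. The function name `balanced_assign` and the toy data are hypothetical and only for illustration.

```python
import numpy as np


def balanced_assign(X, modes):
    """Greedily assign each row of X to a mode so that no cluster
    holds more than ceil(n / k) points (hypothetical helper)."""
    X = np.asarray(X)
    modes = np.asarray(modes)
    n, k = X.shape[0], modes.shape[0]
    cap = -(-n // k)  # ceil(n / k): maximum allowed cluster size

    # Hamming dissimilarity between every point and every mode, shape (n, k).
    dist = (X[:, None, :] != modes[None, :, :]).sum(axis=2)

    labels = np.full(n, -1)
    sizes = np.zeros(k, dtype=int)

    # Visit (point, cluster) pairs from the closest to the farthest.
    for flat in np.argsort(dist, axis=None, kind="stable"):
        i, c = divmod(int(flat), k)
        # Take the pair only if the point is still free and the cluster has room.
        if labels[i] == -1 and sizes[c] < cap:
            labels[i] = c
            sizes[c] += 1
    return labels


# Toy categorical data: six points, two modes, so the capacity is 3.
X = np.array([["a", "x"], ["a", "x"], ["a", "x"],
              ["a", "y"], ["b", "y"], ["b", "z"]])
modes = np.array([["a", "x"], ["b", "y"]])
labels = balanced_assign(X, modes)
sizes = np.bincount(labels, minlength=len(modes))
```

With a fitted `KModes` model, the modes could presumably be taken from its `cluster_centroids_` attribute. Note that this only caps the cluster size: when n is not divisible by k, some clusters can end up smaller than others. The greedy pass is also a heuristic, so it does not minimise the total cost, and the full distance array may be too large for very big datasets; an optimal assignment solver would be the more rigorous (and slower) alternative.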