joshlk / k-means-constrained

K-Means clustering - constrained with minimum and maximum cluster size. Documentation: https://joshlk.github.io/k-means-constrained
https://github.com/joshlk/k-means-constrained
BSD 3-Clause "New" or "Revised" License
192 stars, 43 forks

Wrong clustering #34

Closed mosikh closed 2 years ago

mosikh commented 2 years ago

Hello, I have a set of house points and have laid out a line that carries glass fiber to each house. I would now like to cluster the points so that a distributor can be assigned to each cluster, with at most 20 house points per cluster. As an example, for a data set of 61 points I calculated an adjacency matrix of distances based on the glass fiber line. I run the clustering with this library on the precomputed adjacency matrix. However, I sometimes get a wrong clustering, as can be seen in the picture.

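Here is my code, where `am` is the adjacency matrix of distances:

    from k_means_constrained import KMeansConstrained

    # am: precomputed adjacency matrix of distances along the glass fiber line
    db = KMeansConstrained(n_clusters=4, size_max=20, random_state=0)
    result = db.fit_predict(am)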

In the picture, the black line is the glass fiber line on which the calculation is based, and the colored points are my points as clustered by the algorithm. As you can see, the green and yellow clusters are not in the best state. I sometimes have the same issue with other data sets as well.

[Image: kmeans constrained]

I would appreciate any help in improving the result.

Versions:

Best regards, Mostafa

joshlk commented 2 years ago

You need to provide a minimal reproducible example; otherwise it's not possible to debug this issue. Closing the ticket until more information is provided.
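
For a report like the one above, a minimal reproducible example could look roughly like the sketch below. The data here is a hypothetical stand-in (61 random positions along a single straight line), not the reporter's data set, and the helper names `make_points_on_line`, `distance_matrix`, and `cluster` are invented for illustration. Only the `KMeansConstrained(n_clusters=..., size_max=..., random_state=...)` call and `fit_predict` come from the thread. As far as the documented signature goes, `fit_predict` takes `X` with shape `(n_samples, n_features)`, so a precomputed distance matrix would presumably be read as one feature vector per row; whether that explains the picture is not confirmed anywhere in this thread.

```python
import numpy as np


def make_points_on_line(n_points=61, seed=0):
    """Hypothetical stand-in data: sorted house positions along one straight cable."""
    rng = np.random.default_rng(seed)
    return np.sort(rng.uniform(0.0, 1000.0, size=n_points))


def distance_matrix(positions):
    """Pairwise along-the-line distances, shape (n_points, n_points)."""
    return np.abs(positions[:, None] - positions[None, :])


def cluster(X, n_clusters=4, size_max=20, random_state=0):
    """Same call as in the report.

    The import is done lazily so the data helpers above can be run
    without k-means-constrained installed.
    """
    from k_means_constrained import KMeansConstrained

    model = KMeansConstrained(
        n_clusters=n_clusters, size_max=size_max, random_state=random_state
    )
    return model.fit_predict(X)


# Fixed seed, so anyone running this gets the same input.
positions = make_points_on_line()
am = distance_matrix(positions)
```

With the library installed, `cluster(am)` repeats the call from the report, and `cluster(positions.reshape(-1, 1))` runs the same model on the raw coordinates. Comparing the two label arrays (for example with `np.bincount`) and posting them along with the installed package versions would give a maintainer something concrete to debug.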