joshlk / k-means-constrained

K-Means clustering - constrained with minimum and maximum cluster size. Documentation: https://joshlk.github.io/k-means-constrained
https://github.com/joshlk/k-means-constrained
BSD 3-Clause "New" or "Revised" License

[How to classify the new instances after obtaining a constrained clustering] #38

Closed: Strawberry9583 closed this issue 1 year ago

Strawberry9583 commented 1 year ago

Hi there,

I want to use constrained k-means to cluster instances, but the instances are divided into two sets (say I1 and I2). After clustering I1, I want to label the instances in I2 according to the clusters obtained from I1. I cannot simply do this:

```python
clf.fit(I1)
clf.predict(I2)
```

This fails because the cluster-size constraints set for fitting are also applied in the prediction step.

For example, I1 has 2,000 instances and I2 has 500. If I set the minimum cluster size to 50, the maximum cluster size to 200 and the number of clusters to 30 for `clf.fit(I1)`, then `clf.predict(I2)` reports an error saying that the number of instances must be at least the minimum cluster size times the number of clusters. That is, prediction needs at least `50 * 30 = 1500` instances, but I only have `500` instances in `I2`.
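The arithmetic behind the error can be sketched as a simple feasibility check: `n` points can only be split into `k` clusters of sizes between `size_min` and `size_max` if `k * size_min <= n <= k * size_max`. The helper below, `size_constraints_feasible`, is hypothetical and only illustrates that bound; it is not the library's own validation code.

```python
def size_constraints_feasible(n_samples, n_clusters, size_min=None, size_max=None):
    """Hypothetical helper: return True if n_samples points can be split into
    n_clusters clusters whose sizes all lie in [size_min, size_max].
    None means the bound is not applied."""
    lower = 0 if size_min is None else n_clusters * size_min
    upper = float("inf") if size_max is None else n_clusters * size_max
    return lower <= n_samples <= upper


# Numbers from the issue: 30 clusters, each with 50 to 200 members.
print(size_constraints_feasible(2000, 30, 50, 200))    # fit on I1: 1500 <= 2000 <= 6000
print(size_constraints_feasible(500, 30, 50, 200))     # predict on I2: 500 < 1500
print(size_constraints_feasible(500, 30, None, None))  # no size constraints
```

Fitting on I1 passes the bound, while the same constraints applied to the 500 points of I2 cannot be satisfied by any labelling.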

protco commented 1 year ago

It seems that `size_min` and `size_max` are optional arguments of the `predict` function, so this should work: `model.predict(X, size_min=None, size_max=None)`. Passing `None` should disable the size constraints when labelling `X`.
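With both constraints set to `None`, every point in I2 is free to join any cluster, so the assignment with the smallest total distance is simply each point's nearest fitted center. A minimal pure-Python sketch of that idea, assuming the centers would come from the fitted model's `cluster_centers_` attribute (replaced here with made-up values); `assign_to_nearest_center` is a hypothetical name and this is not the library's internal implementation.

```python
import math


def assign_to_nearest_center(points, centers):
    """Hypothetical helper: label each point with the index of its closest
    center (Euclidean distance), ignoring any cluster-size limits."""
    labels = []
    for p in points:
        distances = [math.dist(p, c) for c in centers]
        labels.append(distances.index(min(distances)))
    return labels


# Made-up centers standing in for a fitted model's cluster_centers_.
centers = [(0.0, 0.0), (10.0, 10.0)]
I2 = [(1.0, 0.5), (9.0, 11.0), (0.2, -0.3)]
print(assign_to_nearest_center(I2, centers))  # [0, 1, 0]
```

Because no size limits apply, the resulting cluster sizes in I2 can be arbitrarily unbalanced; that is the trade-off of predicting without constraints.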