I want to use constrained k-means to cluster a set of instances, but the instances are split into two parts (say, instance sets `I1` and `I2`). After clustering `I1`, I want to label the instances in `I2` according to the clusters obtained from `I1`. I cannot simply do this:

    clf.fit(I1)
    clf.predict(I2)

This does not work because the cluster-size constraints set for fitting are also enforced during prediction.
For example, `I1` has 2,000 instances and `I2` has 500. If I set the minimum cluster size (50), the maximum cluster size (200), and the number of clusters (30) as constraints for `clf.fit(I1)`, then `clf.predict(I2)` raises an error saying that the number of instances must be at least `min * number of clusters`. That is, prediction needs at least `50 * 30 = 1500` instances, but `I2` has only 500.
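
To make the arithmetic concrete, here is a minimal sketch in plain Python. It is not the library's code: `size_constraints_feasible` and `assign_to_nearest` are hypothetical helper names, and the toy centers are made-up values standing in for whatever centroids the fit on `I1` produces. The first helper checks the size bounds from the example, and the second shows the behaviour I am after, which is labelling each new instance by its closest centroid with no size constraints.

```python
import math


def size_constraints_feasible(n_instances, n_clusters, size_min, size_max):
    """Return True if n_instances can fill n_clusters within the size bounds."""
    return size_min * n_clusters <= n_instances <= size_max * n_clusters


def assign_to_nearest(centers, points):
    """Label each point with the index of its closest center (no size constraints)."""
    return [
        min(range(len(centers)), key=lambda k: math.dist(point, centers[k]))
        for point in points
    ]


# Numbers from the example: 30 clusters, cluster sizes between 50 and 200.
print(size_constraints_feasible(2000, 30, 50, 200))  # I1: 1500 <= 2000 <= 6000
print(size_constraints_feasible(500, 30, 50, 200))   # I2: 500 < 1500, so infeasible

# Toy centers standing in for the centroids learned from I1 (hypothetical values).
toy_centers = [(0.0, 0.0), (10.0, 10.0)]
toy_I2 = [(1.0, 0.5), (9.0, 11.0), (4.0, 4.0)]
print(assign_to_nearest(toy_centers, toy_I2))
```

The first check passes for `I1` and fails for `I2`, which matches the error described above. Whether the fitted estimator exposes its centroids in a form that can be passed to something like `assign_to_nearest` is an assumption here.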