Closed Arsalan-Vosough closed 3 years ago
Hi, great to hear that you're using it 😀.
Can you please provide a minimal working example? Thanks, Josh
    Longitude  Latitude
0    0.143799  0.549696
1    0.748523  0.666809
2    0.893091  0.485969
3    0.633522  0.273117
4    0.691772  0.763385
5    0.671481  0.112269
6    0.250957  0.781550
7    0.199018  0.798926
8    0.680017  0.201779
9    0.270592  0.461235
10   0.648789  0.140139
11   0.417517  0.114667
12   0.733276  0.254028
13   0.283617  0.515177
14   0.256486  0.788757
15   0.369168  0.380070
16   0.265186  0.596243
17   0.356121  0.442192
18   0.651694  0.876345
19   0.166674  0.829551
20   0.623306  0.034364
21   0.250798  0.911847
22   0.448605  0.517670
23   0.529576  0.000000
24   0.622372  0.215839
25   0.492679  0.621276
26   0.349826  0.242467
27   0.561980  0.855117
28   0.543573  1.000000
29   0.000000  0.572787
30   0.285501  0.358724
31   0.398475  0.106590
32   1.000000  0.452500
33   0.367203  0.419650
34   0.672594  0.257735
35   0.590781  0.022893
36   0.459228  0.146675
37   0.480092  0.666456
38   0.451271  0.225341
39   0.767639  0.395854
40   0.702797  0.589130
This is my data; I normalized it with a min-max scaler and used this function for the clustering:
def k_means_cons(k, minVal, maxVal, data):
    clf = KMeansConstrained(
        n_clusters=k,
        size_min=minVal,
        size_max=maxVal,
        random_state=0,
        max_iter=300)
    clf.fit(data)
    clf.cluster_centers_
    Label = clf.predict(data)
    return Label
Label = k_means_cons(10, 3, 5, normalized)
and it returns:
array([0, 9, 7, 1, 9, 8, 5, 5, 1, 4, 8, 3, 1, 4, 5, 4, 0, 4, 6, 5, 8, 5, 2, 8, 1, 2, 3, 6, 6, 0, 4, 3, 7, 4, 1, 8, 3, 2, 3, 7, 9])
As you can see, cluster 4 has 6 elements, which is more than size_max=5.
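A quick way to confirm the violation is to tally the labels with `np.bincount`; a minimal sketch using the array above:

```python
import numpy as np

# Labels returned by k_means_cons(10, 3, 5, normalized) above
labels = np.array([0, 9, 7, 1, 9, 8, 5, 5, 1, 4, 8, 3, 1, 4, 5, 4, 0, 4,
                   6, 5, 8, 5, 2, 8, 1, 2, 3, 6, 6, 0, 4, 3, 7, 4, 1, 8,
                   3, 2, 3, 7, 9])

# Count how many points landed in each of the 10 clusters
counts = np.bincount(labels, minlength=10)
print(counts)        # [3 5 3 5 6 5 3 3 5 3]
print(counts.max())  # 6, which exceeds size_max=5
```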
Thanks. What exact normalisation did you use? And which sklearn and ortools versions are you using?
from sklearn import preprocessing

minmax_scale = preprocessing.MinMaxScaler(feature_range=(0, 1))
scaled_feature = minmax_scale.fit_transform(data)
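For reference, `MinMaxScaler` with `feature_range=(0, 1)` simply maps each column linearly onto [0, 1]; a pure-NumPy sketch of the same transform (the helper name is mine, not from the thread):

```python
import numpy as np

def minmax_scale_01(X):
    """Column-wise min-max scaling to [0, 1], mirroring
    sklearn's MinMaxScaler(feature_range=(0, 1))."""
    X = np.asarray(X, dtype=float)
    mins = X.min(axis=0)
    spans = X.max(axis=0) - mins
    return (X - mins) / spans

# Example: three (Longitude, Latitude) rows in the raw coordinate range
data = np.array([[51.1, 35.6],
                 [51.4, 35.7],
                 [51.7, 35.9]])
scaled = minmax_scale_01(data)
print(scaled)  # each column now runs from 0.0 to 1.0
```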
sklearn version is 0.23.2 and ortools version is 8.1.8487
I think I made it too complex. Briefly: if you run the code below, it sometimes gives you a cluster with more than size_max elements.
import random

import pandas as pd

def generatedb(numberPatient):
    patient = []
    i = 0
    while len(patient) <= numberPatient:
        x = random.uniform(51.078418, 51.701563)
        y = random.uniform(35.514715, 35.901148)
        # keep only points inside the quadrilateral bounded by these four lines
        if y < -0.722386*x + 72.866:
            if y > -0.7184706*x + 72.576:
                if y > 0.692935*x + 0.0551:
                    if y < 0.549044*x + 7.5495:
                        patient.append((x, y))
                        i = i + 1
    dataWithcolName = pd.DataFrame(patient, columns=['Longitude', 'Latitude'])
    return dataWithcolName
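As an aside, the rejection-sampling loop above can also be written in vectorized form; a sketch with NumPy, using the same four boundary lines (the helper name and batch size are my own choices):

```python
import numpy as np

def generatedb_np(n, seed=0):
    """Draw n points uniformly inside the quadrilateral used above,
    by rejection sampling in the bounding box."""
    rng = np.random.default_rng(seed)
    pts = np.empty((0, 2))
    while len(pts) < n:
        # sample a batch of candidates in the bounding box
        x = rng.uniform(51.078418, 51.701563, size=4 * n)
        y = rng.uniform(35.514715, 35.901148, size=4 * n)
        # keep candidates satisfying all four line constraints at once
        keep = ((y < -0.722386 * x + 72.866)
                & (y > -0.7184706 * x + 72.576)
                & (y > 0.692935 * x + 0.0551)
                & (y < 0.549044 * x + 7.5495))
        pts = np.vstack([pts, np.column_stack([x[keep], y[keep]])])
    return pts[:n]
```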
def k_means_cons(k, minVal, maxVal, data):
    clf = KMeansConstrained(
        n_clusters=k,
        size_min=minVal,
        size_max=maxVal,
        random_state=0,
        max_iter=300)
    clf.fit(data)
    clf.cluster_centers_
    Label = clf.predict(data)
    return Label
data0 = generatedb(38)
Label = k_means_cons(10,3,5,normalized)
Label
array([7, 9, 6, 0, 1, 0, 1, 3, 0, 5, 2, 5, 7, 2, 4, 0, 3, 5, 4, 8, 7, 0, 6, 6, 7, 4, 9, 1, 9, 4, 1, 8, 3, 9, 5, 7, 2, 1, 2, 4, 2, 8, 5, 7])
Hi,
I determined what the issue is, and it's my fault, as the example on the front page of this project is wrong. So thank you for raising the issue.
You need to use the method `fit_predict` instead of `fit` followed by `predict`. This is because `predict` assigns points to the nearest centre without obeying the min and max constraints, while `fit_predict` does obey the constraints. You can also access the assigned labels through the `labels_` attribute after a `fit`. Like I said, the front page of this project used `fit` and then `predict`, so this wasn't communicated properly by myself.
Currently, I would say, the `predict` function does not meet expectations, and therefore I have changed it in the latest version so that it does obey the min and max constraints. So if you update to the latest version (v0.5.0), which is on PyPI, it should now work.
Thanks again for reporting this, Josh
Hi,
I used `fit_predict` and it worked.
Thanks for your quick response.
Hi
Thank you for sharing your code! I used it to cluster my data into 10 clusters with size_min=3 and size_max=5, but unfortunately it returns some clusters with more than size_max elements; sometimes it gives me a cluster with 7 elements.