nsh87 / receptormarker

Source for 'receptormarker' package for R: antibody receptor and phenotypic marker analysis
http://receptormarker.com
BSD 2-Clause "Simplified" License

Kmeans iter.max not being respected in multi_clust() #33

Closed nsh87 closed 9 years ago

nsh87 commented 9 years ago

So the call to kmeans in multi_clust() is

kmm <- stats::kmeans(d, k, iter.max = iter.max, nstart = 10)
# multi_clust() makes iter.max = 300 by default

That looks correct and should work, but for some reason kmeans reports stopping after only 10 iterations:

fclust <- multi_clust(f, krange=2:20)
Warning messages:
1: did not converge in 10 iterations 
2: did not converge in 10 iterations 
3: did not converge in 10 iterations 
4: did not converge in 10 iterations 
5: did not converge in 10 iterations 
6: did not converge in 10 iterations 

No matter what arguments I pass to multi_clust(), the warning still says kmeans ran only 10 iterations. If I call kmeans directly, iter.max is respected:

# Here's me using 3 iterations instead of 10
stats::kmeans(f, centers=20, iter.max=3, nstart=10)
Warning messages:
1: did not converge in 3 iterations 
2: did not converge in 3 iterations 
3: did not converge in 3 iterations 
4: did not converge in 3 iterations 
5: did not converge in 3 iterations 
6: did not converge in 3 iterations 
7: did not converge in 3 iterations 
8: did not converge in 3 iterations 
9: did not converge in 3 iterations 
10: did not converge in 3 iterations 
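
One way to narrow this down is to check whether simply wrapping kmeans in a function changes the result. The sketch below is only an illustration: wrapped_kmeans() is a hypothetical stand-in (not the package's multi_clust()), and the data are synthetic. With the same seed and the same arguments, a wrapper that forwards iter.max unchanged should give the same fit as the direct call.

```r
# Hypothetical stand-in for a wrapper; NOT the package's multi_clust().
wrapped_kmeans <- function(d, k, iter.max = 300) {
  stats::kmeans(d, k, iter.max = iter.max, nstart = 10)
}

set.seed(1)
d <- matrix(rnorm(400), ncol = 2)  # synthetic data: 200 rows, 2 columns

# Same seed before each call so both draw the same random starts.
set.seed(42)
direct <- suppressWarnings(stats::kmeans(d, 5, iter.max = 3, nstart = 10))
set.seed(42)
wrapped <- suppressWarnings(wrapped_kmeans(d, 5, iter.max = 3))

# TRUE when the wrapper passes its arguments through unchanged.
same_fit <- identical(direct$cluster, wrapped$cluster) &&
  identical(direct$iter, wrapped$iter)
print(same_fit)
```

If the two fits differ, the wrapper itself is altering the call; if they match, the discrepancy more likely comes from how the warnings are produced or reported.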

nsh87 commented 9 years ago

Ok, here's more.

I tried multi_clust() with iter.max = 300, 200, 100, and 20, and they all produced the same "did not converge in 10 iterations" warning. When I tried 11, I learned something:

> fclust <- multi_clust(f, krange=2:20, iter.max=11)
Warning messages:
1: did not converge in 11 iterations 
2: did not converge in 10 iterations 
3: did not converge in 10 iterations 
4: did not converge in 10 iterations 

It looks like the first run in nstart uses iter.max, but the rest may be falling back to the default of 10.
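
To see exactly how many warnings report each iteration count, the messages can be collected and tallied rather than read off the console. This is a rough sketch: collect_warnings() is a hypothetical helper, and the kmeans call runs on synthetic data rather than the f from this thread.

```r
# Hypothetical helper: evaluate an expression, record every warning message,
# and muffle the warnings so they are not printed to the console.
collect_warnings <- function(expr) {
  msgs <- character(0)
  value <- withCallingHandlers(
    expr,
    warning = function(w) {
      msgs <<- c(msgs, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
  list(value = value, messages = msgs)
}

set.seed(1)
d <- matrix(rnorm(400), ncol = 2)  # synthetic data, not the f used above

res <- collect_warnings(stats::kmeans(d, 10, iter.max = 3, nstart = 10))

# One row per distinct warning text, with how often it occurred.
print(table(res$messages))
```

Wrapping the multi_clust() call the same way would show whether the "11" and "10" messages come in a fixed ratio per value of k.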

nsh87 commented 9 years ago

But why can I set iter.max to 3?

> fclust <- multi_clust(f, krange=2:20, iter.max=3)
There were 50 or more warnings (use warnings() to see the first 50)
> warnings()
Warning messages:
1: did not converge in 3 iterations
2: did not converge in 3 iterations
3: did not converge in 3 iterations
4: did not converge in 3 iterations
5: did not converge in 3 iterations
6: did not converge in 3 iterations
7: did not converge in 3 iterations
8: did not converge in 3 iterations
9: did not converge in 3 iterations
10: did not converge in 3 iterations
11: did not converge in 3 iterations
12: did not converge in 3 iterations
13: did not converge in 3 iterations
14: did not converge in 3 iterations
15: did not converge in 3 iterations
16: did not converge in 3 iterations
17: did not converge in 3 iterations
18: did not converge in 3 iterations
19: did not converge in 3 iterations
20: did not converge in 3 iterations
21: did not converge in 3 iterations
22: did not converge in 3 iterations
23: did not converge in 3 iterations
24: did not converge in 3 iterations
25: did not converge in 3 iterations
26: did not converge in 3 iterations
27: did not converge in 3 iterations
28: did not converge in 3 iterations
29: did not converge in 3 iterations
30: did not converge in 3 iterations
31: did not converge in 3 iterations
32: did not converge in 3 iterations
33: did not converge in 3 iterations
34: did not converge in 3 iterations
35: did not converge in 3 iterations
36: did not converge in 3 iterations
37: did not converge in 3 iterations
38: did not converge in 3 iterations
39: did not converge in 3 iterations
40: did not converge in 3 iterations
41: did not converge in 3 iterations
42: did not converge in 3 iterations
43: did not converge in 3 iterations
44: did not converge in 3 iterations
45: did not converge in 3 iterations
46: did not converge in 3 iterations
47: did not converge in 3 iterations
48: did not converge in 3 iterations
49: did not converge in 3 iterations
50: did not converge in 3 iterations

nsh87 commented 9 years ago

I think something odd happens when kmeans is called from within another function, but I don't know what.

One possibility is that looping over the kmeans call 10 times is somehow affecting iter.max. I bet that if you call kmeans just once per value of k in krange, the issue will go away (based on my test with iter.max = 11 above). I can try it later; let me know if you get to it first. If your runs loop does the same thing as the nstart option of kmeans, then it would make sense to remove the loop anyway.
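
The "one call per k" idea could look roughly like the following. This is a sketch under assumptions, not the package's code: kmeans_per_k() is a hypothetical name, the data are synthetic, and restarts are left entirely to the nstart argument instead of an outer runs loop.

```r
# Hypothetical replacement for an outer "runs" loop: one kmeans call per k,
# with the random restarts handled by nstart.
kmeans_per_k <- function(d, krange, iter.max = 300, nstart = 10) {
  fits <- lapply(krange, function(k) {
    stats::kmeans(d, centers = k, iter.max = iter.max, nstart = nstart)
  })
  names(fits) <- paste0("k", krange)
  fits
}

set.seed(1)
d <- matrix(rnorm(400), ncol = 2)  # synthetic data: 200 rows, 2 columns

fits <- suppressWarnings(kmeans_per_k(d, krange = 2:6))

# Total within-cluster sum of squares of the best start, for each k.
print(sapply(fits, function(f) f$tot.withinss))
```

kmeans keeps the best of its nstart random starts, so an extra loop that repeats the call and keeps the best fit would duplicate that work.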

catterbu commented 9 years ago

@nsh87 I pushed some code to the fix_kmeans branch that is intended to fix this issue. The warning still shows up, but only once instead of many times, and I could not figure out why. I examined the kmeans code extensively and still cannot tell what is going on. I added a print statement inside the for loop in multi_clust, and iter.max stays constant; so does runs. Please feel free to look at the code and see if you notice anything.
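
Besides printing iter.max inside the loop, the fitted object itself can be inspected: recent versions of stats::kmeans return iter (the number of outer iterations) and ifault (an indicator of a possible algorithm problem). The sketch below uses synthetic data and an assumed krange of 2:6 to tabulate both per k; it does not reproduce the multi_clust loop.

```r
set.seed(1)
d <- matrix(rnorm(400), ncol = 2)  # synthetic data: 200 rows, 2 columns
krange <- 2:6                      # assumed range, for illustration only

# One row per k: iterations reported by the best start, plus its fault flag.
diag <- do.call(rbind, lapply(krange, function(k) {
  fit <- suppressWarnings(
    stats::kmeans(d, centers = k, iter.max = 300, nstart = 10)
  )
  data.frame(k = k, iter = fit$iter, ifault = fit$ifault)
}))

print(diag)
```

These values describe only the start that kmeans kept, so a warning raised by one of the discarded starts would not show up here.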