open-connectome-classes / StatConn-Spring-2015-Info

introductory material

kmeans() exhibits non-deterministic behavior #61

Open mrjiaruiwang opened 9 years ago

mrjiaruiwang commented 9 years ago

I ran into something peculiar while attempting the homework. For the exact same adjacency matrix, kmeans() behaves differently from run to run: it fails roughly half the time and succeeds the other half. Here is my output so you can test it and see for yourself.

A =

 1     1     1     1     1     0
 1     1     1     1     1     1
 1     1     0     1     1     1
 1     1     1     1     1     1
 1     1     1     1     0     1
 0     1     1     1     1     1

??? Error using ==> kmeans>batchUpdate at 417
Empty cluster created at iteration 1.

Error in ==> kmeans at 320
converged = batchUpdate();

Error in ==> kmeans_fail at 11
c = kmeans(A,2);

jovo commented 9 years ago

Do you know why this error is occurring, and why only sometimes?

mrjiaruiwang commented 9 years ago

Yes, I suppose it is because kmeans starts from randomly chosen initial conditions, and only some of those starting points converge; the others end in the empty-cluster error. I tried looking at the source code, but I don't really follow what's going on there. Has anyone else looked at the code?
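To make the random-start explanation concrete, here is a minimal sketch of batch k-means in plain Python that raises an error when an assignment step leaves a cluster empty. It is not MATLAB's kmeans. The names kmeans_sketch, EmptyClusterError and count_failures are made up for illustration. It also rests on two assumptions: the starting centroids are rows of A drawn at random, and ties go to the lowest-numbered cluster. Rows 2 and 4 of A are identical, so if both are drawn as starting centroids, every point ties, everything lands in the first cluster, and the second is empty at iteration 1.

```python
import random

# Adjacency matrix from the original post; rows 2 and 4 are identical.
A = [
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 1, 1],
    [1, 1, 0, 1, 1, 1],
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 1],
    [0, 1, 1, 1, 1, 1],
]


class EmptyClusterError(Exception):
    """Raised when an assignment step leaves a cluster with no points."""


def sq_dist(p, q):
    """Squared Euclidean distance between two rows."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_sketch(points, k, init=None, rng=random, max_iter=100):
    # Assumption: unless init is given, start from k rows drawn at random
    # by position, so two identical rows can both be picked.
    start = init if init is not None else rng.sample(points, k)
    centroids = [list(c) for c in start]
    labels = None
    for iteration in range(1, max_iter + 1):
        # Assign each point to its nearest centroid; ties go to the lowest index.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if len(set(new_labels)) < k:
            raise EmptyClusterError(
                f"Empty cluster created at iteration {iteration}."
            )
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each centroid to the mean of its current members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def count_failures(runs, seed=0):
    """Run the sketch repeatedly on A and count empty-cluster failures."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(runs):
        try:
            kmeans_sketch(A, 2, rng=rng)
        except EmptyClusterError:
            failures += 1
    return failures


if __name__ == "__main__":
    print(count_failures(1000), "of 1000 runs hit an empty cluster")
```

As far as I can tell, the duplicate-row draw is the only start that fails in this sketch. It therefore fails much less often than the "roughly half" reported above, and it should not be read as an account of what MATLAB's kmeans does internally. It only shows how the same input can succeed or fail depending on the random start.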

imichaelnorris commented 9 years ago

I ran it in Python thousands of times (it looks like you used MATLAB) using the scipy.cluster.vq implementation of kmeans, and I'm not having any issues. I'm using the same matrix that you had.

I then ran the code in MATLAB plenty of times and have not hit any issues there either.

I initialize the A matrix then run kmeans(A, 2), and nothing "bad" happens.

What language are you using and can you give a full script that produces the error?

mrjiaruiwang commented 9 years ago

If you try the following in my MATLAB version, R2011a, you will get the error that I posted above.

A = [1 1 1 1 1 0;
     1 1 1 1 1 1;
     1 1 0 1 1 1;
     1 1 1 1 1 1;
     1 1 1 1 0 1;
     0 1 1 1 1 1]
for i = 1:100
    kmeans(A,2)
end