koraykv / unsup

Some unsupervised learning modules using Torch

kmeans gives wrong counts #40

Open sunshineatnoon opened 8 years ago

sunshineatnoon commented 8 years ago

I have a tensor with dimensions 204x4096, where 204 is the number of samples and 4096 is the feature dimension. After 10000 iterations of clustering with kmeans, it returns counts like this: [1919809, 60009, 60013, 0, 169]. This is definitely wrong, since I only have 204 samples. The screenshot is below:

And here is my code. Could anyone tell me what I did wrong?

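require 'torch'
require 'unsup'

-- 204 x 4096 tensor: one row per sample
grams = torch.load('grams.t7')
grams = grams:double()

-- 5 clusters, 10000 iterations, default batch size, no callback, verbose output
centroids, counts = unsup.kmeans(grams, 5, 10000, nil, nil, true)
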
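
Summing the counts reported above suggests they are not random garbage: the total is exactly 204 × 10000 = 2,040,000, i.e. one full set of 204 assignments for each of the 10000 iterations. That is what you would expect if per-iteration counts were added to a running total instead of being reset. A quick check:

```lua
-- Counts reported in this issue (204 samples, 10000 iterations)
local counts = {1919809, 60009, 60013, 0, 169}

-- Add up the per-cluster counts
local total = 0
for _, c in ipairs(counts) do
  total = total + c
end

-- Both values are 2040000: total == number of samples * number of iterations
print(total, 204 * 10000)
```
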
Conchylicultor commented 7 years ago

Yes, I had some problems with the kmeans implementation too. If you look at the code, it accumulates the count values across iterations. I don't really understand why. https://github.com/koraykv/unsup/blob/master/kmeans.lua#L94
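
For comparison, here is a minimal sketch of Lloyd's k-means in plain Lua (no Torch). It is written for illustration and is not unsup's code. The function name `kmeans`, the table-of-tables data layout and the choice to initialise centroids from the first k points are all assumptions, made so the example is self-contained and deterministic. The relevant detail is that `counts` is reset to zero at the start of every iteration, so after any number of iterations the counts sum to the number of samples.

```lua
-- Illustrative Lloyd's k-means on plain Lua tables (hypothetical helper, not unsup.kmeans).
-- data: array of n points, each an array of d numbers; k: clusters; niter: iterations.
local function kmeans(data, k, niter)
  local n, d = #data, #data[1]
  -- Assumption: initialise centroids from the first k points (deterministic).
  local centroids = {}
  for j = 1, k do
    centroids[j] = {}
    for t = 1, d do centroids[j][t] = data[j][t] end
  end
  local counts
  for _ = 1, niter do
    -- Reset the accumulators every iteration so counts never exceed n.
    counts, sums = {}, {}
    for j = 1, k do
      counts[j] = 0
      sums[j] = {}
      for t = 1, d do sums[j][t] = 0 end
    end
    -- Assignment step: nearest centroid by squared Euclidean distance.
    for i = 1, n do
      local best, bestDist = 1, math.huge
      for j = 1, k do
        local dist = 0
        for t = 1, d do
          local diff = data[i][t] - centroids[j][t]
          dist = dist + diff * diff
        end
        if dist < bestDist then best, bestDist = j, dist end
      end
      counts[best] = counts[best] + 1
      for t = 1, d do sums[best][t] = sums[best][t] + data[i][t] end
    end
    -- Update step: move each non-empty cluster's centroid to the mean of its points.
    for j = 1, k do
      if counts[j] > 0 then
        for t = 1, d do centroids[j][t] = sums[j][t] / counts[j] end
      end
    end
  end
  return centroids, counts
end

-- Usage: 5 points, 2 clusters, 100 iterations; counts still sum to 5.
local data = {{0, 0}, {0, 1}, {10, 10}, {10, 11}, {0, 0.5}}
local centroids, counts = kmeans(data, 2, 100)
print(counts[1], counts[2]) -- expected: 3 and 2
```

With Torch data you could convert a tensor to nested tables with `:totable()` before calling this sketch, although for a 204x4096 input a vectorised Torch version would be much faster.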