jhjin / kmeans-learning-torch

K-means feature learning on CIFAR-10 translated to Torch

Question about normalization step and hard or soft kmeans quantization #2

Closed MSardelich closed 8 years ago

MSardelich commented 8 years ago

Hi Jin,

In Coates's paper, I see the advantage of normalizing the patches (section 3.1.1), i.e. "local brightness and contrast" normalization.

As far as I understand, you perform the normalization at lines 78 and 79 of run.lua. I copy the lines below:

   patches[i] = patches[i]:add(-patches[i]:mean())
   patches[i] = patches[i]:div(math.sqrt(patches[i]:var()+10))

However, I don't understand the +10 after the variance calculation. Isn't it expected to be a small number to avoid division by zero? For example:

patches[i] = patches[i]:div(math.sqrt(patches[i]:var())+0.01)

Another question relates to lines 107 to 116. At this stage, I would like to confirm whether you are using soft k-means assignment.

Lastly, I see you normalize each sample's feature vector. Is this step part of Coates's paper? If so, could you please point me to the page and paragraph?

Cheers, M.

jhjin commented 8 years ago

@MSardelich The addition to the denominator is there to avoid division by zero. The number 10 does seem large, so the pixel values may end up in a range close to zero after the normalization. However, the original code also uses 10 for the regularizer and, in practice, it did not harm the discriminability of the learned filters.
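
For illustration, here is a minimal sketch of the two variants discussed above, side by side (the tensor name and toy sizes are made up, not taken from run.lua):

   require 'torch'

   -- toy data: 4 flattened 6x6x3 patches (sizes are illustrative only)
   local patches = torch.randn(4, 108)

   for i = 1, patches:size(1) do
      -- variant used in run.lua: regularizer added to the variance, inside the sqrt
      patches[i]:add(-patches[i]:mean())
      patches[i]:div(math.sqrt(patches[i]:var() + 10))

      -- alternative raised in the question: a small epsilon added after the sqrt
      -- patches[i]:div(math.sqrt(patches[i]:var()) + 0.01)
   end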

It does not use soft k-means; it only picks the one centroid with the smallest distance. The k-means code is here: https://github.com/jhjin/kmeans-learning-torch/blob/master/kmeans.lua
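
For reference, a minimal sketch of that hard assignment, assuming `centroids` is a k x d tensor of learned centroids and `x` is one preprocessed patch (the names and sizes are illustrative, not the ones used in kmeans.lua):

   require 'torch'

   local k, d = 16, 108
   local centroids = torch.randn(k, d)   -- learned centroids (illustrative values)
   local x = torch.randn(d)              -- one normalized, whitened patch

   -- Euclidean distance from x to every centroid
   local dists = torch.Tensor(k)
   for j = 1, k do
      dists[j] = torch.dist(x, centroids[j])
   end

   -- hard assignment: keep only the single closest centroid
   local _, nearest = torch.min(dists, 1)
   print('assigned to centroid ' .. nearest[1])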

This code is a translation of his MATLAB code, so I think it is part of his work. The normalization method is described in section 3.1.1 (pre-processing) of "An Analysis of Single-Layer Networks in Unsupervised Feature Learning".

MSardelich commented 8 years ago

Thanks @jhjin. I have to think a little bit more about the effect of the +10 (at first sight I found it difficult to grasp).

Since your code is a translation of the original MATLAB code, I understand that the last per-sample normalization step, although not explicitly mentioned in the paper (or at least I could not spot it), is part of the original implementation. In the paper I only see a normalization before the whitening step, not after.
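
For what it's worth, I read that last step as a per-sample normalization of the pooled feature vectors before the classifier, something along the lines of the sketch below (the tensor name, sizes, and regularizer value are my guesses, not copied from run.lua):

   require 'torch'

   -- toy pooled features: 5 samples x 1600 features (sizes are illustrative only)
   local features = torch.randn(5, 1600)

   for i = 1, features:size(1) do
      -- per-sample mean subtraction and scaling, mirroring the patch normalization
      features[i]:add(-features[i]:mean())
      features[i]:div(math.sqrt(features[i]:var() + 0.01))   -- regularizer value is a guess
   end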

I would ask one more favor. Could you please confirm whether you get ~65% accuracy on the test set? It is less than the 77% reported in the original paper, but that figure was obtained with soft k-means.
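
For context, the soft assignment behind that 77% figure is the "triangle" encoding from the paper, f_k(x) = max(0, mu(z) - z_k), where z_k is the distance from the patch to centroid k and mu(z) is the mean of those distances. A minimal sketch (tensor names and sizes are illustrative):

   require 'torch'

   local k, d = 16, 108
   local centroids = torch.randn(k, d)   -- learned centroids (illustrative values)
   local x = torch.randn(d)              -- one normalized, whitened patch

   -- z[j] = distance from x to centroid j
   local z = torch.Tensor(k)
   for j = 1, k do
      z[j] = torch.dist(x, centroids[j])
   end

   -- triangle activation: f[j] = max(0, mean(z) - z[j])
   -- farther-than-average centroids get zero, closer ones stay positive
   local f = torch.cmax(z:clone():mul(-1):add(z:mean()), 0)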

Again, thanks for your prompt response! :-)

Please feel free to close this issue.

jhjin commented 8 years ago

@MSardelich I don't exactly remember the highest accuracy I got from the translated code (sorry, it was more than three years ago). I guess it was not as high as 70%.