guillaume-chevalier / python-caffe-custom-cifar-100-conv-net

Custom convolutional neural network on cifar-100 dataset for image classification. Images and their labels are processed to HDF5 data format for use in Caffe.

How can we do some preprocessing like mean-subtraction in this dataset? #1

Closed Coderx7 closed 8 years ago

Coderx7 commented 8 years ago

I'm not a Python developer, but I guess mean subtraction would improve performance and would be a nice add-on as well. So I tried to implement such a feature myself, but I'm having a bit of difficulty here. Basically, I thought a simple mean-subtraction method would do the trick, something like this (copied from the internet ;) ):

    import numpy as np

    def preprocess(im, mean):
        im = np.float32(im)           # cast to float before subtracting
        im = im[:, :, ::-1]           # change RGB to BGR (assumes HWC layout)
        im -= mean                    # subtract the dataset mean
        #im *= scale                  # optional rescaling
        im = im.transpose((2, 0, 1))  # HWC -> CHW for Caffe
        return im

    # and this can be used in the load method like this:
    data = d['data']
    for i in range(len(data)):
        data[i] = preprocess(data[i], mean)

There are two major issues here (and one minor one). First, I don't know whether this works in Python, and second, whether this is even the correct way of doing it (probably not! There should be a more performant way of doing things like this, like a fully vectorized version). The other issue is the reshaping: the data is laid out in R, G, B order, each channel having 1024 values, and I don't know how to reshape it back, convert to BGR, and then apply the preprocessing. One last thing: the mean I have was generated with the C++ mean calculator that Caffe provides, using the binary version of the dataset. Will that be OK or not? Any help is greatly appreciated. Your work has helped me a lot so far, thanks for that :)

guillaume-chevalier commented 8 years ago

In Caffe

From what I know, there is a layer for doing that. You could try Caffe's Mean-Variance Normalization (MVN) layer: http://caffe.berkeleyvision.org/doxygen/classcaffe_1_1MVNLayer.html

The description of this layer is "Normalizes the input to have 0-mean and/or unit (1) variance", so this seems to be what you are searching for, considering you are subtracting a mean and then scaling the values. I guess this layer runs on the graphics card; however, it will be applied at every training step on the images rather than once beforehand.
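For intuition, what MVN computes per image can be sketched in NumPy like this (my rough reading of the layer's description, not its actual code):

```python
import numpy as np

# Rough NumPy sketch of what the MVN layer computes for one image:
# zero mean, and optionally unit variance. The real layer does this on
# the fly at every forward pass; this is only for intuition.
def mvn(im, normalize_variance=True, eps=1e-9):
    im = np.float32(im)
    im = im - im.mean()
    if normalize_variance:
        im = im / (im.std() + eps)
    return im

# Random data standing in for a 3x32x32 image:
x = np.random.randint(0, 256, size=(3, 32, 32)).astype(np.uint8)
y = mvn(x)
print(float(y.mean()))  # close to 0
```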

You might also be interested in Caffe's Local Response Normalization (LRN) layer, but that is not quite what you are searching for. Caffe's layers are described in the layer catalog: http://caffe.berkeleyvision.org/tutorial/layers.html

Regarding the RGB-to-BGR conversion, I do not think you need a particular channel order in your neural network input, as long as every training image uses the same order. With my preprocessing of the dataset, the dimensions are already correct for Caffe (no Reshape or transpose layer is needed). I never really cared about the inner ordering of the color dimension, since a typical convolutional neural network learns local, relative features and absorbs the ordering of colors from the very first convolutional layer.

In Python

The only downside to doing the normalization, conversions, and reshaping in Caffe is that the data would not be preprocessed once, but at every training run (and probably multiple times per run, considering you should loop through your data multiple times with a dataset of this size).

If you want to fully preprocess the data rather than using Caffe layers, you could edit the load_data function in my preprocessing repo to make it call another function at some point: https://github.com/guillaume-chevalier/caffe-cifar-10-and-cifar-100-datasets-preprocessed-to-HDF5

At a quick glance, your Python code would not work and contains logic errors, considering that my preprocessing converts the data to a 3*32*32 shape (a 3D array). This way, the outer dimension is the color channel, so to swap channels you could do this in Python:

# "im" variable is RGB with shape (3, 32, 32)
im = np.float32(im)  # cast so the mean subtraction works in float
im_BGR = im[::-1]    # reverse the outer (channel) axis: R,G,B -> B,G,R
# Note this is NOT a transposition, as the dimensions
# stay ordered the same in the np array.

# For the mean normalisation, you could do:
mean = np.mean(im_BGR)
im_BGR -= mean
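To answer the vectorization question: instead of doing this image by image, the same thing can be applied to the whole dataset at once. A sketch, assuming the data has shape (N, 3, 32, 32) as produced by my preprocessing (the function name here is just for illustration):

```python
import numpy as np

# Vectorized sketch: swap channels and mean-subtract a whole batch at
# once. Assumes data has shape (N, 3, 32, 32); this is an illustration,
# not code taken from the repository.
def preprocess_all(data):
    data = np.float32(data)
    data = data[:, ::-1]  # reverse the channel axis of every image: RGB -> BGR
    data -= data.mean()   # subtract the global dataset mean, in place
    return data

# Random data standing in for a small batch of images:
batch = np.random.randint(0, 256, size=(8, 3, 32, 32)).astype(np.uint8)
out = preprocess_all(batch)
print(out.shape)  # (8, 3, 32, 32)
```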

And then use some scikit-learn functions to complete the scaling adjustments. I don't remember whether the data was scaled 0-1 or 0-255 in my preprocessing, but you could always compute the standard deviation some other way and then divide im_BGR by it to get a good scale.
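The scaling step can also be sketched in plain NumPy, standardizing every pixel position across the dataset, similar to what sklearn.preprocessing.StandardScaler does (shapes and value ranges here are stand-ins for the real CIFAR arrays):

```python
import numpy as np

# Standardize each pixel position across the whole dataset: zero mean
# and unit variance per feature, similar to StandardScaler.
data = np.random.rand(100, 3072).astype(np.float32) * 255  # stand-in images

mean = data.mean(axis=0)
std = data.std(axis=0) + 1e-9  # guard against division by zero
scaled = (data - mean) / std
print(scaled.shape)  # (100, 3072)
```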

I hope this helps you!

Coderx7 commented 8 years ago

Thank you very much:) I really appreciate your kind help ;)