sdemyanov / ConvNet

Convolutional Neural Networks for Matlab for classification and segmentation, including the Invariant Backpropagation (IBP) and Adversarial Training (AT) algorithms. Trains on GPU; requires cuDNN v5.

Will this support RGB input image as input? #12

Closed cteckwee closed 9 years ago

cteckwee commented 10 years ago

This is excellent work for grayscale images. Will it support RGB input images?

sdemyanov commented 10 years ago

Yes, you just need to specify 'outputmaps' = 3 on the first layer and provide an array of size 3 along the 3rd dimension.
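For concreteness, the expected input layout (height x width x channels x samples) can be sketched in NumPy; the array name and sizes here are hypothetical, chosen only to illustrate the 4-D shape the reply above describes:

```python
import numpy as np

# Hypothetical batch of 100 RGB images of 45x55 pixels, laid out as
# height x width x channels x samples -- the 3rd dimension carries the
# three color channels, matching 'outputmaps' = 3 on the input layer.
train_x = np.zeros((45, 55, 3, 100), dtype=np.float64)

print(train_x.shape)     # (45, 55, 3, 100)
print(train_x.shape[2])  # 3 -- one slice per color channel
```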


cteckwee commented 10 years ago

Thanks for the reply. I tried with 45x55-pixel color images for a binary-class problem. The 'mexfun' option runs, but the output is strange: it is always {0.4857, 0.5143}. The 'matlab' option fails with an execution error. I have already set the input to 45x55x3x100000, where the last dimension is the number of training samples, and set the parameter 'outputmaps' to 3. Am I missing something?

The training error rate 'trainerr' returned after training is about 0.25; however, when I feed the training inputs back into the network with [err, bad, pred] = cnntest(layers, weights, train_x, train_y, funtype), the 'err' is 0.45, which is far larger. Shouldn't the classifier return 0.25?

cteckwee commented 10 years ago

Btw, the error message when using the 'matlab' option is as follows:

```
Error in forward (line 70)
a(mapsize(1)+1:end, :, :, :) = mean(a((newsize(1)-1)*st(1)+1 : mapsize(1), :, :, :), 1);
```

sdemyanov commented 10 years ago

Have you normalized the input? The input values are recommended to be within the range [-1, 1]. How many samples are in your test set? For how long do you train the net? Is the training error plot decreasing? Note that 'trainerr' is different from 'err' in cnntest: the first is the value of the loss function, while the second is the ratio of incorrectly classified objects. They should not be the same.
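As an illustration of the normalization advice (a sketch, not code from this library), mapping im2double output in [0, 1] onto the recommended [-1, 1] range is a single affine rescaling; in NumPy:

```python
import numpy as np

def to_symmetric_range(x):
    """Rescale values from [0, 1] (e.g. im2double output) to [-1, 1]."""
    return 2.0 * x - 1.0

x = np.array([0.0, 0.25, 0.5, 1.0])
print(to_symmetric_range(x).tolist())  # [-1.0, -0.5, 0.0, 1.0]
```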


sdemyanov commented 10 years ago

Could you please send me a sample of your input for debugging?


cteckwee commented 10 years ago

Hi, I have converted my images using the 'im2double' function, so the input range should be [0, 1]. There are about 15000 training images (50% positive and 50% negative). I trained for 3 epochs, but the test error rate is still as bad. For illustration, here is the training error rate plot for the first epoch: [training error plot]

While the training error rate seems ok at 25%, the test error rate is 45%. The pred output for all test samples remains {0.4857, 0.5143}. Is it because I do not set 'norm', norm_x, 'mean', mean_x, 'stdev', std_x in the input layer? I notice that for your MNIST example, omitting mean and stdev gives a worse test result (8.74% vs 5.35% test error rate).

In your code you provided normalization and standard deviation for one-channel images:

```matlab
train_x_norm = train_x;
mean_s = mean(mean(train_x_norm, 1), 2);
train_x_norm = train_x_norm - repmat(mean_s, [kXSize 1]);
datanorm = sqrt(sum(sum(train_x_norm.^2, 1), 2));
norm_x = mean(squeeze(datanorm));
datanorm(datanorm < 1e-8) = 1;
train_x_norm = train_x_norm ./ repmat(datanorm, [kXSize 1]) * norm_x;
kMinVar = 1;
mean_x = mean(train_x_norm, 3);
std_x = sqrt(var(train_x_norm, 0, 3) + kMinVar);
```

Do you have the equivalent for 3-channel images?

Lastly, I will send you a small portion of my data in a separate email for you to debug. Thanks.

sdemyanov commented 10 years ago

If kXSize contains 3 dimensions (i.e. [45 55 3]) and kSampleDim = 4, then it is going to be something like:

```matlab
mean_s = mean(mean(train_x, 1), 2);
train_x_unbiased = train_x - repmat(mean_s, [kXSize(1:2) 1 1]);
norm_x = mean(squeeze(sqrt(sum(sum(train_x_unbiased.^2)))));
layers{1}.norm = norm_x;
mean_x = mean(train_x, kSampleDim);
layers{1}.mean = mean_x;
train_x = train_x - repmat(mean_x, [1 1 1 kTrainNum]);
kMinVar = 1;
std_x = sqrt(var(train_x, 0, kSampleDim) + kMinVar);
layers{1}.stdev = std_x;
```
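For readers cross-checking the dimensions, the same computation can be re-expressed in NumPy (a sketch assuming (H, W, C, N) data with samples on the last axis; variable names mirror the Matlab snippet, and the random data is only a stand-in):

```python
import numpy as np

# Stand-in data: 100 samples of size 45x55x3, samples on the last axis
# (the NumPy analogue of kSampleDim = 4).
rng = np.random.default_rng(0)
train_x = rng.random((45, 55, 3, 100))

# Per-sample mean over the spatial dimensions (Matlab dims 1 and 2).
mean_s = train_x.mean(axis=(0, 1), keepdims=True)
train_x_unbiased = train_x - mean_s

# 'norm': average L2 norm of the mean-subtracted samples.
norm_x = float(np.sqrt((train_x_unbiased ** 2).sum(axis=(0, 1))).mean())

# Per-pixel mean and regularized standard deviation across samples
# (ddof=1 matches Matlab's default var normalization by N-1).
mean_x = train_x.mean(axis=3)
k_min_var = 1.0
std_x = np.sqrt(train_x.var(axis=3, ddof=1) + k_min_var)

print(mean_x.shape, std_x.shape)  # (45, 55, 3) (45, 55, 3)
```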

About the first question: I guess the reason is the small number of epochs. 45x55 is a large size, and it takes a long time for a neural net to converge. When I ran the winning solution on this dataset (http://www.kaggle.com/c/challenges-in-representation-learning-facial-expression-recognition-challenge/data), the error remained constant for at least 20 epochs; only after 40 epochs did it start to decrease. In this case I would advise you to scale your images down or use another library, since the current CPU implementation is not suitable for such long computations. I'm working on the GPU implementation, but I'm probably not going to finish it within the next month.

Regards, Sergey.


cteckwee commented 10 years ago

I think my data is relatively simple. With 45x55-pixel grayscale images, I managed to obtain good test accuracy after 3 epochs. However, the problem occurs with color (i.e. 3-channel) images, when I set Kx = [45 55 3] and obtain norm_x, mean_x, and std_x following the method from your previous message. My network is as follows:

```matlab
layers = {
    struct('type', 'i', 'mapsize', kXSize, 'outputmaps', 3, ...
           'norm', norm_x, 'mean', mean_x, 'stdev', std_x)
    struct('type', 'c', 'kernelsize', [5 5], 'outputmaps', 6) % convolution layer
    struct('type', 's', 'scale', [3 3], 'function', 'mean', 'stride', [2 2]) % subsampling layer
    struct('type', 'c', 'kernelsize', [5 5], 'outputmaps', 12, 'padding', [1 1]) % convolution layer
    struct('type', 's', 'scale', [3 3], 'function', 'max', 'stride', [2 2]) % subsampling layer
    struct('type', 'f', 'length', 64) % fully connected layer
    struct('type', 'f', 'length', kOutputs, 'function', 'soft', ...
           'dropout', dropout) % fully connected layer
};
```

The error is:

```
Error using genweights_mex
Assertion Failed: The length of the norm vector is wrong
```

Have you successfully tested with 3-channel images? I believe the current version doesn't fully support them.

sdemyanov commented 10 years ago

It definitely works for coloured images with just 'mean' and 'stdev'; I have done tests on the CIFAR-10 dataset as part of my research. There might be some problems with 'norm'. I'll take a look tomorrow. Thanks for the feedback.


sdemyanov commented 10 years ago

The new version has a refactored normalization and should work properly. Check out the example.

cteckwee commented 10 years ago

Thank you very much for your prompt response and efforts. I managed to run the code on 3-channel inputs and obtained a very good result! In your cnnexamples.m, you may want to change the line

```matlab
kXSize(kSampleDim) = [];
```

to

```matlab
kXSize = kXSize(1:2);
```

so that it is compatible with both 1- and 3-channel inputs?