vlfeat / matconvnet

MatConvNet: CNNs for MATLAB

Constant misclassification in binary CNN. #391

Open Suryavf opened 8 years ago

Suryavf commented 8 years ago

Hello,

I am new to MatConvNet and I am developing a binary CNN. I have no compilation errors, but the results are meaningless: the misclassification for evaluation remains constant across all epochs, and it is always double. I tested with different data, but the result is the same.

What could be the cause?


function [net, info] = cnn_binary(varargin)
% (function declaration added so that varargin and the final end are valid; the name is assumed)

%% Initialize variables
run(fullfile('C:\library\matconvnet-1.0-beta18\matlab','vl_setupnn.m')) ;
numberSubject = 10;
targetSubject = 2;

opts.numEpochs     = 30      ;
opts.learningRate  = 0.001   ;
opts.weightDecay   = 0.0005  ;
opts.momentum      = 0.9     ;
opts.batchSize     = 15      ;
opts.errorFunction = 'binary';
opts = vl_argparse(opts, varargin);

% Initialize struct imdb
imdb = getImdb(numberSubject, targetSubject);

% Initialize a new network
net = cnn_init(imdb) ;

% Call training function in MatConvNet

[net,info] = cnn_train(net, imdb, @getBatch, opts) ;

end

function [images, label] = getBatch( imdb, batch )
images = imdb.images.data(:,:,:,batch) ;
label  = imdb.images.label(1,batch) ;

end
function net = cnn_init(imdb)
% CNN_INIT  Initialize a small CNN (adapted from the MNIST LeNet example)

rng('default');
rng(0) ;

f=1/100 ;
imdbClass = getClass(imdb); % getClass is defined elsewhere; imdbClass is not used below

net.layers = {} ;
net.layers{end+1} = struct('type'   , 'conv', ...
                           'weights', {{f*randn(5,5,1,6, 'single'),...
                                             zeros(1, 6, 'single')}}, ...
                           'stride' , 1, ...
                           'pad'    , 0) ;
net.layers{end+1} = struct('type'   , 'pool', ...
                           'method' , 'avg', ...
                           'pool'   , [2 2], ...
                           'stride' , 2, ...
                           'pad'    , 0) ;
net.layers{end+1} = struct('type'   , 'conv', ...
                           'weights', {{f*randn(5,5,6,6, 'single'),...
                                               zeros(1,6,'single')}}, ...
                           'stride' , 1, ...
                           'pad'    , 0) ;
net.layers{end+1} = struct('type'   , 'pool', ...
                           'method' , 'avg', ...
                           'pool'   , [2 2], ...
                           'stride' , 2, ...
                           'pad'    , 0) ;
net.layers{end+1} = struct('type'   , 'conv', ...
                           'weights', {{f*randn(13,37,6,2, 'single'),...
                                               zeros(1,2,'single')}}, ...
                           'stride' , 1, ...
                           'pad'    , 0) ;
net.layers{end+1} = struct('type'   , 'softmax') ;
net.layers{end+1} = struct('type'   , 'loss',...
                           'class'  , [1 -1]) ;

% Meta parameters
net.meta.inputSize = [64 160 1] ;

% Fill in default values
net = vl_simplenn_tidy(net) ;

end
lenck commented 8 years ago

Hi, for binary classification, the simplest thing is just to have two classes in the softmax. In your case, however, you are doing something completely different:

net.layers{end+1} = struct('type'   , 'softmax') ;
net.layers{end+1} = struct('type'   , 'loss',...
                           'class'  , [1 -1]) ;

You are effectively stacking a softmax and, on top of that, a softmaxlogloss (which is the default configuration of the loss layer). So just remove the softmax layer.

The class field is then set up by cnn_train and holds the ground-truth labels (see also vl_simplenn).

So, basically, just remove the softmax layer and the class field from the loss layer. Also, the binary error function probably won't work, because it expects labels in [-1, +1], whereas for softmaxlogloss you need [1, 2]. Similarly, you probably need to adjust the getBatch function so that it also returns labels in [1, 2].
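
A minimal sketch of those changes, applied to the code in the first post (it assumes the rest of cnn_init stays as posted and that imdb.images.label currently stores -1/+1 labels):

% In cnn_init: keep only the loss layer at the end; it defaults to softmaxlogloss,
% so the explicit softmax layer and the 'class' field are dropped.
net.layers{end+1} = struct('type', 'loss') ;

% In the training options: the 'binary' error expects labels in {-1,+1}, so with
% softmaxlogloss labels in {1,2} switch to the multiclass error.
opts.errorFunction = 'multiclass' ;

% getBatch remaps the stored -1/+1 labels to 1/2.
function [images, labels] = getBatch(imdb, batch)
images = imdb.images.data(:,:,:,batch) ;
labels = imdb.images.label(1,batch) ;
labels(labels == -1) = 2 ;
end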

Suryavf commented 8 years ago

Thanks for your help. I tried your suggestions, but I have not succeeded; the misclassification remains constant.

Training parameters I used: learning rate 0.001, weight decay 0.0005, momentum 0.9.

The result obtained after 30 epochs: http://s22.postimg.org/x18j9yo2p/image.png

The result obtained after 3 epochs: http://s27.postimg.org/95cj612ab/image.png

lenck commented 8 years ago

Welcome to deep learning, nothing ever works the first time ;)

The performance of the model depends on a large number of things, mainly the amount of 'original' training data (note that the MNIST model, one of the smallest deep models, is trained on 60,000 examples with 10 classes). Unless you have at least a similar amount of data, it is really hard to achieve improvements over traditional machine learning formulations when training from scratch (e.g. because of the smoothness of the learned manifolds, etc.).

What sort of data are you feeding it? In general, the architecture is a bit strange: why project into only a 6-dimensional feature space? What sort of invariances do you expect in your data? Do you really assume that spatial invariance only extends to patches of roughly 5 px? Are you sure a single fully connected layer can be spatially invariant enough over a 13 x 37 grid (see the size breakdown below), when much larger networks trained on about 1e6 images have roughly a 13 x 13 grid with three fully connected layers? These are all really difficult decisions that one has to make in order to create a new, working architecture.
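
For reference, the grid sizes mentioned above follow directly from the cnn_init posted earlier (input 64 x 160 x 1, all convolutions with pad 0 and stride 1):

conv 5x5x1x6   -> 60 x 156 x 6
pool 2x2, /2   -> 30 x  78 x 6
conv 5x5x6x6   -> 26 x  74 x 6
pool 2x2, /2   -> 13 x  37 x 6
conv 13x37x6x2 ->  1 x   1 x 2   (this last convolution acts as the fully connected layer)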

What is in general a much better idea, especially if you are just starting with CNNs, is to take an existing network and fine-tune it. You can also get a much easier baseline by, e.g., training a linear classifier on top of features extracted from an existing network. This gradual approach also helps a lot in getting a feeling for the 'dimensionality' of the problem, which is needed to correctly pick the number of projections per layer, the spatial sizes, etc.
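
As a rough illustration of that baseline (a sketch, not code from this thread): load a pretrained simplenn model, push images through it with vl_simplenn, take the activations of a late layer as features, and train a linear classifier on them. The model file name, the normalization fields, and the use of VLFeat's vl_svmtrain are assumptions here; adapt them to your own model and data.

% Hypothetical baseline: features from a pretrained network + a linear SVM.
net = load('imagenet-vgg-f.mat') ;          % any pretrained simplenn model (file name assumed)
net = vl_simplenn_tidy(net) ;

numImages = numel(trainImages) ;            % trainImages / trainLabels: your own data, labels in {-1,+1}
features  = [] ;
for i = 1:numImages
  im  = single(trainImages{i}) ;
  im  = imresize(im, net.meta.normalization.imageSize(1:2)) ;
  im  = bsxfun(@minus, im, net.meta.normalization.averageImage) ;
  res = vl_simplenn(net, im) ;
  f   = squeeze(res(end-2).x) ;             % late-layer activations as the feature vector
  features(:, i) = f(:) ;
end

% Linear SVM on the extracted features (vl_svmtrain is part of VLFeat)
lambda = 0.01 ;
[w, b] = vl_svmtrain(single(features), double(trainLabels), lambda) ;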

So, in this regard, I can only wish you good luck in the search for the right hyper-parameters! :)

Suryavf commented 8 years ago

I'm developing a biometric system based on EEG. As a first step, I'm replicating the work of Lan Ma: ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7318985

I use the network topology developed by Lan Ma, so I should get the same result. The network is trained with 50 samples per class and evaluated with 5 samples per class, for a total of 500 training samples and 50 evaluation samples.

Do you think that's enough?

dbparedes commented 8 years ago

@Suryavf Did you solve this problem? I have been trying binary classification, but I get errors much higher than 1, as in your case. It seems that there is something else to modify.

Almonfrey commented 8 years ago

@dbparedes maybe this can help you: https://github.com/vlfeat/matconvnet/issues/48