vlfeat / matconvnet

MatConvNet: CNNs for MATLAB

Larger than expected validation errors when training Alexnet without the two normalization layers #489

Open shehrzad opened 8 years ago

shehrzad commented 8 years ago

I've been using beta18 of matconvnet to train a variant of AlexNet from scratch, from which I've removed the two normalization layers. My problem is that the validation error remains much higher than the canonical AlexNet results reported here.

A typical training plot I get:

net-train.pdf

It seems that my validation error rates plateau and the solver never drives them down (although the training error continues to decrease). I experimented with different data augmentation strategies; as evidenced by the dip between epochs 13 and 14, the 'f25' augmentation did appear to have some impact. I also added a dropout regularization layer at the end of the network.

A few notes about the ImageNet data:

Any thoughts on what I can do to improve the validation error rates? I adapted cnn_imagenet.m to perform my training; the pertinent code snippets appear below:

opts.dataDir = fullfile(vl_rootnn, 'data','ILSVRC2012') ;
opts.modelType = 'alexnet' ;
opts.networkType = 'simplenn' ;
opts.batchNormalization = true ;
opts.weightInitMethod = 'gaussian' ;
[opts, varargin] = vl_argparse(opts, varargin) ;

sfx = opts.modelType ;
if opts.batchNormalization, sfx = [sfx '-bnorm'] ; end
sfx = [sfx '-' opts.networkType] ;
opts.expDir = fullfile(vl_rootnn, 'data', ['imagenet12-' sfx]) ;
[opts, varargin] = vl_argparse(opts, varargin) ;

opts.contrastNormalization = true ;
opts.numFetchThreads = 12 ;
opts.lite = false ;
opts.imdbPath = fullfile(opts.expDir, 'imdb.mat');
opts.train = struct() ;
opts = vl_argparse(opts, varargin) ;
if ~isfield(opts.train, 'gpus'), opts.train.gpus = []; end;

Prior to kicking off the training, I modify the network struct like so:

net.layers{end+1} = struct('type','dropout','rate',0.5);
net.meta.augmentation.transformation = 'f25';

And really, not much is different from the code as it stands in git today:

net.meta.trainOpts.numEpochs = 21;
net.meta.trainOpts.batchSize = 100;
net.meta.trainOpts.learningRate = 0.001;
net.meta.trainOpts.continue = true;
net.meta.trainOpts.expDir = 'data/alexnet-nonorm';
opts.train.gpus = [1];
[net, info] = trainFn(net, imdb, getBatchFn(opts, net.meta), ...
                      'expDir', opts.expDir, ...
                      net.meta.trainOpts, ...
                      opts.train) ;

Any ideas?

bazilas commented 8 years ago

Regarding the data preparation, there is a script to pre-process the data ("preprocess-imagenet.sh"). You could also first train the default AlexNet without batch normalization and see if your results are close to the expected ones.
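
A minimal sketch of the suggested baseline run, assuming the training entry point keeps the stock cnn_imagenet signature and parses these name-value options via vl_argparse, as in the snippets earlier in the thread:

% Hypothetical baseline run: default AlexNet with batch normalization disabled,
% so the resulting training/validation curves can be compared against the expected ones.
[net, info] = cnn_imagenet('modelType', 'alexnet', ...
                           'batchNormalization', false, ...
                           'train', struct('gpus', 1)) ;

If this baseline tracks the published curves, the data pipeline is probably fine and the remaining gap is more likely introduced by the later modifications.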