soumith / imagenet-multiGPU.torch

An ImageNet example in Torch.
BSD 2-Clause "Simplified" License

Low accuracy on alexnetowtbn #55

Status: Open. mrastegari opened this issue 8 years ago.

mrastegari commented 8 years ago

I trained the AlexNet model with batch normalization (alexnetowtbn) on 4 GPUs with batchSize 256. After 50 epochs my top-1 accuracy is 45%. I couldn't find any results for AlexNet trained with batch normalization. Is this number OK? It seems much lower than the 57% reported in Caffe.
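For comparing numbers like these across frameworks, top-1 accuracy is the percentage of validation images whose highest-scoring class equals the ground-truth label. A minimal Python sketch of that definition, for illustration only (the repository itself is written in Lua/Torch, and `top1_accuracy` is a hypothetical name, not a function from it):

```python
from typing import Sequence


def top1_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Percentage of samples whose highest-scoring class equals the true label."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("need at least one sample")
    correct = 0
    for row, label in zip(scores, labels):
        # Index of the largest score; max() keeps the first maximum on ties.
        predicted = max(range(len(row)), key=lambda i: row[i])
        if predicted == label:
            correct += 1
    return 100.0 * correct / len(labels)


# First sample is predicted correctly (class 1), second is not (0 vs. 2).
print(top1_accuracy([[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]], [1, 2]))  # prints 50.0
```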

szagoruyko commented 8 years ago

I trained one some time ago; the model is here: https://gist.github.com/szagoruyko/dd032c529048492630fc. It achieves 56.7% top-1.

mrastegari commented 8 years ago

This model is different from the one in this repository (alexnetowtbn does not use nn.Concat, and the number of filters in the convolutional layers differs from your model). Do you think we should expect this gap?

szagoruyko commented 8 years ago

@mrastegari No, that shouldn't be the issue. My bet would be the recent bugs in DPT (nn.DataParallelTable); you might want to update everything and try again. By the way, you can increase the learning rate and halve the number of epochs.
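The suggestion to raise the learning rate and halve the number of epochs can be made concrete by treating the schedule as a table of piecewise-constant segments. The Python sketch below is illustrative only: the repository is Lua/Torch, and `EXAMPLE_REGIME`, `compress_regime`, the segment values, and the default 2x learning-rate factor are all hypothetical, not values recommended in this thread.

```python
import math
from typing import List, Optional, Tuple

# (first_epoch, last_epoch or None for "until training stops", learning_rate, weight_decay)
Segment = Tuple[int, Optional[int], float, float]

# Hypothetical regime; the numbers are illustrative, not taken from the repository.
EXAMPLE_REGIME: List[Segment] = [
    (1, 18, 1e-2, 5e-4),
    (19, 29, 5e-3, 5e-4),
    (30, 43, 1e-3, 0.0),
    (44, None, 5e-4, 0.0),
]


def compress_regime(regime: List[Segment], epoch_factor: float = 0.5,
                    lr_scale: float = 2.0) -> List[Segment]:
    """Shrink each finite segment by epoch_factor and multiply every LR by lr_scale.

    Segments stay contiguous and each keeps at least one epoch. An open-ended
    segment (last_epoch is None) ends the table; anything after it is ignored.
    """
    out: List[Segment] = []
    next_start = 1
    for start, end, lr, wd in regime:
        if end is None:
            out.append((next_start, None, lr * lr_scale, wd))
            break
        new_len = max(1, math.ceil((end - start + 1) * epoch_factor))
        new_end = next_start + new_len - 1
        out.append((next_start, new_end, lr * lr_scale, wd))
        next_start = new_end + 1
    return out


print(compress_regime(EXAMPLE_REGIME))
```

With the example table, the segment boundaries become epochs 1-9, 10-15, 16-22, and 23 onward. Whether a particular learning-rate factor still trains stably is not something this sketch or the thread establishes.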

mrastegari commented 8 years ago

I updated all the libraries (cunn, nn, cudnn, cutorch), but I still cannot get top-1 accuracy above 45%.

cxy7452 commented 8 years ago

Yes, I'm seeing a similar issue: alexnetowtbn is giving me low accuracy. I'm trying to train with -netType alexnet to see whether at least alexnet performs well...

mrastegari commented 8 years ago

I remember that around two months ago I could get a top-1 (val) accuracy of around 52%, so maybe something changed in one of the library updates.

cxy7452 commented 8 years ago

Hmm, I tried it again and now alexnetowtbn converges fine: it got to the 38th epoch, and the top-1 validation accuracy is 53.93%.

mrastegari commented 8 years ago

Have you followed the learning rate regime exactly as it is in the code? I noticed some instability in training. For example, if I stop after one epoch and then use the retrain option, I get better accuracy than if I just let the code continue to the next epoch. Have you reinstalled any of the libraries?
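A learning rate regime like this is typically a pure function of the epoch number, so comparing two runs (or a stopped-and-resumed run against an uninterrupted one) largely comes down to checking which learning rate each epoch actually used. A minimal Python sketch, assuming a hypothetical segment table; `REGIME` and `lr_for_epoch` are illustrative names and values, not the repository's Lua code:

```python
from typing import List, Optional, Tuple

# (first_epoch, last_epoch or None for "until training stops", learning_rate, weight_decay)
Segment = Tuple[int, Optional[int], float, float]

# Hypothetical regime; the numbers are illustrative only.
REGIME: List[Segment] = [
    (1, 18, 1e-2, 5e-4),
    (19, 29, 5e-3, 5e-4),
    (30, 43, 1e-3, 0.0),
    (44, None, 5e-4, 0.0),
]


def lr_for_epoch(regime: List[Segment], epoch: int) -> Tuple[float, float]:
    """Return (learning_rate, weight_decay) for a 1-based epoch number."""
    for start, end, lr, wd in regime:
        if epoch >= start and (end is None or epoch <= end):
            return lr, wd
    raise ValueError(f"epoch {epoch} is not covered by the regime")


# Log the schedule so two runs can be compared epoch by epoch.
for epoch in (1, 18, 19, 30, 44, 60):
    print(epoch, lr_for_epoch(REGIME, epoch))
```

If a resumed run passes a different epoch number to such a lookup than an uninterrupted run would, the two runs train with different learning rates. Momentum and other optimizer state are not modeled in this sketch.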

cxy7452 commented 8 years ago

Hmm, I've updated torch, nn, cutorch, and cudnn, but my copy of imagenet-multiGPU was from a few months ago. I've just cloned the new version and started training alexnetowtbn to see if I can reproduce the results.

cxy7452 commented 8 years ago

By the way, alexnetowtbn trained and converged fine.

mrastegari commented 8 years ago

Thanks for the effort!!!

Viresh-R commented 8 years ago

Hey guys, do you have any updated results? I trained AlexNet (without batch normalization) and got a top-1 accuracy of 54.93% on the val set.