soumith / imagenet-multiGPU.torch

an imagenet example in torch.
BSD 2-Clause "Simplified" License

SqueezeNet implementation #68

Closed windscope closed 8 years ago

windscope commented 8 years ago

Implemented the proposed SqueezeNet architecture and tested it on the ImageNet dataset. However, the network does not converge to the accuracy claimed in http://arxiv.org/abs/1602.07360. One hypothesis is that we are not using the right learning rate and weight decay, since we have a similar issue training VGG, which also uses 3x3 convolutions instead of larger ones. The implementation directly follows DeepScale's open-sourced Caffe model, enhanced with the ResNet-style bypass module described in the paper. Done in collaboration with Daniel Woodworth.
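For reference, the building block of SqueezeNet is the Fire module (a 1x1 "squeeze" convolution followed by parallel 1x1 and 3x3 "expand" branches whose outputs are concatenated along channels). A minimal sketch in plain nn, not the exact code in this PR (the function name and channel counts are illustrative):

```lua
require 'nn'

-- Minimal Fire module sketch: squeeze 1x1, then parallel 1x1/3x3 expand
-- branches, concatenated along the channel dimension.
-- nn.Concat(2) assumes batched NxCxHxW input.
local function Fire(nInput, nSqueeze, nExpand1x1, nExpand3x3)
   local fire = nn.Sequential()
   fire:add(nn.SpatialConvolution(nInput, nSqueeze, 1, 1))  -- squeeze
   fire:add(nn.ReLU(true))
   local expand = nn.Concat(2)
   expand:add(nn.Sequential()
      :add(nn.SpatialConvolution(nSqueeze, nExpand1x1, 1, 1))
      :add(nn.ReLU(true)))
   expand:add(nn.Sequential()
      :add(nn.SpatialConvolution(nSqueeze, nExpand3x3, 3, 3, 1, 1, 1, 1))
      :add(nn.ReLU(true)))
   fire:add(expand)
   return fire
end
```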

soumith commented 8 years ago

cool stuff!!!

szagoruyko commented 8 years ago

@windscope if you have a trained network, can you post it somewhere similar to https://gist.github.com/szagoruyko/0f5b4c5e2d2b18472854 ? would be very helpful

windscope commented 8 years ago

Wow, that was a fast merge! thank you @soumith!

Do you have any suggestions on training this model? I submitted this patch partly to ask why this SqueezeNet does not even converge as claimed. We also implemented a simple training network with the hierarchical batching scheme described in the open-sourced SqueezeNet, but that version suffers from the same problem as this merged model.

As I said, we also have problems training VGG, which shows the same behavior as SqueezeNet: both plateau at the very beginning of training. I noticed that they both use small filters (3x3 for VGG, 1x1 and 3x3 for SqueezeNet). We are not sure whether this is the cause.

Another thing to note is that our implementation is not a direct copy of DeepScale's open-sourced Caffe model. We implemented the simple ResNet-style bypass variant proposed in the paper.
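For clarity, the simple-bypass variant just adds an identity shortcut around a Fire module whose input and output channel counts match. A rough nn sketch (it assumes the hypothetical Fire function above, not the merged code):

```lua
-- Simple bypass around a Fire module: y = Fire(x) + x, built with
-- ConcatTable + CAddTable. Input and output channels must match.
local function FireWithBypass(nInput, nSqueeze, nExpand1x1, nExpand3x3)
   assert(nInput == nExpand1x1 + nExpand3x3,
          'simple bypass needs matching input/output channel counts')
   local block = nn.Sequential()
   local branches = nn.ConcatTable()
   branches:add(Fire(nInput, nSqueeze, nExpand1x1, nExpand3x3))
   branches:add(nn.Identity())
   block:add(branches)
   block:add(nn.CAddTable())  -- element-wise sum of the two branches
   return block
end
```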

windscope commented 8 years ago

Hi @szagoruyko, unfortunately I cannot provide documentation like that, since we have not been able to train the model to the point where it fully converges. In fact, in a 20-epoch training run on 60,000 images with 100 classes (a subset of ImageNet; we don't have enough storage and compute to train on the full dataset), the top-1 accuracy plateaus at 1%. Do you have any suggestions on training this model?
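Note that 1% top-1 on 100 classes is exactly chance level, which suggests the network has collapsed to a constant predictor rather than learning slowly (for example from a too-high LR or dead ReLUs). A quick diagnostic sketch (`model` and `batch` are placeholders, not names from this repo):

```lua
-- If the class scores are nearly identical across classes, the net is
-- effectively a constant predictor, which yields chance-level accuracy.
local output = model:forward(batch):float()
print('per-sample std across classes:', output:std(2):mean())
print('min/max score:', output:min(), output:max())
```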

culurciello commented 8 years ago

I am trying to train SqueezeNet with: th main.lua -data /media/SuperSSD/ILSVRC2012 -backend cudnn -netType squeezenet -nGPU 4 -nDonkeys 12, but I'm not having any luck; it is not learning. What params did you guys use?
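For reference, DeepScale's released Caffe solver (if I recall it correctly) trains SqueezeNet with SGD, momentum 0.9, weight decay around 2e-4, and a base LR of 0.04 decayed polynomially to zero, rather than this repo's default AlexNet-style schedule. A rough, illustrative way to approximate that in the regime-table style used by train.lua (the epoch boundaries below are guesses, not from the paper):

```lua
-- Sketch of an LR/WD regime approximating the released SqueezeNet solver
-- (base LR 0.04 decayed toward zero, weight decay 2e-4). Epoch ranges are
-- illustrative only.
local regimes = {
   -- start, end,   LR,    WD
   {  1,     10,   4e-2,  2e-4 },
   { 11,     20,   3e-2,  2e-4 },
   { 21,     30,   2e-2,  2e-4 },
   { 31,     40,   1e-2,  2e-4 },
   { 41,     55,   5e-3,  2e-4 },
   { 56,    1e8,   1e-3,  2e-4 },
}
```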