Closed · windscope closed this 8 years ago
cool stuff!!!
@windscope if you have a trained network, can you post it somewhere similar to https://gist.github.com/szagoruyko/0f5b4c5e2d2b18472854 ? would be very helpful
Wow, that was a fast merge! thank you @soumith!
Do you have any suggestions on training this model? I submitted this patch partly to ask why this SqueezeNet does not converge as claimed. We also implemented a simple training network with the hierarchical batching implementation described in the open-source SqueezeNet, but that version suffers from the same problem as this merged model.
As I said, we also have problems training VGG, which behaves the same way as SqueezeNet: accuracy plateaus at the very beginning of training. I noticed that both use small filters (3x3 for VGG; 1x1 and 3x3 for SqueezeNet). We are not sure whether this is the cause.
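For context on the small-filter remark: the parameter savings behind SqueezeNet's squeeze/expand design can be illustrated with a quick count. This is just illustrative arithmetic (the channel numbers are the fire2 configuration from the paper: 96 input channels, squeeze to 16, expand with 64 1x1 and 64 3x3 filters), not code from this repo:

```python
def conv_params(k, c_in, c_out, bias=True):
    """Parameter count of a k x k convolution layer."""
    return k * k * c_in * c_out + (c_out if bias else 0)

# fire2 from the SqueezeNet paper: squeeze to 16 channels, then expand
# with 64 1x1 filters and 64 3x3 filters (input: 96 channels)
squeeze = conv_params(1, 96, 16)
expand = conv_params(1, 16, 64) + conv_params(3, 16, 64)
fire2 = squeeze + expand

# a plain 3x3 conv producing the same 128 output channels directly
plain = conv_params(3, 96, 128)

print(fire2, plain)  # 11920 110720
```

So a fire module gives the same 128 output channels for roughly 9x fewer parameters than a plain 3x3 layer, which is the whole point of the architecture; it should not by itself prevent convergence.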
Another thing worth noting is that our implementation is not a direct copy of the DeepScale open-sourced Caffe model. We implemented the simple-bypass, ResNet-enhanced variant proposed in the paper.
Hi @szagoruyko, unfortunately I cannot provide such nice documentation, since we are not able to train the model to full convergence. In fact, in a 20-epoch training run on 60,000 images with 100 classes (a subset of ImageNet; we don't have enough storage and compute to train on the full dataset), the top-1 accuracy plateaus at 1%. Do you have any suggestions on training this model?
Am trying to train SqueezeNet with `th main.lua -data /media/SuperSSD/ILSVRC2012 -backend cudnn -netType squeezenet -nGPU 4 -nDonkeys 12`, but am not having luck; it is not learning. What params did you guys use?
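If it helps: the paper mentions starting at a learning rate of about 0.04 and decreasing it linearly over training (Caffe's "poly" policy with power 1). I have not verified these numbers against the released DeepScale solver, so treat the base LR and the 170k-iteration horizon below as assumptions; the schedule itself looks like this:

```python
def poly_lr(base_lr, iteration, max_iter, power=1.0):
    """Polynomial LR decay (Caffe's "poly" policy); power=1.0 gives
    the linear decay reportedly used for SqueezeNet."""
    return base_lr * (1.0 - iteration / max_iter) ** power

# e.g. base LR 0.04 decayed linearly over 170k iterations (assumed)
print(poly_lr(0.04, 0, 170000))      # 0.04
print(poly_lr(0.04, 85000, 170000))  # 0.02
```

The default step-decay schedule in this training script is quite different from that, which could explain some of the gap, though probably not a flat 1% top-1.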
Implemented SqueezeNet as proposed in http://arxiv.org/abs/1602.07360 and tested it on the ImageNet dataset. However, the network does not converge to the accuracy claimed in the paper. One hypothesis is that we are not using the right learning rate and weight decay, since we see a similar issue when training VGG, which also uses 3x3 convolutions instead of larger ones. The implementation directly follows the DeepScale open-sourced Caffe model, enhanced with the ResNet bypass module described in the paper. Collaborated with Daniel Woodworth.