weiliu89 / caffe

Caffe: a fast open framework for deep learning.
http://caffe.berkeleyvision.org/

Imagenet trained model for SSD #117

Open kristellmarisse opened 8 years ago

kristellmarisse commented 8 years ago

Has anyone successfully trained SSD (VGGNet or ResNet) on the ImageNet dataset? If so, please share the mAP and the number of categories (out of 1000) trained.

weiliu89 commented 8 years ago

You mean for the localization (LOC) task? I am curious to know as well. Besides, has anyone reproduced the ResNet + Faster R-CNN result on the ILSVRC DET task? I am curious to see the "single model" performance of it as well.

abhisheksgumadi commented 8 years ago

Hi, is there a written tutorial anywhere on how to train SSD on a set of annotated images that have been taken from ImageNet? I would like to train on my own training data. Any pointers would be helpful. What files should be changed?
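For reference, the usual workflow in this repo follows the PASCAL VOC example; the sketch below assumes the standard script names and that you have adapted the list/label-map scripts to your own annotations (all paths here are assumptions, not a tested recipe):

```shell
# Sketch: the typical SSD training workflow in this repo, adapted
# from the PASCAL VOC example. Script names follow data/VOC0712/
# and examples/ssd/; you would copy and edit them for your dataset.

# 1) Build image/annotation list files for your dataset.
./data/VOC0712/create_list.sh

# 2) Convert images + annotations into LMDB databases.
./data/VOC0712/create_data.sh

# 3) Copy examples/ssd/ssd_pascal.py, point it at your LMDBs and
#    label map, set num_classes, then launch training.
python examples/ssd/ssd_pascal.py
```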

XiongweiWu commented 8 years ago

@weiliu89 I am training ResNet-101 + Faster R-CNN; after 90K iterations the mAP is 39.1%, which is far from the single-model baseline (53%). BTW, to run SSD on ILSVRC2016, two images needed to be discarded on my side: ILSVRC2012_val_00033658 and ILSVRC2013_val_00004542.
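A minimal sketch of dropping those two images from an ILSVRC image-list file before building the LMDB (the list-file format, one image/annotation pair per line, follows the convention of the repo's create_list.sh scripts; the filenames are assumptions):

```python
# Sketch: filter the two problematic validation images out of an
# ILSVRC image list before LMDB creation. List format and filenames
# are assumptions based on the repo's create_list.sh convention.

BAD_IDS = {"ILSVRC2012_val_00033658", "ILSVRC2013_val_00004542"}

def filter_image_list(lines):
    """Keep only list lines that do not mention a bad image ID."""
    return [line for line in lines
            if not any(bad in line for bad in BAD_IDS)]

if __name__ == "__main__":
    with open("val.txt") as f:
        lines = f.read().splitlines()
    with open("val_filtered.txt", "w") as f:
        f.write("\n".join(filter_image_list(lines)) + "\n")
```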

Update: mAP after 220K iterations of Faster RCNN improves to 44.3%

XiongweiWu commented 8 years ago

One more thing, can you provide the script ssd_ilsvrc_vgg_500.py? For now I get it by modifying ssd_pascal_500.py, where I changed the initial lr to 0.0001 (the original setting caused gradient explosion) and train for 200K iterations in total. However, the mAP is around 15% after 100K iterations, which is far from the model you provide (46.3%). I guess it's a learning-parameter issue.
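The change described above amounts to a small edit to the solver parameters, sketched here in the style of the solver_param dict in examples/ssd/ssd_pascal.py (the exact field names and step values in that script may differ; treat this as an illustration, not a diff):

```python
# Sketch of the solver tweak described above: lower base_lr to avoid
# gradient explosion and extend training to 200K iterations. Field
# names mimic the solver_param dict in examples/ssd/ssd_pascal.py;
# step values here are assumptions.

solver_param = {
    'base_lr': 0.0001,         # lowered from 0.001 to keep gradients stable
    'lr_policy': "multistep",  # decay lr by gamma at each stepvalue
    'gamma': 0.1,
    'stepvalue': [120000, 160000],
    'max_iter': 200000,        # 200K iterations in total
    'momentum': 0.9,
    'weight_decay': 0.0005,
}
```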

weiliu89 commented 8 years ago

@XiongweiWu Thanks for the information! I am curious where you got the single-model accuracy for ResNet-101 + Faster R-CNN? I couldn't find it in the ResNet paper.

If you have a gradient explosion issue at the beginning, you should lower your initial learning rate. But after training is stable (a couple hundred or thousand iterations), you should cancel the training and resume with a larger lr (e.g. 0.001). Besides, 200K iterations is not enough to converge on ILSVRC... SSD500 should have 30ish mAP after 100K iterations on val2. I am busy with ILSVRC2016 now and thus probably won't have time to provide the script.
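The warm-up-then-resume schedule suggested above can be sketched with Caffe's standard train/snapshot CLI (the solver path and snapshot filename below are assumptions for illustration):

```shell
# Sketch: warm up with a small lr, then resume with a larger one.
# Paths and filenames are assumptions; adapt to your model directory.

# 1) Train with a small lr (e.g. base_lr: 0.0001 in solver.prototxt)
#    until the loss is stable, snapshotting along the way.
./build/tools/caffe train \
    --solver=models/VGGNet/ILSVRC2016/SSD_500x500/solver.prototxt

# 2) Stop, raise base_lr to 0.001 in solver.prototxt, and resume
#    from the last solverstate snapshot.
./build/tools/caffe train \
    --solver=models/VGGNet/ILSVRC2016/SSD_500x500/solver.prototxt \
    --snapshot=models/VGGNet/ILSVRC2016/SSD_500x500/VGG_SSD_500x500_iter_1000.solverstate
```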

XiongweiWu commented 8 years ago

@weiliu89 Thx for your information too! For the single model of ResNet-based Faster R-CNN, I remember a slide from ILSVRC2015 introducing the winning method, including the mAP gain of each trick (multi-scale testing, ensembling, bbox iterative refinement, etc.). From that you can work out the mAP of a single model without any tricks; I remember it's around 52-53%.

weiliu89 commented 8 years ago

@XiongweiWu I still couldn't find the 52-53% number in the ILSVRC2015 slides. It might be true that those tricks add 5-6% on PASCAL VOC; I am not sure how much of that transfers to ILSVRC. Regardless, I am just personally curious to see what the single-model accuracy is for Faster R-CNN with VGGNet or ResNet.

XiongweiWu commented 8 years ago

@weiliu89 I will search for the slide. Currently I get 46.7% mAP with the res-101-iter-250k Faster R-CNN model.

gezhiwei commented 7 years ago

[slide image]

gezhiwei commented 7 years ago

This is the slide from the Faster R-CNN results. A single ResNet-101 model produces 58.85% mAP. @weiliu89

weiliu89 commented 7 years ago

@gezhiwei Does the single model contain many tricks (e.g. multi-scale testing, etc.)? It is only meaningful to compare the pure single model accuracy.

gezhiwei commented 7 years ago

@weiliu89 Yes, I think it includes box refinement, context, and multi-scale testing.