sanghoon / pva-faster-rcnn

Demo code for PVANet
https://arxiv.org/abs/1611.08588

Fine-tuning from model trained with MS COCO dataset #21

lzhangzz closed this issue 7 years ago

lzhangzz commented 7 years ago

Hi,

I'm trying to train models starting from the ImageNet pretrained model. First, I trained a model on coco_2014_train using the settings from example_train_384, adjusted for COCO's 81 classes (80 categories plus background). After 500k iterations it reaches 0.415 mAP@0.5 IoU on coco_2014_minival, which seems reasonable given the limited number of iterations.

However, when I fine-tune this model on voc_2007_trainval with cls_score and bbox_pred re-initialized, the result is poor (less than 0.20 mAP). I have tried base_lr values of 0.001 and 0.0001, with 100k and 200k iterations.
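The two detection heads have to be re-initialized because their shapes depend on the number of classes. With class-specific box regression, bbox_pred predicts 4 coordinates per class, background included. Caffe refuses to copy a same-named layer whose shape differs. Here is a minimal sketch of the shape arithmetic. The helper names are hypothetical, and the fc7 width of 4096 is an assumption; read the actual value from your prototxt.

```python
def head_shapes(num_classes, fc7_dim):
    """Weight shapes (num_output, input_dim) of the detection heads.

    Assumes class-specific box regression (4 outputs per class,
    background included), as in py-faster-rcnn-style prototxts.
    """
    return {
        "cls_score": (num_classes, fc7_dim),
        "bbox_pred": (4 * num_classes, fc7_dim),
    }


def layers_to_reinit(src_classes, dst_classes, fc7_dim):
    """Names of head layers whose shapes change between two datasets."""
    src = head_shapes(src_classes, fc7_dim)
    dst = head_shapes(dst_classes, fc7_dim)
    return sorted(name for name in dst if src[name] != dst[name])


COCO_CLASSES = 81  # 80 categories + background
VOC_CLASSES = 21   # 20 categories + background
FC7_DIM = 4096     # assumed fc7 width (hypothetical here)

print(head_shapes(COCO_CLASSES, FC7_DIM))
print(layers_to_reinit(COCO_CLASSES, VOC_CLASSES, FC7_DIM))
```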

Could you please share how the fine-tuning was done after training on coco_2014_train (or on coco_2014_trainval+voc_2007_trainval+voc_2012_trainval, as in the paper)? I understand the model is different; I just want to know how to do the fine-tuning correctly.

@sanghoon @kyehyeon

lzhangzz commented 7 years ago

It seems this is related to this issue. By default, the snapshot function only handles a bounding-box regression layer named 'bbox_pred'. I will try again and post the results.
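Some background on why the layer name matters. In py-faster-rcnn-style training, regression targets are normalized by per-coordinate means and standard deviations. Before saving, the snapshot step folds that normalization back into the regression layer's weights and bias.

One plausible way to hit the bug is to rename the head for fine-tuning, which is a common trick to stop Caffe from copying a mismatched layer. A renamed regression layer (say, a hypothetical 'bbox_pred_voc') would be saved un-scaled, and its outputs at test time stay in normalized units.

Below is a minimal numpy sketch of the folding. It assumes a (4*K, D) weight matrix and py-faster-rcnn's default stds of (0.1, 0.1, 0.2, 0.2). The function name is hypothetical, not this repository's code.

```python
import numpy as np


def unnormalize_bbox_params(weights, bias, means, stds):
    """Fold target normalization into a bbox-regression layer.

    weights: (4*K, D) array, bias: (4*K,) array.
    means, stds: per-coordinate statistics of length 4, tiled to 4*K here.
    Returns parameters whose outputs equal (W @ x + b) * stds + means.
    """
    num_outputs = weights.shape[0]
    means = np.tile(np.asarray(means, dtype=weights.dtype), num_outputs // 4)
    stds = np.tile(np.asarray(stds, dtype=weights.dtype), num_outputs // 4)
    return weights * stds[:, np.newaxis], bias * stds + means


# Check the identity on random parameters (toy sizes).
rng = np.random.default_rng(0)
K, D = 3, 5
W = rng.standard_normal((4 * K, D))
b = rng.standard_normal(4 * K)
x = rng.standard_normal(D)
means = np.zeros(4)
stds = np.array([0.1, 0.1, 0.2, 0.2])

W_u, b_u = unnormalize_bbox_params(W, b, means, stds)
expected = (W @ x + b) * np.tile(stds, K) + np.tile(means, K)
print(np.max(np.abs((W_u @ x + b_u) - expected)))
```

If your regression layer has a different name, the same folding has to be applied to it (or the name added to whatever the snapshot code checks).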

lzhangzz commented 7 years ago

After fixing the 'bbox_pred' problem and modifying the hyper feature layers to output 512 channels (so that the network architecture matches PVANET), I managed to train a model on COCO+VOC starting from the ImageNet pretrained model.

The mAP on the VOC 2007 test set is 78.5%, and after fine-tuning on VOC 07+12 it rises to 79.83%. There is still a noticeable gap to the provided full PVANET model.

I noticed that #1 and #9 discuss the mismatch in the number of fc6 channels between the ImageNet pretrained model and the full PVANET model (384 vs 512). In #9, @sanghoon suggested that fine-tuning with re-initialized weights lowers performance. So I wonder how this problem was addressed when training the full PVANET model. Were the hyper feature layers included in ImageNet pretraining so that the CNN outputs 512 channels?
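The mismatch forces fc6 to be re-initialized because of how fc6's weight shape is formed. An InnerProduct layer flattens its (C, H, W) input, so its weight matrix is num_output x (C * H * W). Changing the feature channels from 384 to 512 therefore changes the weight shape.

Here is a small sketch of that arithmetic. The 6x6 RoI pooling size and the fc6 width of 4096 are assumptions for illustration; check roi_pool_conv5 and fc6 in the actual prototxt.

```python
def fc6_weight_shape(in_channels, pooled_h, pooled_w, num_output):
    """Weight shape of an InnerProduct layer fed by an RoI-pooled map.

    The layer flattens its (C, H, W) input, so the weights are
    num_output x (C * H * W).
    """
    return (num_output, in_channels * pooled_h * pooled_w)


POOLED = 6      # assumed RoI pooling size (hypothetical here)
FC6_OUT = 4096  # assumed fc6 width (hypothetical here)

imagenet_fc6 = fc6_weight_shape(384, POOLED, POOLED, FC6_OUT)
detector_fc6 = fc6_weight_shape(512, POOLED, POOLED, FC6_OUT)
print(imagenet_fc6, detector_fc6, imagenet_fc6 == detector_fc6)
```

Since the shapes differ, Caffe cannot copy the pretrained fc6 into a 512-channel network, and the layer has to start from a fresh initialization or from a donor network that already takes 512 channels.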

Thanks

sanghoon commented 7 years ago

Hi @lzhangzz,

For training, I created a new model by manually merging the convolutional layers from PVANet with the fc6 and fc7 layers from another network that uses 512 feature channels. However, in my experience, the drop caused by a randomly initialized fc6 shouldn't be that big. In our new model, I've managed to train a better network even with a re-initialized fc6.
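One way such a merge could be scripted is sketched below. This is an illustration, not the procedure actually used.

- Each donor is a plain dict mapping a layer name to [weights, bias] numpy arrays. This loosely mirrors pycaffe's net.params, where each blob's values live in .data.
- fc6/fc7 come from the second donor, and every other layer comes from the PVANet donor.
- Any layer whose shapes do not match is re-initialized with Gaussian weights (std 0.01, an assumed value) and a zero bias.

The function name, the toy layer sizes, and the init scheme are all hypothetical.

```python
import numpy as np


def merge_params(conv_source, fc_source, target_shapes,
                 fc_layers=("fc6", "fc7"), std=0.01, seed=0):
    """Assemble parameters for a new model from two donor models.

    conv_source / fc_source: dict of layer name -> [weights, bias] arrays.
    target_shapes: dict of layer name -> [weight_shape, bias_shape].
    Layers in fc_layers come from fc_source, all others from conv_source.
    A layer is copied only if every blob shape matches; otherwise it
    gets Gaussian weights and a zero bias.
    """
    rng = np.random.default_rng(seed)
    merged, report = {}, {}
    for name, shapes in target_shapes.items():
        donor = fc_source if name in fc_layers else conv_source
        blobs = donor.get(name)
        wanted = [tuple(s) for s in shapes]
        if blobs is not None and [b.shape for b in blobs] == wanted:
            merged[name] = [b.copy() for b in blobs]
            report[name] = "copied"
        else:
            weight_shape, bias_shape = wanted
            merged[name] = [
                rng.normal(0.0, std, size=weight_shape).astype(np.float32),
                np.zeros(bias_shape, dtype=np.float32),
            ]
            report[name] = "reinitialized"
    return merged, report


# Toy donors: fc6 input of 16 = 4 channels x 2 x 2 pooled (hypothetical sizes).
conv_donor = {"conv1": [np.ones((4, 3, 3, 3), np.float32), np.zeros(4, np.float32)]}
fc_donor = {
    "fc6": [np.ones((8, 16), np.float32), np.zeros(8, np.float32)],
    "fc7": [np.ones((8, 8), np.float32), np.zeros(8, np.float32)],
}
target = {"conv1": [(4, 3, 3, 3), (4,)], "fc6": [(8, 16), (8,)], "fc7": [(8, 8), (8,)]}
merged, report = merge_params(conv_donor, fc_donor, target)
print(report)
```

With real models, the resulting arrays would then be written into the new network's blobs before saving it as a caffemodel.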

Please note that there is a bug reported in #24, which might affect the resulting mAP as well. I recommend modifying the prototxt to set "RPN_POST_NMS_TOP_N" to 2000 during training.
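RPN_POST_NMS_TOP_N caps how many region proposals survive non-maximum suppression and reach the detection head during training. A small cap means fewer candidate RoIs per image.

The sketch below is modeled loosely on py-faster-rcnn's proposal step and shows where the cap applies:

1. Sort proposals by objectness score.
2. Keep the top pre_nms_top_n.
3. Run greedy NMS.
4. Truncate to post_nms_top_n.

The function names and the 0.7 NMS threshold default are assumptions for illustration. Boxes use pixel coordinates with the "+1" width convention.

```python
import numpy as np


def nms(boxes, scores, thresh):
    """Greedy NMS over (x1, y1, x2, y2) boxes; returns kept indices."""
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        w = np.maximum(0.0, np.minimum(x2[i], x2[rest]) - np.maximum(x1[i], x1[rest]) + 1)
        h = np.maximum(0.0, np.minimum(y2[i], y2[rest]) - np.maximum(y1[i], y1[rest]) + 1)
        inter = w * h
        iou = inter / (areas[i] + areas[rest] - inter)
        order = rest[iou <= thresh]  # drop boxes overlapping the kept one
    return keep


def select_proposals(boxes, scores, pre_nms_top_n, post_nms_top_n, nms_thresh=0.7):
    """Top-N by score, then NMS, then cap at post_nms_top_n."""
    order = scores.argsort()[::-1][:pre_nms_top_n]
    boxes, scores = boxes[order], scores[order]
    keep = nms(boxes, scores, nms_thresh)[:post_nms_top_n]
    return boxes[keep], scores[keep]


boxes = np.array([[0, 0, 9, 9], [0, 0, 9, 8], [20, 20, 29, 29], [40, 40, 49, 49]], dtype=float)
scores = np.array([0.9, 0.8, 0.7, 0.6])
for post_n in (2, 2000):
    kept_boxes, kept_scores = select_proposals(boxes, scores, 6000, post_n)
    print(post_n, len(kept_boxes), kept_scores)
```

The second box overlaps the first with IoU 0.9 and is suppressed. The cap then decides how many of the remaining proposals are passed on.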

lzhangzz commented 7 years ago

Hi @sanghoon,

Thanks for the info. After changing RPN_POST_NMS_TOP_N to 2000, I managed to train a network from the pretrained ImageNet weights that reaches 83% mAP on VOC 2007.

MyVanitar commented 7 years ago

@lzhangzz

I would appreciate it if you could answer my questions, sir. I want to use PVANET+ (compressed) on a custom dataset.