rbgirshick / py-faster-rcnn

Faster R-CNN (Python implementation) -- see https://github.com/ShaoqingRen/faster_rcnn for the official MATLAB version

SqueezeNet implementation for Faster RCNN #345

Open mengzhangjian opened 8 years ago

mengzhangjian commented 8 years ago

@siddharthm83 I am trying to use SqueezeNet with Faster R-CNN. The following is my training prototxt. I modified the roi_pool layer to pooled_w: 13, pooled_h: 13, and the training process runs without errors, but the detection results are very bad. Can anyone help me solve this?

name: "VGG_ILSVRC_16_layers"
layer { name: 'input-data' type: 'Python' top: 'data' top: 'im_info' top: 'gt_boxes' python_param { module: 'roi_data_layer.layer' layer: 'RoIDataLayer' param_str: "'num_classes': 2" } }

layer { name: "conv1" type: "Convolution" bottom: "data" top: "conv1" convolution_param { num_output: 96 kernel_size: 7 stride: 2 weight_filler { type: "xavier" } } } layer { name: "relu_conv1" type: "ReLU" bottom: "conv1" top: "conv1" } layer { name: "pool1" type: "Pooling" bottom: "conv1" top: "pool1" pooling_param { pool: MAX kernel_size: 3 stride: 2 } } layer { name: "fire2/squeeze1x1" type: "Convolution" bottom: "pool1" top: "fire2/squeeze1x1" convolution_param { num_output: 16 kernel_size: 1 weight_filler { type: "xavier" } } } layer { name: "fire2/relu_squeeze1x1" type: "ReLU" bottom: "fire2/squeeze1x1" top: "fire2/squeeze1x1" } layer { name: "fire2/expand1x1" type: "Convolution" bottom: "fire2/squeeze1x1" top: "fire2/expand1x1" convolution_param { num_output: 64 kernel_size: 1 weight_filler { type: "xavier" } } } layer { name: "fire2/relu_expand1x1" type: "ReLU" bottom: "fire2/expand1x1" top: "fire2/expand1x1" } layer { name: "fire2/expand3x3" type: "Convolution" bottom: "fire2/squeeze1x1" top: "fire2/expand3x3" convolution_param { num_output: 64 pad: 1 kernel_size: 3 weight_filler { type: "xavier" } } } layer { name: "fire2/relu_expand3x3" type: "ReLU" bottom: "fire2/expand3x3" top: "fire2/expand3x3" } layer { name: "fire2/concat" type: "Concat" bottom: "fire2/expand1x1" bottom: "fire2/expand3x3" top: "fire2/concat" } layer { name: "fire3/squeeze1x1" type: "Convolution" bottom: "fire2/concat" top: "fire3/squeeze1x1" convolution_param { num_output: 16 kernel_size: 1 weight_filler { type: "xavier" } } } layer { name: "fire3/relu_squeeze1x1" type: "ReLU" bottom: "fire3/squeeze1x1" top: "fire3/squeeze1x1" } layer { name: "fire3/expand1x1" type: "Convolution" bottom: "fire3/squeeze1x1" top: "fire3/expand1x1" convolution_param { num_output: 64 kernel_size: 1 weight_filler { type: "xavier" } } } layer { name: "fire3/relu_expand1x1" type: "ReLU" bottom: "fire3/expand1x1" top: "fire3/expand1x1" } layer { name: "fire3/expand3x3" type: "Convolution" bottom: "fire3/squeeze1x1" top: "fire3/expand3x3" convolution_param { num_output: 64 pad: 1 kernel_size: 3 weight_filler { type: "xavier" } } } layer { name: "fire3/relu_expand3x3" type: "ReLU" bottom: "fire3/expand3x3" top: "fire3/expand3x3" } layer { name: "fire3/concat" type: "Concat" bottom: "fire3/expand1x1" bottom: "fire3/expand3x3" top: "fire3/concat" } layer { name: "fire4/squeeze1x1" type: "Convolution" bottom: "fire3/concat" top: "fire4/squeeze1x1" convolution_param { num_output: 32 kernel_size: 1 weight_filler { type: "xavier" } } } layer { name: "fire4/relu_squeeze1x1" type: "ReLU" bottom: "fire4/squeeze1x1" top: "fire4/squeeze1x1" } layer { name: "fire4/expand1x1" type: "Convolution" bottom: "fire4/squeeze1x1" top: "fire4/expand1x1" convolution_param { num_output: 128 kernel_size: 1 weight_filler { type: "xavier" } } } layer { name: "fire4/relu_expand1x1" type: "ReLU" bottom: "fire4/expand1x1" top: "fire4/expand1x1" } layer { name: "fire4/expand3x3" type: "Convolution" bottom: "fire4/squeeze1x1" top: "fire4/expand3x3" convolution_param { num_output: 128 pad: 1 kernel_size: 3 weight_filler { type: "xavier" } } } layer { name: "fire4/relu_expand3x3" type: "ReLU" bottom: "fire4/expand3x3" top: "fire4/expand3x3" } layer { name: "fire4/concat" type: "Concat" bottom: "fire4/expand1x1" bottom: "fire4/expand3x3" top: "fire4/concat" } layer { name: "pool4" type: "Pooling" bottom: "fire4/concat" top: "pool4" pooling_param { pool: MAX kernel_size: 3 stride: 2 } } layer { name: "fire5/squeeze1x1" type: "Convolution" bottom: "pool4" top: 
"fire5/squeeze1x1" convolution_param { num_output: 32 kernel_size: 1 weight_filler { type: "xavier" } } } layer { name: "fire5/relu_squeeze1x1" type: "ReLU" bottom: "fire5/squeeze1x1" top: "fire5/squeeze1x1" } layer { name: "fire5/expand1x1" type: "Convolution" bottom: "fire5/squeeze1x1" top: "fire5/expand1x1" convolution_param { num_output: 128 kernel_size: 1 weight_filler { type: "xavier" } } } layer { name: "fire5/relu_expand1x1" type: "ReLU" bottom: "fire5/expand1x1" top: "fire5/expand1x1" } layer { name: "fire5/expand3x3" type: "Convolution" bottom: "fire5/squeeze1x1" top: "fire5/expand3x3" convolution_param { num_output: 128 pad: 1 kernel_size: 3 weight_filler { type: "xavier" } } } layer { name: "fire5/relu_expand3x3" type: "ReLU" bottom: "fire5/expand3x3" top: "fire5/expand3x3" } layer { name: "fire5/concat" type: "Concat" bottom: "fire5/expand1x1" bottom: "fire5/expand3x3" top: "fire5/concat" } layer { name: "fire6/squeeze1x1" type: "Convolution" bottom: "fire5/concat" top: "fire6/squeeze1x1" convolution_param { num_output: 48 kernel_size: 1 weight_filler { type: "xavier" } } } layer { name: "fire6/relu_squeeze1x1" type: "ReLU" bottom: "fire6/squeeze1x1" top: "fire6/squeeze1x1" } layer { name: "fire6/expand1x1" type: "Convolution" bottom: "fire6/squeeze1x1" top: "fire6/expand1x1" convolution_param { num_output: 192 kernel_size: 1 weight_filler { type: "xavier" } } } layer { name: "fire6/relu_expand1x1" type: "ReLU" bottom: "fire6/expand1x1" top: "fire6/expand1x1" } layer { name: "fire6/expand3x3" type: "Convolution" bottom: "fire6/squeeze1x1" top: "fire6/expand3x3" convolution_param { num_output: 192 pad: 1 kernel_size: 3 weight_filler { type: "xavier" } } } layer { name: "fire6/relu_expand3x3" type: "ReLU" bottom: "fire6/expand3x3" top: "fire6/expand3x3" } layer { name: "fire6/concat" type: "Concat" bottom: "fire6/expand1x1" bottom: "fire6/expand3x3" top: "fire6/concat" } layer { name: "fire7/squeeze1x1" type: "Convolution" bottom: "fire6/concat" top: "fire7/squeeze1x1" convolution_param { num_output: 48 kernel_size: 1 weight_filler { type: "xavier" } } } layer { name: "fire7/relu_squeeze1x1" type: "ReLU" bottom: "fire7/squeeze1x1" top: "fire7/squeeze1x1" } layer { name: "fire7/expand1x1" type: "Convolution" bottom: "fire7/squeeze1x1" top: "fire7/expand1x1" convolution_param { num_output: 192 kernel_size: 1 weight_filler { type: "xavier" } } } layer { name: "fire7/relu_expand1x1" type: "ReLU" bottom: "fire7/expand1x1" top: "fire7/expand1x1" } layer { name: "fire7/expand3x3" type: "Convolution" bottom: "fire7/squeeze1x1" top: "fire7/expand3x3" convolution_param { num_output: 192 pad: 1 kernel_size: 3 weight_filler { type: "xavier" } } } layer { name: "fire7/relu_expand3x3" type: "ReLU" bottom: "fire7/expand3x3" top: "fire7/expand3x3" } layer { name: "fire7/concat" type: "Concat" bottom: "fire7/expand1x1" bottom: "fire7/expand3x3" top: "fire7/concat" } layer { name: "fire8/squeeze1x1" type: "Convolution" bottom: "fire7/concat" top: "fire8/squeeze1x1" convolution_param { num_output: 64 kernel_size: 1 weight_filler { type: "xavier" } } } layer { name: "fire8/relu_squeeze1x1" type: "ReLU" bottom: "fire8/squeeze1x1" top: "fire8/squeeze1x1" } layer { name: "fire8/expand1x1" type: "Convolution" bottom: "fire8/squeeze1x1" top: "fire8/expand1x1" convolution_param { num_output: 256 kernel_size: 1 weight_filler { type: "xavier" } } } layer { name: "fire8/relu_expand1x1" type: "ReLU" bottom: "fire8/expand1x1" top: "fire8/expand1x1" } layer { name: "fire8/expand3x3" type: "Convolution" bottom: 
"fire8/squeeze1x1" top: "fire8/expand3x3" convolution_param { num_output: 256 pad: 1 kernel_size: 3 weight_filler { type: "xavier" } } } layer { name: "fire8/relu_expand3x3" type: "ReLU" bottom: "fire8/expand3x3" top: "fire8/expand3x3" } layer { name: "fire8/concat" type: "Concat" bottom: "fire8/expand1x1" bottom: "fire8/expand3x3" top: "fire8/concat" } layer { name: "pool8" type: "Pooling" bottom: "fire8/concat" top: "pool8" pooling_param { pool: MAX kernel_size: 3 stride: 2 } } layer { name: "fire9/squeeze1x1" type: "Convolution" bottom: "pool8" top: "fire9/squeeze1x1" convolution_param { num_output: 64 kernel_size: 1 weight_filler { type: "xavier" } } } layer { name: "fire9/relu_squeeze1x1" type: "ReLU" bottom: "fire9/squeeze1x1" top: "fire9/squeeze1x1" } layer { name: "fire9/expand1x1" type: "Convolution" bottom: "fire9/squeeze1x1" top: "fire9/expand1x1" convolution_param { num_output: 256 kernel_size: 1 weight_filler { type: "xavier" } } } layer { name: "fire9/relu_expand1x1" type: "ReLU" bottom: "fire9/expand1x1" top: "fire9/expand1x1" } layer { name: "fire9/expand3x3" type: "Convolution" bottom: "fire9/squeeze1x1" top: "fire9/expand3x3" convolution_param { num_output: 256 pad: 1 kernel_size: 3 weight_filler { type: "xavier" } } } layer { name: "fire9/relu_expand3x3" type: "ReLU" bottom: "fire9/expand3x3" top: "fire9/expand3x3" } layer { name: "fire9/concat" type: "Concat" bottom: "fire9/expand1x1" bottom: "fire9/expand3x3" top: "fire9/concat" } layer { name: "drop9" type: "Dropout" bottom: "fire9/concat" top: "fire9/concat" dropout_param { dropout_ratio: 0.5 } } layer { name: "conv10" type: "Convolution" bottom: "fire9/concat" top: "conv10" convolution_param { num_output: 1000 pad: 1 kernel_size: 1 weight_filler { type: "gaussian" mean: 0.0 std: 0.01 } } } layer { name: "relu_conv10" type: "ReLU" bottom: "conv10" top: "conv10" }

========= RPN ============

layer { name: "rpn_conv/3x3" type: "Convolution" bottom: "conv10" top: "rpn/output" param { lr_mult: 1.0 } param { lr_mult: 2.0 } convolution_param { num_output: 512 kernel_size: 3 pad: 1 stride: 1 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } } } layer { name: "rpn_relu/3x3" type: "ReLU" bottom: "rpn/output" top: "rpn/output" }

layer { name: "rpn_cls_score" type: "Convolution" bottom: "rpn/output" top: "rpn_cls_score" param { lr_mult: 1.0 } param { lr_mult: 2.0 } convolution_param { num_output: 18 # 2(bg/fg) * 9(anchors) kernel_size: 1 pad: 0 stride: 1 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } } }

layer { name: "rpn_bbox_pred" type: "Convolution" bottom: "rpn/output" top: "rpn_bbox_pred" param { lr_mult: 1.0 } param { lr_mult: 2.0 } convolution_param { num_output: 36 # 4 * 9(anchors) kernel_size: 1 pad: 0 stride: 1 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } } }

layer { bottom: "rpn_cls_score" top: "rpn_cls_score_reshape" name: "rpn_cls_score_reshape" type: "Reshape" reshape_param { shape { dim: 0 dim: 2 dim: -1 dim: 0 } } }

layer { name: 'rpn-data' type: 'Python' bottom: 'rpn_cls_score' bottom: 'gt_boxes' bottom: 'im_info' bottom: 'data' top: 'rpn_labels' top: 'rpn_bbox_targets' top: 'rpn_bbox_inside_weights' top: 'rpn_bbox_outside_weights' python_param { module: 'rpn.anchor_target_layer' layer: 'AnchorTargetLayer' param_str: "'feat_stride': 16" } }

layer { name: "rpn_loss_cls" type: "SoftmaxWithLoss" bottom: "rpn_cls_score_reshape" bottom: "rpn_labels" propagate_down: 1 propagate_down: 0 top: "rpn_cls_loss" loss_weight: 1 loss_param { ignore_label: -1 normalize: true } }

layer { name: "rpn_loss_bbox" type: "SmoothL1Loss" bottom: "rpn_bbox_pred" bottom: "rpn_bbox_targets" bottom: 'rpn_bbox_inside_weights' bottom: 'rpn_bbox_outside_weights' top: "rpn_loss_bbox" loss_weight: 1 smooth_l1_loss_param { sigma: 3.0 } }

========= RoI Proposal ============

layer { name: "rpn_cls_prob" type: "Softmax" bottom: "rpn_cls_score_reshape" top: "rpn_cls_prob" }

layer { name: 'rpn_cls_prob_reshape' type: 'Reshape' bottom: 'rpn_cls_prob' top: 'rpn_cls_prob_reshape' reshape_param { shape { dim: 0 dim: 18 dim: -1 dim: 0 } } }

layer { name: 'proposal' type: 'Python' bottom: 'rpn_cls_prob_reshape' bottom: 'rpn_bbox_pred' bottom: 'im_info' top: 'rpn_rois' top: 'rpn_scores' python_param { module: 'rpn.proposal_layer' layer: 'ProposalLayer' param_str: "'feat_stride': 16" } }

layer { name: 'roi-data' type: 'Python' bottom: 'rpn_rois' bottom: 'gt_boxes' top: 'rois' top: 'labels' top: 'bbox_targets' top: 'bbox_inside_weights' top: 'bbox_outside_weights' python_param { module: 'rpn.proposal_target_layer' layer: 'ProposalTargetLayer' param_str: "'num_classes': 2" } }

========= RCNN ============

layer { name: "roi_pool5" type: "ROIPooling" bottom: "conv10" bottom: "rois" top: "pool10" roi_pooling_param { pooled_w: 13 pooled_h: 13 spatial_scale: 0.0625 # 1/16 } }

layer { name: "cls_score" type: "InnerProduct" bottom: "pool10" top: "cls_score" param { lr_mult: 1 } param { lr_mult: 2 } inner_product_param { num_output: 2 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } } } layer { name: "bbox_pred" type: "InnerProduct" bottom: "pool10" top: "bbox_pred" param { lr_mult: 1 } param { lr_mult: 2 } inner_product_param { num_output: 8 weight_filler { type: "gaussian" std: 0.001 } bias_filler { type: "constant" value: 0 } } } layer { name: "loss_cls" type: "SoftmaxWithLoss" bottom: "cls_score" bottom: "labels" propagate_down: 1 propagate_down: 0 top: "loss_cls" loss_weight: 1 } layer { name: "loss_bbox" type: "SmoothL1Loss" bottom: "bbox_pred" bottom: "bbox_targets" bottom: "bbox_inside_weights" bottom: "bbox_outside_weights" top: "loss_bbox" loss_weight: 1 }

huvers commented 7 years ago

Please post an update if you get this working. Thanks.

mengzhangjian commented 7 years ago

@huvers I have tried many variations of the net architecture, but none of them is effective. Did you figure it out?

huvers commented 7 years ago

@mengzhangjian - Not yet, I'm still working on it too.

I'll put up a $100 bounty through PayPal for whoever can get this working :)

karaspd commented 7 years ago

@mengzhangjian I am working on this right now. What result did you achieve, and on which dataset?

mengzhangjian commented 7 years ago

@karaspd I use it for face detection on the WIDER FACE dataset. With the above config, I got nothing.

karaspd commented 7 years ago

@mengzhangjian That is a little strange. I am training and testing on the KITTI dataset with its 3 object difficulty levels. I used SqueezeNet v1.1 and placed the roi-pool layer after fire9 with pooled_w: 7, pooled_h: 7. I see fairly good results on easy objects (81%) but not so good on moderate and hard objects. One issue with this network is that it detects lots of false-positive bounding boxes.
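
For reference, fire9/concat has 256 + 256 = 512 channels (in both SqueezeNet v1.0 and v1.1), so pooling to 7x7 gives a flattened ROI feature of 512 * 7 * 7 = 25088 values, which happens to match the input size of VGG16's fc6. A quick sketch of the arithmetic:

# ROI feature size for the fire9 + 7x7 ROI-pooling variant (sketch).
channels = 256 + 256          # fire9/expand1x1 + fire9/expand3x3
pooled_w, pooled_h = 7, 7

flattened = channels * pooled_w * pooled_h
print(flattened)              # 25088, the same size VGG16's fc6 consumes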

huvers commented 7 years ago

@karaspd Would you mind sharing your train.prototxt for SqueezeNet v1.1?

huvers commented 7 years ago

@karaspd Hmm, that wouldn't run in my case (I modified it for 5 classes). How did you come up with "rpn_cls_score" outputting 120?

karaspd commented 7 years ago

If you want to use 60 anchors, you need to change the scales and aspect ratios in the RPN code. However, you can ignore that and just use 9 anchors as before to test whether you have any problem: change 120 to 18 and 240 to 36.
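
The head sizes follow directly from the anchor count A: rpn_cls_score outputs 2 * A (a bg/fg score per anchor) and rpn_bbox_pred outputs 4 * A (one box-delta set per anchor). A small sketch of the arithmetic:

# RPN head widths as a function of the anchor count (sketch).
def rpn_head_sizes(num_anchors):
    cls_outputs = 2 * num_anchors   # bg/fg per anchor -> rpn_cls_score
    bbox_outputs = 4 * num_anchors  # (dx, dy, dw, dh) per anchor -> rpn_bbox_pred
    return cls_outputs, bbox_outputs

print(rpn_head_sizes(9))    # (18, 36)   default 3 scales x 3 aspect ratios
print(rpn_head_sizes(60))   # (120, 240) the enlarged anchor set mentioned above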

huvers commented 7 years ago

@karaspd Yes, I've tried changing them back to the usual 18 and 36 (as well as the rpn_cls_prob_reshape layer back to 18), but I'm still seeing no convergence in learning, and an error on the validation set:

File "/home/huvers/PycharmProjects/py-faster-rcnn/tools/../lib/datasets/voc_eval.py", line 148, in voc_eval BB = BB[sorted_ind, :] IndexError: too many indices for array


The same data works just fine training with VGG16.

Regardless, thanks for your help.

train.txt test.txt
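
A likely cause of the IndexError above: in lib/datasets/voc_eval.py, BB is built from the per-class detection file, and when the detector produces no detections at all for a class, BB is an empty array and BB[sorted_ind, :] fails. A minimal sketch of a guard around line 148 (variable names follow voc_eval.py; this is an illustration, not a tested patch):

import numpy as np

# Reproduce the failure mode: an empty detection set for one class.
BB = np.array([])            # what voc_eval builds when the file has no lines
confidence = np.array([])
image_ids = []

# Guard (sketch): only sort and index when detections exist.
if BB.shape[0] > 0:
    sorted_ind = np.argsort(-confidence)
    BB = BB[sorted_ind, :]
    image_ids = [image_ids[x] for x in sorted_ind]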

karaspd commented 7 years ago

I have seen this error before. Are you sure you are using the right pre-trained network, SqueezeNet v1.1?

huvers commented 7 years ago

Yes, it is SqueezeNet v1.1.

It turns out it was actually the learning rate that was causing the issues. Originally lr = 0.001, which was apparently too high. Reducing it to lr = 0.0001 led to convergence, and the previous error disappeared. I'm currently running a large number of iterations to see how well it ultimately performs on my custom dataset.
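
For anyone reproducing this, the only change was base_lr in the solver prototxt. A sketch of making that edit programmatically with Caffe's protobuf API (the solver path here is hypothetical; point it at your actual file):

from caffe.proto import caffe_pb2
from google.protobuf import text_format

# Sketch: lower the solver learning rate. The path is a placeholder.
solver = caffe_pb2.SolverParameter()
with open('models/squeezenet_frcnn/solver.prototxt') as f:
    text_format.Merge(f.read(), solver)

solver.base_lr = 0.0001  # was 0.001, which diverged for this SqueezeNet setup

with open('models/squeezenet_frcnn/solver.prototxt', 'w') as f:
    f.write(text_format.MessageToString(solver))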

karaspd commented 7 years ago

That's great, let me know about your results.

mengzhangjian commented 7 years ago

@karaspd Thank you for your contribution. I will test it on my face data and tell you the result.

karaspd commented 7 years ago

Hi @mengzhangjian @huvers, how is the training going? Did you get any interesting results?

huvers commented 7 years ago

Hi @karaspd

My results haven't been very good so far. On my custom data (5 classes), VGG16 achieves mAP = 0.80, while SqueezeNet v1.1 only reaches mAP = 0.55. I have to turn the confidence threshold down very low at test time, and get a large number of false positives as a result.
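
The confidence filtering I mean is the standard py-faster-rcnn post-processing, as in tools/demo.py. A sketch with the thresholds as knobs, assuming scores and boxes come from fast_rcnn.test.im_detect(net, im):

import numpy as np
from fast_rcnn.nms_wrapper import nms

CONF_THRESH = 0.3  # lowered from the usual 0.8 to recover recall from the weaker model
NMS_THRESH = 0.3

def filter_detections(scores, boxes, cls_ind):
    """Per-class NMS + confidence filtering, mirroring tools/demo.py."""
    cls_boxes = boxes[:, 4 * cls_ind:4 * (cls_ind + 1)]
    cls_scores = scores[:, cls_ind]
    dets = np.hstack((cls_boxes, cls_scores[:, np.newaxis])).astype(np.float32)
    keep = nms(dets, NMS_THRESH)
    dets = dets[keep, :]
    return dets[dets[:, -1] >= CONF_THRESH, :]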

I've experimented with pooled_w, pooled_h = 13, which did not work well (mAP = 0.38), and with adjusting the number of RPN convolution features, without success.

Have you had better success?

karaspd commented 7 years ago

Hi @huvers,

I am getting a large number of false positives, same as you, and the average mAP I got on the KITTI dataset is 0.64.

mengzhangjian commented 7 years ago

@karaspd Sorry for the delay; I was on holiday for a few days. I tested it on my face data and the detection results are promising. I will post concrete results later. Thanks.

absorbguo commented 7 years ago

Recently I have been working on integrating SqueezeNet with Faster R-CNN. After checking your model, @mengzhangjian, I noticed that you didn't add an extra fully connected layer after the roi-pooling layer. Wouldn't this decrease the detection performance?
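
For comparison, the VGG16 version of this network keeps fc6/fc7 between roi_pool5 and the cls_score/bbox_pred layers. A sketch of what an extra FC head after ROI pooling could look like, written with Caffe's NetSpec (the fc_head name and 1024 width are made up for illustration, not from a tested model):

import caffe
from caffe import layers as L

# Sketch: a small fully connected head between ROI pooling and the output
# layers, analogous to VGG16's fc6/fc7.
n = caffe.NetSpec()
# Stand-in for the ROI pooling output (e.g. 512 channels pooled to 7x7).
n.pool10 = L.Input(shape=dict(dim=[1, 512, 7, 7]))

n.fc_head = L.InnerProduct(n.pool10, num_output=1024,
                           weight_filler=dict(type='xavier'))
n.relu_head = L.ReLU(n.fc_head, in_place=True)
n.drop_head = L.Dropout(n.relu_head, dropout_ratio=0.5, in_place=True)

n.cls_score = L.InnerProduct(n.drop_head, num_output=2,
                             weight_filler=dict(type='gaussian', std=0.01))
n.bbox_pred = L.InnerProduct(n.drop_head, num_output=8,
                             weight_filler=dict(type='gaussian', std=0.001))

print(n.to_proto())  # emits the prototxt snippet to paste into the model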

durveshpathak commented 6 years ago

Hi @absorbguo @mengzhangjian, were you finally able to integrate SqueezeNet with Faster R-CNN? Is there a place where I can look at the prototxt files?