zfchenUnique opened this issue 7 years ago
It supports batch_size > 1. You can comment out the if statement in roi_pooling_cuda.c and rebuild the extension.
@longcw @JeffCHEN2017 does this project support batch_size larger than 1?
Yes, it does, as long as you comment out the check in roi_pooling_cuda.c. Please read the code for details. I have been using it this way for a while, and so far so good.
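For what it's worth, the RoI pooling computation itself is batch-capable once that check is removed, because each RoI row carries a batch index in its first column. Here is a minimal NumPy sketch of the forward pass (the function name, pooled size, and rounding details are mine, not the repo's):

```python
import numpy as np

def roi_pool(features, rois, pooled_h=2, pooled_w=2, spatial_scale=1.0):
    """RoI max pooling over a batched feature map.

    features: (N, C, H, W); rois: (R, 5) with rows of
    (batch_index, x1, y1, x2, y2) in input-image coordinates.
    The batch_index column is what lets one call serve N > 1.
    """
    n, c, h, w = features.shape
    out = np.zeros((len(rois), c, pooled_h, pooled_w), features.dtype)
    for r, (b, x1, y1, x2, y2) in enumerate(rois):
        b = int(b)
        # Scale the RoI into feature-map coordinates and clip it.
        x1, y1, x2, y2 = [int(round(v * spatial_scale)) for v in (x1, y1, x2, y2)]
        x1, y1 = max(x1, 0), max(y1, 0)
        x2 = min(max(x2, x1 + 1), w)
        y2 = min(max(y2, y1 + 1), h)
        bin_h = (y2 - y1) / pooled_h
        bin_w = (x2 - x1) / pooled_w
        for ph in range(pooled_h):
            for pw in range(pooled_w):
                # Each output cell takes the max over its sub-window.
                ys = y1 + int(np.floor(ph * bin_h))
                ye = y1 + int(np.ceil((ph + 1) * bin_h))
                xs = x1 + int(np.floor(pw * bin_w))
                xe = x1 + int(np.ceil((pw + 1) * bin_w))
                out[r, :, ph, pw] = features[b, :, ys:ye, xs:xe].max(axis=(1, 2))
    return out
```

Note that this only shows the op is well defined for N > 1; the surrounding training code still has single-image assumptions, as the errors below show.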
@JeffCHEN2017, @longcw, can you please elaborate on how you managed to train with a batch size larger than one? As you suggested, I rebuilt roi_pooling_cuda.c with the relevant lines commented out, then set IMS_PER_BATCH: 4 in experiments/cfgs/faster_rcnn_end2end.yml. When I start training, I get the following:
File "train.py", line 115, in <module>
blobs = data_layer.forward()
File "/home/sam/Projects/detection/frcnn.pytorch/faster_rcnn/roi_data_layer/layer.py", line 74, in forward
blobs = self._get_next_minibatch()
File "/home/sam/Projects/detection/frcnn.pytorch/faster_rcnn/roi_data_layer/layer.py", line 70, in _get_next_minibatch
return get_minibatch(minibatch_db, self._num_classes)
File "/home/sam/Projects/detection/frcnn.pytorch/faster_rcnn/roi_data_layer/minibatch.py", line 39, in get_minibatch
assert len(im_scales) == 1, "Single batch only"
AssertionError: Single batch only
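That assert exists because the data layer builds its image blob and im_info for exactly one image per minibatch. Supporting IMS_PER_BATCH > 1 means padding differently sized images into one blob and keeping a per-image im_info row, roughly like this hypothetical helper (a sketch, not code from the repo):

```python
import numpy as np

def images_to_blob(images):
    """Pad a list of (H, W, 3) images into one (N, max_H, max_W, 3) blob.

    Each image keeps its own (height, width, scale) row in im_info so
    downstream layers can tell real pixels from padding. This is roughly
    what a multi-image data layer must do before the assert can be dropped.
    """
    max_h = max(im.shape[0] for im in images)
    max_w = max(im.shape[1] for im in images)
    blob = np.zeros((len(images), max_h, max_w, 3), dtype=np.float32)
    im_info = np.zeros((len(images), 3), dtype=np.float32)
    for i, im in enumerate(images):
        # Copy each image into the top-left corner; the rest stays zero.
        blob[i, :im.shape[0], :im.shape[1], :] = im
        im_info[i] = (im.shape[0], im.shape[1], 1.0)  # scale of 1.0 assumed here
    return blob, im_info
```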
Then I commented out the relevant assert lines and ran into another error:
File "train.py", line 123, in <module>
net(im_data, im_info, gt_boxes, gt_ishard, dontcare_areas)
File "/home/sam/.virtualenvs/cv/lib/python2.7/site-packages/torch/nn/modules/module.py", line 224, in __call__
result = self.forward(*input, **kwargs)
File "/home/sam/Projects/detection/frcnn.pytorch/faster_rcnn/faster_rcnn.py", line 215, in forward
features, rois = self.rpn(im_data, im_info, gt_boxes, gt_ishard, dontcare_areas)
File "/home/sam/.virtualenvs/cv/lib/python2.7/site-packages/torch/nn/modules/module.py", line 224, in __call__
result = self.forward(*input, **kwargs)
File "/home/sam/Projects/detection/frcnn.pytorch/faster_rcnn/faster_rcnn.py", line 71, in forward
cfg_key, self._feat_stride, self.anchor_scales)
File "/home/sam/Projects/detection/frcnn.pytorch/faster_rcnn/faster_rcnn.py", line 122, in proposal_layer
x = proposal_layer_py(rpn_cls_prob_reshape, rpn_bbox_pred, im_info, cfg_key, _feat_stride, anchor_scales)
File "/home/sam/Projects/detection/frcnn.pytorch/faster_rcnn/rpn_msr/proposal_layer.py", line 131, in proposal_layer
proposals = bbox_transform_inv(anchors, bbox_deltas)
File "/home/sam/Projects/detection/frcnn.pytorch/faster_rcnn/fast_rcnn/bbox_transform.py", line 59, in bbox_transform_inv
pred_ctr_x = dx * widths[:, np.newaxis] + ctr_x[:, np.newaxis]
ValueError: operands could not be broadcast together with shapes (74592,1) (18648,1)
It seems like the boxes can't be loaded? Have you experienced anything like this? If so, how did you manage to solve it?
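The two shapes differ by exactly the batch size (4 × 18648 = 74592): the proposal layer still generates anchors for a single image, while the reshaped RPN outputs now cover the whole batch. A hedged sketch of the arithmetic and the usual per-image workaround (variable names are mine, not the repo's):

```python
import numpy as np

# With IMS_PER_BATCH = 4 the proposal layer still builds anchors for one
# image, but the flattened RPN deltas span the whole batch, so the two
# operands in bbox_transform_inv no longer line up:
num_anchors_per_image = 18648       # H * W * A for one feature map
batch_size = 4
assert batch_size * num_anchors_per_image == 74592  # shape from the traceback

# A common workaround (a sketch, not the repo's code): apply the deltas
# per image so each step works on matching shapes, then tag each
# proposal with its batch index afterwards.
deltas = np.zeros((batch_size, num_anchors_per_image, 4))
anchors = np.zeros((num_anchors_per_image, 4))
for b in range(batch_size):
    proposals_b = anchors + deltas[b]   # per-image shapes match again
    assert proposals_b.shape == anchors.shape
```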
FWIW, there is also a subtle bug in the CUDA backward code for RoI pooling that manifests only when the batch size is > 1:
This line https://github.com/longcw/faster_rcnn_pytorch/blob/4fda7a4b89cf71fc3905bd484b1dc82dbc6150d1/faster_rcnn/roi_pooling/src/cuda/roi_pooling_kernel.cu#L170 should end with == (c * height + h) * width + w instead of == index, or else gradients will only be propagated into the first element of the batch. I discovered this issue while using the RoI pooling layer in another project.
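The effect of that comparison is easy to reproduce outside CUDA. Below is a toy NumPy model of the backward routing (not the kernel itself), assuming each RoI's argmax was recorded at offset 0 within its own image's (C, H, W) slab:

```python
import numpy as np

# Argmax indices are stored relative to one image's (C, H, W) slab, but
# the buggy condition compares them against the global (N, C, H, W)
# linear index, which can only match for batch element 0.
N, C, H, W = 2, 1, 2, 2
argmax = np.zeros(N, dtype=int)      # each RoI's max sat at per-image offset 0
grad_buggy = np.zeros((N, C, H, W))
grad_fixed = np.zeros((N, C, H, W))

for n in range(N):
    for c in range(C):
        for h in range(H):
            for w in range(W):
                index = ((n * C + c) * H + h) * W + w   # global linear index
                offset = (c * H + h) * W + w            # per-image offset
                if argmax[n] == index:                  # buggy comparison
                    grad_buggy[n, c, h, w] += 1.0
                if argmax[n] == offset:                 # fixed comparison
                    grad_fixed[n, c, h, w] += 1.0

# The buggy condition routes gradient only into batch element 0.
assert grad_buggy[1].sum() == 0.0
assert grad_fixed[1].sum() == 1.0
```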
From the source code (roi_pooling_cuda.c) and my own quick experiments, it seems that the RoI pooling layer only supports a batch size of one. Does anyone know why?