RCNN_roi_align ERROR when training

xiaomengyc commented 6 years ago

When I train the FPN network on my own dataset, it runs for several steps and then fails with the following error:

    Traceback (most recent call last):
      File "trainval_net.py", line 335, in <module>
        roi_labels = FPN(im_data, im_info, gt_boxes, num_boxes)
      File "/home/xiaolin/xlzhang/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 491, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/xiaolin/xlzhang/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 112, in forward
        return self.module(*inputs[0], **kwargs[0])
      File "/home/xiaolin/xlzhang/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 491, in __call__
        result = self.forward(*input, **kwargs)
      File "/data/xlzhang/fpn.pytorch/lib/model/fpn/fpn.py", line 236, in forward
        roi_pool_feat = self._PyramidRoI_Feat(mrcnn_feature_maps, rois, im_info)
      File "/data/xlzhang/fpn.pytorch/lib/model/fpn/fpn.py", line 134, in _PyramidRoI_Feat
        feat = self.RCNN_roi_align(feat_maps[i], rois[idx_l], scale)
      File "/home/xiaolin/xlzhang/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 491, in __call__
        result = self.forward(*input, **kwargs)
      File "/data/xlzhang/fpn.pytorch/lib/model/roi_align/modules/roi_align.py", line 28, in forward
        scale)(features, rois)
      File "/data/xlzhang/fpn.pytorch/lib/model/roi_align/functions/roi_align.py", line 27, in forward
        rois, output)
      File "/home/xiaolin/xlzhang/anaconda2/lib/python2.7/site-packages/torch/utils/ffi/__init__.py", line 197, in safe_call
        result = torch._C._safe_call(*args, **kwargs)
    torch.FatalError: invalid argument 2: out of range at /opt/conda/conda-bld/pytorch_1524577177097/work/aten/src/THC/generic/THCTensor.c:23

Can anyone help me with this?

Thank you!

winterest commented 6 years ago

Same error.

AshStuff commented 6 years ago

That is because only one RoI is assigned to level l, so idx_l holds a single index; you can print idx_l to check. idx_l should be a 1-D list of RoI indices, but when only one RoI falls into a given level l, you get this error. You can change the line https://github.com/jwyang/fpn.pytorch/blob/master/lib/model/fpn/fpn.py#L131 to:

    idx_l = (roi_level == l).nonzero()
    if idx_l.shape[0] > 1:
        idx_l = idx_l.squeeze()
    else:
        # a single match: flatten instead of squeezing, so idx_l stays 1-D
        idx_l = idx_l.view(-1)
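A minimal sketch of why the fix works, assuming the original line at fpn.py#L131 ends in `.squeeze()` (which the fix above implies). The helper `level_indices` and the sample `roi_level`/`rois` tensors are hypothetical and only illustrate the shapes involved: `nonzero()` returns an (N, 1) tensor, and squeezing a (1, 1) result gives a 0-dim scalar, so `rois[idx_l]` loses its RoI dimension.

```python
import torch


def level_indices(roi_level: torch.Tensor, l: int) -> torch.Tensor:
    """Hypothetical helper: 1-D tensor of RoI indices assigned to pyramid level l."""
    idx_l = (roi_level == l).nonzero()  # shape (N, 1)
    # view(-1) gives shape (N,) for every N, including N == 1 and N == 0,
    # whereas squeeze() would turn a (1, 1) result into a 0-dim tensor.
    return idx_l.view(-1)


# Hypothetical example: four RoIs, only one of them assigned to level 2.
roi_level = torch.tensor([2, 3, 3, 4])
rois = torch.rand(4, 5)  # one (batch_idx, x1, y1, x2, y2) row per RoI

bad = (roi_level == 2).nonzero().squeeze()
print(bad.dim())                                 # 0: a scalar index
print(rois[bad].shape)                           # torch.Size([5]): RoI dimension lost
print(rois[level_indices(roi_level, 2)].shape)   # torch.Size([1, 5])
```

Since `view(-1)` already handles the multi-RoI case, the `if`/`else` in the patch could likely be collapsed into a single `idx_l = (roi_level == l).nonzero().view(-1)`; the two versions should produce the same indices.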

KevinQian97 commented 6 years ago

@AshStuff

Hello, I used your method, but I ran into a new problem:

    Traceback (most recent call last):
      File "trainval_net.py", line 366, in <module>
        roi_labels = FPN(im_data, im_info, gt_boxes, num_boxes)
      File "/home/zhiqi.cheng/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 491, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/zhiqi.cheng/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 114, in forward
        outputs = self.parallel_apply(replicas, inputs, kwargs)
      File "/home/zhiqi.cheng/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 124, in parallel_apply
        return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
      File "/home/zhiqi.cheng/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/parallel_apply.py", line 65, in parallel_apply
        raise output
    RuntimeError: invalid argument 2: Input tensor must have same size as output tensor apart from the specified dimension at /opt/conda/conda-bld/pytorch_1525812548180/work/aten/src/THC/generic/THCTensorScatterGather.cu:29

I think it is similar to the previous problem. Would you mind helping me solve it? Thank you so much!

planckztd commented 6 years ago

I ran into the same problem and have solved it now. Thanks @AshStuff

Karthik-Suresh93 commented 5 years ago

Hi @KevinQian97, did you solve this issue?

fwanglg commented 4 years ago

I have the same problem and would like to know how to solve it. Thanks @xiaomengyc

xiaomengyc commented 4 years ago

Have you tried the solution provided by @AshStuff? It has been a while since I last ran this code, but as I recall, downgrading PyTorch worked. You can try it in an environment with Python 2.7 and PyTorch 0.4.

Good Luck!

jwyang / fpn.pytorch

RCNN_roi_align ERROR when training #21