jwyang / fpn.pytorch

Pytorch implementation of Feature Pyramid Network (FPN) for Object Detection
MIT License
952 stars 221 forks

Training suddenly terminates with run time error, please help #13

Open Karthik-Suresh93 opened 6 years ago

Karthik-Suresh93 commented 6 years ago

[session 1][epoch 1][iter 2100] loss: 1.2515, lr: 1.00e-03
fg/bg=(32/96), time cost: 46.959169
rpn_cls: 0.0647, rpn_box: 0.0156, rcnn_cls: 0.7545, rcnn_box 0.4920
[session 1][epoch 1][iter 2200] loss: 1.3776, lr: 1.00e-03
fg/bg=(32/96), time cost: 46.760157
rpn_cls: 0.2410, rpn_box: 0.1341, rcnn_cls: 0.7460, rcnn_box 0.3669
Traceback (most recent call last):
  File "trainval_net.py", line 330, in <module>
    roi_labels = FPN(im_data, im_info, gt_boxes, num_boxes)
  File "/home/k21993/anaconda3/envs/python27/lib/python2.7/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/k21993/anaconda3/envs/python27/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 60, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/k21993/anaconda3/envs/python27/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 70, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/k21993/anaconda3/envs/python27/lib/python2.7/site-packages/torch/nn/parallel/parallel_apply.py", line 67, in parallel_apply
    raise output
RuntimeError: invalid argument 3: expecting vector of indices at /opt/conda/conda-bld/pytorch_1503966894950/work/torch/lib/THC/generic/THCTensorIndex.cu:4

KevinQian97 commented 6 years ago

I have encountered the same problem. Have you found a way to solve it?

KevinQian97 commented 6 years ago

I am not sure whether it is caused by the PyTorch version.

KevinQian97 commented 6 years ago

I found that the code runs normally with faster-rcnn, but it fails when I use the FPN code. So I guess the problem is in fpn.py, though I still can't find the cause. What's more, I am using this model to train on my own data; if I switch the data back to the original VOC2007, it works. That's strange, because I only converted my own data into the VOC2007 format. Here is one of my annotation files:

train
VIRAT_S_000000.mp4_0
C:/Users/Kevin Qian/Downloads/images/train/VIRAT_S_000000.mp4_0.jpg
Unknown
1920 1080 3
0
Other 0 636 723 655 787
Other 0 411 618 438 703
Person 0 349 709 410 850
Other 0 760 758 778 831
Person 0 1386 245 1432 354
Person 0 276 688 345 845
Other 0 512 687 541 747

And here is an annotation file from the original VOC2007:

VOC2007
009962.jpg
The VOC2007 Database PASCAL VOC2007 flickr 246788553
Tool - Wroclaw Milosz J.
500 375 3
0
chair Right 1 0 211 192 324 326
person Unspecified 1 0 162 72 273 248
person Right 1 0 250 68 473 312
person Right 1 0 4 2 253 374
diningtable Unspecified 1 1 358 216 500 375
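
Comparing the two files, each object in the custom annotation has a single number between the class name and the box, while each VOC2007 object has a pose plus two flags. Whether that difference, or one of the box coordinates, is related to the error has not been established in this thread. A minimal sketch for checking a custom annotation against the VOC2007 layout follows. It assumes the files are XML with the standard VOC tag names (`size`, `object`, `bndbox`, `xmin`, and so on), which were stripped from the pasted text above. `check_annotation` is a hypothetical helper, not part of fpn.pytorch, and the "below 1" check assumes a loader that treats VOC coordinates as 1-based.

```python
import xml.etree.ElementTree as ET

# Per-object fields present in the original VOC2007 annotation shown above.
# Assumption: the dataset loader may read some of them; check your own loader.
EXPECTED_OBJECT_FIELDS = ("name", "pose", "truncated", "difficult", "bndbox")
BOX_TAGS = ("xmin", "ymin", "xmax", "ymax")


def check_annotation(xml_text):
    """Return a list of problems found in one VOC-style annotation string."""
    problems = []
    root = ET.fromstring(xml_text)

    # Image size is needed for the out-of-bounds checks.
    size = root.find("size")
    width = height = None
    if size is None:
        problems.append("missing <size>")
    else:
        width = int(size.findtext("width"))
        height = int(size.findtext("height"))

    for i, obj in enumerate(root.findall("object")):
        for field in EXPECTED_OBJECT_FIELDS:
            if obj.find(field) is None:
                problems.append("object %d: missing <%s>" % (i, field))

        box = obj.find("bndbox")
        if box is None:
            continue
        raw = [box.findtext(tag) for tag in BOX_TAGS]
        if any(value is None for value in raw):
            problems.append("object %d: incomplete <bndbox>" % i)
            continue
        xmin, ymin, xmax, ymax = [int(float(value)) for value in raw]

        # Assumption: coordinates are 1-based, so 0 is suspicious.
        if xmin < 1 or ymin < 1:
            problems.append("object %d: coordinate below 1" % i)
        if xmin >= xmax or ymin >= ymax:
            problems.append("object %d: empty or inverted box" % i)
        if width is not None and (xmax > width or ymax > height):
            problems.append("object %d: box exceeds image size" % i)
    return problems
```

Running this over every file in the custom `Annotations` folder and printing any non-empty result would at least rule the annotations in or out as the source of the difference from VOC2007.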