Open KevinQian97 opened 6 years ago
I found that the code runs normally with faster-rcnn, but it fails when I use the FPN code. So I guess the problem is in fpn.py, but I still can't find the cause. What's more, I am training this model on my own data; if I switch the data back to the original VOC2007, it works. That's strange, because I only converted my own data into the VOC2007 format. Here is one of my annotation files:
and here is an annotation file from the original VOC2007:
@KevinQian97 I have encountered the same problem. Have you found out how to solve it?
@KevinQian97 @WangTianYuan did you solve this issue?
Have you solved the problem? I got the same error. @KevinQian97 @WangTianYuan
I found that if you train the model on your own dataset and it contains dirty data, this causes NaN values in roi_level in fpn.py. You can try the following modification. Change
roi_level[roi_level < 2] = 2
roi_level[roi_level > 5] = 5
to
roi_level[roi_level < 2] = 2
roi_level[roi_level > 5] = 5
roi_level[roi_level != roi_level] = 5
The added line works because NaN is the only value that is not equal to itself, so it sends every NaN entry to level 5.
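To see why the two existing clamps are not enough, here is a minimal sketch using plain Python floats as a stand-in for the tensor entries. The function names are hypothetical and this is not the code in fpn.py; it only illustrates that both `< 2` and `> 5` are false for NaN, so a NaN level passes through unless the self-inequality check is added.

```python
import math

NAN = float("nan")

def clamp_without_nan_check(level, lo=2, hi=5):
    """Mirror of the original two clamps, applied to a single float."""
    if level < lo:   # False when level is NaN
        level = lo
    if level > hi:   # False when level is NaN
        level = hi
    return level

def clamp_with_nan_check(level, lo=2, hi=5):
    """Same clamps plus the suggested extra line: NaN != NaN, so NaN goes to hi."""
    level = clamp_without_nan_check(level, lo, hi)
    if level != level:  # True only when level is NaN
        level = hi
    return level

# Hypothetical levels: too small, in range, too large, and NaN.
levels = [0.0, 3.0, 7.0, NAN]
before = [clamp_without_nan_check(v) for v in levels]
after = [clamp_with_nan_check(v) for v in levels]

print(before)  # the last entry is still NaN
print(after)   # every entry now lies in [2, 5]
```

Note that this only hides the symptom by assigning the bad ROIs to one level. If the NaN values really come from bad annotations, cleaning the data is probably the more robust fix.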
Here are my tracebacks:
[session 1][epoch 1][iter 0] loss: 4.0006, lr: 1.00e-02
fg/bg=(128/384), time cost: 7.218862
rpn_cls: 0.6919, rpn_box: 0.1386, rcnn_cls: 2.8319, rcnn_box 0.3382
Traceback (most recent call last):
File "trainval_net.py", line 330, in <module>
roi_labels = FPN(im_data, im_info, gt_boxes, num_boxes)
File "/home/zhiqi.cheng/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 357, in __call__
result = self.forward(*input, **kwargs)
File "/home/zhiqi.cheng/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 73, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/home/zhiqi.cheng/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 83, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/home/zhiqi.cheng/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/parallel_apply.py", line 67, in parallel_apply
raise output
RuntimeError: invalid argument 2: Input tensor must have same size as output tensor apart from the specified dimension at /opt/conda/conda-bld/pytorch_1518238409320/work/torch/lib/THC/generic/THCTensorScatterGather.cu:29
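Since the comment above attributes the failure to dirty data in a custom dataset, it may help to scan the converted annotations before training. The following is only a sketch under assumptions: the thread does not say what the dirty data looked like, so the three checks (coordinates below 1, boxes past the image size, zero or negative width/height) are guesses at common conversion mistakes. `find_bad_boxes` is a hypothetical helper and `SAMPLE` is an invented annotation, not one of the files from this issue.

```python
import xml.etree.ElementTree as ET

def find_bad_boxes(xml_text):
    """Return (object name, reason) pairs for suspicious boxes in one VOC-style annotation."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    width = int(size.findtext("width"))
    height = int(size.findtext("height"))
    problems = []
    for obj in root.iter("object"):
        name = obj.findtext("name")
        box = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (
            int(float(box.findtext(tag))) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        if xmin < 1 or ymin < 1:
            problems.append((name, "coordinate below 1 (VOC pixel indices are 1-based)"))
        if xmax > width or ymax > height:
            problems.append((name, "box extends past the image size"))
        if xmax <= xmin or ymax <= ymin:
            problems.append((name, "zero or negative width/height"))
    return problems

# Invented annotation for illustration: one valid box and two broken ones.
SAMPLE = """<annotation>
  <size><width>500</width><height>375</height><depth>3</depth></size>
  <object><name>good</name><bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox></object>
  <object><name>zero_origin</name><bndbox><xmin>0</xmin><ymin>10</ymin><xmax>50</xmax><ymax>60</ymax></bndbox></object>
  <object><name>flipped</name><bndbox><xmin>120</xmin><ymin>30</ymin><xmax>100</xmax><ymax>90</ymax></bndbox></object>
</annotation>"""

for name, reason in find_bad_boxes(SAMPLE):
    print(name, "->", reason)
```

To check a whole dataset, you could read each file in the Annotations folder and pass its text to the helper. Any box it reports is worth inspecting by hand before deciding whether to fix or drop it.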