longcw / faster_rcnn_pytorch

Faster RCNN with PyTorch

out of memory #79

Open manyuyuya opened 5 years ago

manyuyuya commented 5 years ago

Hello! When I run train.py, I hit an out-of-memory error after a few epochs. It still happens even if I increase the number of GPUs, and I found that some other people have hit the same problem. I don't know the reason. Could you offer some help? Thank you very much! The relevant output is below:

step 120, image: 005365.jpg, loss: 6.3531, fps: 3.71 (0.27s per batch) TP: 0.00%, TF: 100.00%, fg/bg=(14/285)
rpn_cls: 0.6417, rpn_box: 0.0229, rcnn_cls: 1.9303, rcnn_box: 0.1354
step 130, image: 009091.jpg, loss: 4.8151, fps: 3.78 (0.26s per batch) TP: 0.00%, TF: 100.00%, fg/bg=(22/277)
rpn_cls: 0.6486, rpn_box: 0.2012, rcnn_cls: 1.7988, rcnn_box: 0.1184
step 140, image: 008690.jpg, loss: 4.9961, fps: 3.55 (0.28s per batch) TP: 0.00%, TF: 100.00%, fg/bg=(30/269)
rpn_cls: 0.6114, rpn_box: 0.0690, rcnn_cls: 1.4801, rcnn_box: 0.1088
THCudaCheck FAIL file=/pytorch/aten/src/THC/generic/THCStorage.cu line=58 error=2 : out of memory
Traceback (most recent call last):
  File "train.py", line 138, in <module>
    loss.backward()
  File "/usr/local/lib/python2.7/dist-packages/torch/tensor.py", line 93, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/usr/local/lib/python2.7/dist-packages/torch/autograd/__init__.py", line 89, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/aten/src/THC/generic/THCStorage.cu:58
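One thing worth ruling out (not confirmed to be the cause here): GPU memory that keeps climbing step after step is often the result of accumulating the loss tensor itself for logging, which keeps every iteration's autograd graph alive. A minimal sketch of the safe pattern, assuming PyTorch >= 0.4; the model, optimizer, and data below are placeholders, not the repo's actual train.py code:

```python
import torch
import torch.nn as nn

# Placeholder model/optimizer standing in for the Faster R-CNN network in train.py;
# only the loss-accumulation pattern matters here.
device = 'cuda' if torch.cuda.is_available() else 'cpu'
net = nn.Linear(10, 1).to(device)
optimizer = torch.optim.SGD(net.parameters(), lr=1e-3)

running_loss = 0.0
for step in range(100):
    x = torch.randn(32, 10, device=device)
    target = torch.randn(32, 1, device=device)

    loss = ((net(x) - target) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Accumulate a plain float: `running_loss += loss` would keep every step's
    # autograd graph alive and GPU memory would grow until an OOM.
    running_loss += loss.item()

    if step % 10 == 0:
        print('step %d, avg loss: %.4f' % (step, running_loss / (step + 1)))
```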

jinsnowy commented 5 years ago

Try PyTorch 0.3.1 with cudatoolkit 8.0. I also used version 0.4.1 and had the same error (possibly a GPU memory leak in the code), so I downgraded PyTorch.
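To double-check which combination is actually installed before and after the downgrade, the standard PyTorch version attributes are enough (nothing specific to this repo):

```python
import torch

print('PyTorch:', torch.__version__)
# torch.version.cuda may not exist on very old builds, hence the getattr guard
print('CUDA (build):', getattr(torch.version, 'cuda', 'unknown'))
print('cuDNN:', torch.backends.cudnn.version())
if torch.cuda.is_available():
    print('GPU:', torch.cuda.get_device_name(0))
```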

machanic commented 5 years ago

I think the memory leak is due to the RoI pooling layer: when I copied the RoI pooling code into another project of mine, it also leaked GPU memory there.
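For anyone who wants to reproduce this in isolation, a rough leak check is to run the suspect layer forward and backward in a loop and watch torch.cuda.memory_allocated() (PyTorch >= 0.4). The sketch below uses torchvision.ops.roi_pool as a convenient stand-in; to test this repo's layer, swap in its custom RoIPool module with whatever call signature it actually uses:

```python
import torch
from torchvision.ops import roi_pool  # stand-in for the repo's custom RoI pooling op

def leak_check(pool_fn, iters=300):
    # Run forward + backward repeatedly with fresh inputs; if allocated memory
    # keeps climbing relative to the baseline, the op is likely leaking.
    torch.cuda.empty_cache()
    baseline = torch.cuda.memory_allocated()
    for i in range(iters):
        feat = torch.randn(1, 512, 38, 50, device='cuda', requires_grad=True)
        rois = torch.tensor([[0., 10., 10., 200., 300.],   # (batch_idx, x1, y1, x2, y2) in image coords
                             [0., 50., 60., 400., 500.]], device='cuda')
        out = pool_fn(feat, rois, output_size=(7, 7), spatial_scale=1.0 / 16)
        out.sum().backward()
        if i % 100 == 0:
            grown = (torch.cuda.memory_allocated() - baseline) / 1024 ** 2
            print('iter %d: +%.1f MB over baseline' % (i, grown))

if torch.cuda.is_available():
    leak_check(roi_pool)
```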