ijkguo / mx-rcnn

Parallel Faster R-CNN implementation with MXNet.

out of memory #102

Closed ShaneYS closed 6 years ago

ShaneYS commented 6 years ago

Thanks to your great work, I can now start training mx-rcnn on the OpenImages dataset. There is still a problem, though. When I fine-tune resnet101 with rcnn-batch-size > 1, I get an error: cudaMalloc failed: out of memory. With rcnn-batch-size=1, training runs smoothly at first, but the out-of-memory error still occurs after thousands of batches. I suspect I did not set the batch size correctly. Can you tell me how to solve this problem? Thank you very much. My GPUs are 4 x TITAN Xp.

ijkguo commented 6 years ago

Not sure why, but how about trying resnet50 as the base network?

ShaneYS commented 6 years ago

@ijkguo Thank you. Maybe someone else was using the GPU at the time. I am still using resnet101 and have now trained for 100000 batches without any problem. Another question: can I use rcnn-batch-size > 1 to train? When I try rcnn-batch-size > 1, the out-of-memory error occurs.
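
For anyone who suspects the same cause (another process holding GPU memory), here is a minimal sketch of how to check before launching training. It assumes `nvidia-smi` is on the PATH; the helper names and the 500 MiB threshold are hypothetical choices for illustration, not part of mx-rcnn.

```python
import subprocess

# Standard nvidia-smi query: one CSV line per GPU, values in MiB, no header or units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse lines like '0, 1024, 12196' into (index, used_mib, total_mib) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, used, total = (int(field) for field in line.split(","))
        rows.append((index, used, total))
    return rows


def busy_gpus(rows, threshold_mib=500):
    """Return indices of GPUs already using more than threshold_mib (hypothetical cutoff)."""
    return [index for index, used, _ in rows if used > threshold_mib]


def query_gpu_memory():
    """Run nvidia-smi and return the parsed rows. Requires an NVIDIA driver install."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)
```

Calling `busy_gpus(query_gpu_memory())` before starting a run lists the GPUs that are already occupied, so training can be pointed at the free ones.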

ijkguo commented 6 years ago

It should work. Be aware: batch size 2 -> memory consumption x2.
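
A rough way to apply that rule of thumb is sketched below. It assumes memory grows in direct proportion to the per-GPU batch size, which is likely a pessimistic estimate because model weights do not grow with the batch. The function names and the numbers in the usage note are hypothetical; measure the real batch-size-1 usage on your own setup.

```python
def estimated_memory_mib(memory_at_batch_1_mib, batch_size):
    """Rule-of-thumb estimate: memory scales linearly with batch size."""
    return memory_at_batch_1_mib * batch_size


def fits(memory_at_batch_1_mib, batch_size, capacity_mib):
    """True if the estimated usage stays within the GPU's capacity."""
    return estimated_memory_mib(memory_at_batch_1_mib, batch_size) <= capacity_mib


def largest_batch(memory_at_batch_1_mib, capacity_mib):
    """Largest batch size whose estimate fits; 0 if even batch size 1 does not."""
    if memory_at_batch_1_mib <= 0:
        raise ValueError("memory_at_batch_1_mib must be positive")
    return capacity_mib // memory_at_batch_1_mib
```

For example, if batch size 1 were measured at 7000 MiB on a card with a nominal 12288 MiB (both illustrative figures), `largest_batch(7000, 12288)` returns 1, which would be consistent with batch size 2 running out of memory.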