open-mmlab / mmdetection

OpenMMLab Detection Toolbox and Benchmark
https://mmdetection.readthedocs.io
Apache License 2.0

After training several epochs, CUDA out of memory #4450

Closed Hanswufe closed 3 years ago

Hanswufe commented 3 years ago

I am training Cascade R-CNN on a custom dataset where the maximum number of ground-truth bounding boxes per image is 45. On the first run, CUDA out of memory was reported during the second epoch. On the second run, it was reported during the fifth epoch.

Reproduction

  1. What command or script did you run? ./tools/train.sh

  2. samples_per_gpu = 4, num_workers = 2, 2 GPUs

Environment

  1. PyTorch 1.3 + CUDA 10
  2. TITAN Xp 12G

xvjiarui commented 3 years ago

Hi @Hanswufe, you could try using 2 images per GPU. The out-of-memory error may be caused by the number of bounding boxes increasing during training. You could also decrease the number of proposals in rpn_proposal.
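
As a rough sketch, the two suggestions above might look like the following config overrides. This assumes an MMDetection 2.x-style Python config; the key names (workers_per_gpu, nms_pre, nms_post, max_num) and the placement of train_cfg are assumptions that differ between versions, and the value 1000 is purely illustrative, so check the keys against the Cascade R-CNN config you actually inherit from.

```python
# Hypothetical override fragment for an MMDetection 2.x-style config file.
# All key names below are assumptions; verify them against your base config.

data = dict(
    samples_per_gpu=2,  # halved from the 4 images per GPU in the report
    workers_per_gpu=2,  # assumed to correspond to "num_workers" in the report
)

train_cfg = dict(
    rpn_proposal=dict(
        nms_pre=1000,   # proposals kept per level before NMS (illustrative)
        nms_post=1000,  # proposals kept per level after NMS (illustrative)
        max_num=1000,   # proposals passed on to the R-CNN stages (illustrative)
    ),
)
```

Fewer proposals means fewer RoIs processed by each cascade stage, which should lower peak memory, possibly at some cost in recall.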

ZwwWayne commented 3 years ago

There are some ways that may help: https://mmdetection.readthedocs.io/en/latest/faq.html#training