tianzhi0549 / FCOS

FCOS: Fully Convolutional One-Stage Object Detection (ICCV'19)
https://arxiv.org/abs/1904.01355

out of memory #70

Closed by heiyuxiaokai 5 years ago

heiyuxiaokai commented 5 years ago

File "/home/fw/Softwares/FCOS/maskrcnn_benchmark/structures/boxlist_ops.py", line 84, in boxlist_iou
    wh = (rb - lt + TO_REMOVE).clamp(min=0)  # [N,M,2]
RuntimeError: CUDA out of memory. Tried to allocate 1.56 GiB (GPU 1; 11.92 GiB total capacity; 7.99 GiB already allocated; 1.20 GiB free; 1.74 GiB cached)

This looks like a problem in the IoU calculation. I am using RetinaNet with batch size 4 on 2 Titan X GPUs (12 GB each). GPU usage at the start of training: [Screenshot from 2019-06-23 10-40-26]. Should I set the batch size to 2?

tianzhi0549 commented 5 years ago

@heiyuxiaokai Did FCOS run out of memory?

heiyuxiaokai commented 5 years ago

@tianzhi0549 No. Maybe the IoU calculation for a particular image (one with many boxes) needs a lot of memory. FCOS doesn't have this step. Were the GPUs you trained this model on (4 GPUs, batch 8) 12 GB cards? My data is remote sensing imagery, which may contain many objects per image.
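To make the "many boxes" explanation concrete, here is a rough back-of-the-envelope sketch of how large a single [N,M,2] intermediate like `wh` in the traceback can get. The helper names, the float32 element size, and the anchor count of 200,000 are assumptions for illustration only, not values taken from this issue or from the FCOS code.

```python
GIB = 1024 ** 3


def pairwise_wh_bytes(num_gt, num_anchors, bytes_per_elem=4):
    """Bytes needed for one [N, M, 2] tensor (float32 assumed by default)."""
    return num_gt * num_anchors * 2 * bytes_per_elem


def max_gt_boxes(budget_bytes, num_anchors, bytes_per_elem=4):
    """Largest N whose [N, M, 2] tensor still fits in budget_bytes."""
    return budget_bytes // (num_anchors * 2 * bytes_per_elem)


if __name__ == "__main__":
    assumed_anchors = 200_000  # hypothetical anchor count per image
    for num_gt in (10, 100, 1000):
        gib = pairwise_wh_bytes(num_gt, assumed_anchors) / GIB
        print(f"{num_gt:>5} GT boxes -> {gib:.3f} GiB per [N, M, 2] tensor")
```

Under these assumptions the cost grows linearly with the number of ground-truth boxes, and `boxlist_iou` holds several such intermediates (`lt`, `rb`, `wh`) at once, so an image with on the order of a thousand boxes could plausibly trigger an allocation of the size reported above.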

tianzhi0549 commented 5 years ago

@heiyuxiaokai Our GPUs are 32 GB V100s.

heiyuxiaokai commented 5 years ago

@tianzhi0549 So I should set the batch size to 2. You train with batch 8 on 4 GPUs (V100). Why don't you use a larger batch on 32 GB GPUs?

tianzhi0549 commented 5 years ago

@heiyuxiaokai We use 16 images in a mini-batch for a fair comparison.

heiyuxiaokai commented 5 years ago

The cause is too many GT boxes. It is explained here: https://github.com/facebookresearch/maskrcnn-benchmark/issues/18
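One generic way to bound the size of the [N,M,2] intermediates is to process the GT boxes in chunks. The sketch below uses NumPy as a stand-in for the PyTorch code; it is not the fix described in the linked issue and not the repository's implementation. The function names and the chunk size are hypothetical, and `TO_REMOVE = 1` is assumed from the name in the traceback.

```python
import numpy as np

TO_REMOVE = 1  # assumed value; the traceback only shows the name


def box_area(boxes):
    """Area of [K, 4] boxes given as (x1, y1, x2, y2)."""
    w = boxes[:, 2] - boxes[:, 0] + TO_REMOVE
    h = boxes[:, 3] - boxes[:, 1] + TO_REMOVE
    return w * h


def iou_full(a, b):
    """Pairwise IoU; materialises [N, M, 2] intermediates in one go."""
    lt = np.maximum(a[:, None, :2], b[None, :, :2])  # [N, M, 2]
    rb = np.minimum(a[:, None, 2:], b[None, :, 2:])  # [N, M, 2]
    wh = np.clip(rb - lt + TO_REMOVE, 0, None)       # [N, M, 2]
    inter = wh[..., 0] * wh[..., 1]                  # [N, M]
    return inter / (box_area(a)[:, None] + box_area(b)[None, :] - inter)


def iou_chunked(a, b, chunk=256):
    """Same result as iou_full, but intermediates are at most [chunk, M, 2]."""
    out = np.empty((a.shape[0], b.shape[0]), dtype=np.float32)
    for start in range(0, a.shape[0], chunk):
        out[start:start + chunk] = iou_full(a[start:start + chunk], b)
    return out
```

The [N,M] result is still allocated in full, so this only reduces peak memory from the temporaries; whether that is enough depends on how many boxes and anchors an image has.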

dreamhighchina commented 5 years ago

Did you solve this? I also get an error while computing the loss, and it fails even with a batch size of 2.

heiyuxiaokai commented 5 years ago

@dreamhighchina See here: https://github.com/facebookresearch/maskrcnn-benchmark/issues/884