AlexLuya opened this issue 4 years ago
Same. 2080TI (11GB) with batch_size = 1 still does not work. Here's the traceback:
```
Traceback (most recent call last):
  File "train.py", line 195, in <module>
    train()
  File "train.py", line 140, in train
    classification, regression, anchors = model(images)
  File "/home/ray/anaconda3/envs/dl/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ray/EfficientDetPytorch/models/efficientdet.py", line 62, in forward
    anchors = self.anchors(inputs)
  File "/home/ray/anaconda3/envs/dl/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ray/EfficientDetPytorch/models/module.py", line 153, in forward
    return torch.from_numpy(all_anchors.astype(np.float32)).cuda()
RuntimeError: CUDA error: out of memory
```
You can try NVIDIA apex with opt_level = 'O2'. I got ~8100MB GPU memory usage with batch size 16; you can try a smaller batch size to fit in 6GB GPU RAM.
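A minimal sketch of the apex suggestion above (assuming NVIDIA apex is installed; the model, dataloader, and compute_loss names are placeholders, not this repo's actual code):

```python
# Sketch only: mixed-precision training with NVIDIA apex (installed separately).
from apex import amp
import torch

model = ...       # your EfficientDet model, already moved to the GPU
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

# 'O2' keeps most weights and activations in FP16, roughly halving memory usage.
model, optimizer = amp.initialize(model, optimizer, opt_level='O2')

for images, targets in dataloader:               # placeholder dataloader
    loss = compute_loss(model(images), targets)  # placeholder loss function
    optimizer.zero_grad()
    # Scale the loss so small FP16 gradients don't underflow.
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()
    optimizer.step()
```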
Same problem. Two 2080TIs (11GB each) with batch_size = 6. Here's the traceback:
```
Traceback (most recent call last):
  File "C:/Users/Admin/Desktop/EfficientDet.Pytorch-master/train.py", line 196, in <module>
```
@AlexLuya @RayOnFire @shengyuqing My setup: OS: Ubuntu 18.04, GPU: 2x 2080TI (11GB). When training, I set batch_size 32 for EfficientDet-D0 (~20000MB CUDA memory) and batch_size 16 for EfficientDet-D0 (~20000MB CUDA memory). As of commit #36, for multi-GPU use I have changed .cuda() in the loss function and Anchors to .to(input.device). I think it will fix this issue.
> @AlexLuya @RayOnFire @shengyuqing My setup: OS: Ubuntu 18.04, GPU: 2x 2080TI (11GB). When training, I set batch_size 32 for EfficientDet-D0 (~20000MB CUDA memory) and batch_size 16 for EfficientDet-D0 (~20000MB CUDA memory). As of commit #36, for multi-GPU use I have changed .cuda() in the loss function and Anchors to .to(input.device). I think it will fix this issue.
Thanks! I have updated the code, but still the same problem. Very strange.
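The `.cuda()` → `.to(input.device)` change discussed above can be sketched like this (the class body and anchor values are illustrative stand-ins, not the repo's actual code):

```python
import numpy as np
import torch


class Anchors(torch.nn.Module):
    """Illustrative stand-in for the repo's Anchors module."""

    def forward(self, inputs):
        # Placeholder anchor grid; the real module computes these per feature level.
        all_anchors = np.zeros((1, 9, 4))
        # Before: torch.from_numpy(...).cuda() always targets GPU 0, which can
        # raise "CUDA error: out of memory" and breaks multi-GPU replicas.
        # After: follow whatever device the input batch lives on.
        return torch.from_numpy(all_anchors.astype(np.float32)).to(inputs.device)


images = torch.zeros(2, 3, 8, 8)        # CPU tensor standing in for a batch
anchors = Anchors()(images)
assert anchors.device == images.device  # anchors follow the input's device
```

With this change the anchors are allocated on whichever device holds the input, so each DataParallel replica gets its own copy instead of all replicas piling onto GPU 0.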
@toandaominh1997 I used Windows 10
But I want to use D0-D7 on just one 2080Ti, with batch_size >= 4 for any backbone and input shape >= (448, 448) or (640, 640). It seems that the backbone limits the input shape and needs more CUDA memory; not lighter and more efficient like the paper said.
> @AlexLuya @RayOnFire @shengyuqing My setup: OS: Ubuntu 18.04, GPU: 2x 2080TI (11GB). When training, I set batch_size 32 for EfficientDet-D0 (~20000MB CUDA memory) and batch_size 16 for EfficientDet-D0 (~20000MB CUDA memory). As of commit #36, for multi-GPU use I have changed .cuda() in the loss function and Anchors to .to(input.device). I think it will fix this issue.
I don't understand. Could you explain it more explicitly?
> @toandaominh1997 I used Windows 10
Have you solved the problem?
Have you solved the out-of-memory error?
I got the same problem on my Titan RTX.
Your default batch size is 32. What GPU did you use for training?