AlexLuya opened this issue 4 years ago
Same. 2080TI (11GB) with batch_size = 1 still does not work. Here's the traceback:
```
Traceback (most recent call last):
  File "train.py", line 195, in <module>
    train()
  File "train.py", line 140, in train
    classification, regression, anchors = model(images)
  File "/home/ray/anaconda3/envs/dl/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ray/EfficientDetPytorch/models/efficientdet.py", line 62, in forward
    anchors = self.anchors(inputs)
  File "/home/ray/anaconda3/envs/dl/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ray/EfficientDetPytorch/models/module.py", line 153, in forward
    return torch.from_numpy(all_anchors.astype(np.float32)).cuda()
RuntimeError: CUDA error: out of memory
```
You can try NVIDIA apex with opt_level = 'O2'. I got ~8100MB GPU memory usage with batch size 16; you can try a smaller batch size to fit in 6GB GPU RAM.
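A minimal sketch of the apex suggestion above (assuming NVIDIA apex is installed; the model, dataloader, and compute_loss names are placeholders, not this repo's actual code):

```python
# Sketch only: mixed-precision training with NVIDIA apex (installed separately).
from apex import amp
import torch

model = ...       # your EfficientDet model, already moved to the GPU
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

# 'O2' keeps most weights and activations in FP16, roughly halving memory usage.
model, optimizer = amp.initialize(model, optimizer, opt_level='O2')

for images, targets in dataloader:               # placeholder dataloader
    loss = compute_loss(model(images), targets)  # placeholder loss function
    optimizer.zero_grad()
    # Scale the loss so small FP16 gradients don't underflow.
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()
    optimizer.step()
```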
Same problem. Two 2080TIs (11GB each) with batch_size = 6. Here's the traceback:
```
Traceback (most recent call last):
  File "C:/Users/Admin/Desktop/EfficientDet.Pytorch-master/train.py", line 196, in <module>
```
@AlexLuya @RayOnFire @shengyuqing My setup: OS: Ubuntu 18.04, GPU: 2x 2080TI (11GB). When training, I set batch_size 32 for EfficientDet-D0 (~20000MB CUDA memory) and batch_size 16 for EfficientDet-D0 (~20000MB CUDA memory). As of commit #36, for multi-GPU use I have changed .cuda() in the loss function and Anchors to .to(input.device). I think it will fix this issue.
> @AlexLuya @RayOnFire @shengyuqing My setup: OS: Ubuntu 18.04, GPU: 2x 2080TI (11GB). When training, I set batch_size 32 for EfficientDet-D0 (~20000MB CUDA memory) and batch_size 16 for EfficientDet-D0 (~20000MB CUDA memory). As of commit #36, for multi-GPU use I have changed .cuda() in the loss function and Anchors to .to(input.device). I think it will fix this issue.
Thanks! I have updated the code, but still the same problem. Very strange.
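The `.cuda()` → `.to(input.device)` change discussed above can be sketched like this (the class body and anchor values are illustrative stand-ins, not the repo's actual code):

```python
import numpy as np
import torch


class Anchors(torch.nn.Module):
    """Illustrative stand-in for the repo's Anchors module."""

    def forward(self, inputs):
        # Placeholder anchor grid; the real module computes these per feature level.
        all_anchors = np.zeros((1, 9, 4))
        # Before: torch.from_numpy(...).cuda() always targets GPU 0, which can
        # raise "CUDA error: out of memory" and breaks multi-GPU replicas.
        # After: follow whatever device the input batch lives on.
        return torch.from_numpy(all_anchors.astype(np.float32)).to(inputs.device)


images = torch.zeros(2, 3, 8, 8)        # CPU tensor standing in for a batch
anchors = Anchors()(images)
assert anchors.device == images.device  # anchors follow the input's device
```

With this change the anchors are allocated on whichever device holds the input, so each DataParallel replica gets its own copy instead of all replicas piling onto GPU 0.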
@toandaominh1997 I used Windows 10
But I want to use D0-D7 on just one 2080Ti, with batch_size >= 4 for any backbone and input shape >= (448, 448) or (640, 640). It seems that the backbone limits the input shape and needs more CUDA memory; not lighter and more efficient like the paper said.
> @AlexLuya @RayOnFire @shengyuqing My setup: OS: Ubuntu 18.04, GPU: 2x 2080TI (11GB). When training, I set batch_size 32 for EfficientDet-D0 (~20000MB CUDA memory) and batch_size 16 for EfficientDet-D0 (~20000MB CUDA memory). As of commit #36, for multi-GPU use I have changed .cuda() in the loss function and Anchors to .to(input.device). I think it will fix this issue.
I don't understand. Could you explain it more explicitly?
> @toandaominh1997 I used Windows 10
Have you solved the problem?
Have you solved the out-of-memory error?
I got the same problem on my Titan RTX.
Your default batch size is 32. What GPU did you use for training?