eticin opened this issue 4 years ago
What's your training command?
My training command for 2 GPUs and the D7 model is:

```
python train.py -c 7 -p y3_run6 -n 16 --batch_size 2 --lr 1e-6 \
    --load_weights /mnt/trains/users/botan/Yet-Another-EfficientDet-Pytorch/weights/pre/efficientdet-d7.pth \
    --num_epochs 200
```

The training command for D7x is similar.
Same here. I have the problem with D8.

```
python train.py -c 8 -p wtm -n 12 --batch_size 32 --lr 1e-5 --num_epochs 200 \
    --load_weights /data/wtm/efficient-det/weights/efficientdet-d8.pth \
    --data_path /data/wtm/efficient-det --optim adamw
```
Tesla V100-SXM2 × 4, NVIDIA driver 410.104, CUDA 10.2
EDIT: retried with D7, D6 and D5 - same issue.
What about -n 0 --batch_size {num_gpu}?
@yldrmBtn @Art200696 have you found a solution? For me the explanation does not come directly from the size of the network but from the size of the input image. While debugging the memory consumption, I found that the backbone was responsible for about 80% of it. I think EfficientNet does not reduce the spatial dimensions fast enough, which leads to a large number of channels at large spatial resolutions.
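A minimal sketch of one way to check that split, assuming this repo's EfficientDetBackbone class from backbone.py and its backbone_net attribute (the EfficientNet feature extractor); the num_classes, compound_coef and 1536 px input size are only illustrative for a D7-like setup:

```python
import torch
from backbone import EfficientDetBackbone  # model class from this repo

# Illustrative D7-like config; adjust num_classes / compound_coef to your setup.
model = EfficientDetBackbone(num_classes=90, compound_coef=7).cuda()
x = torch.randn(1, 3, 1536, 1536).cuda()

def forward_peak_gib(module, inp):
    # Peak CUDA memory (GiB) added by one forward pass of `module`.
    # No torch.no_grad() here, so activations are kept around as they
    # would be during training.
    torch.cuda.synchronize()
    torch.cuda.reset_peak_memory_stats()
    base = torch.cuda.memory_allocated()
    out = module(inp)
    torch.cuda.synchronize()
    return (torch.cuda.max_memory_allocated() - base) / 1024 ** 3

print(f"whole model  : {forward_peak_gib(model, x):.2f} GiB")
print(f"backbone only: {forward_peak_gib(model.backbone_net, x):.2f} GiB")
```

Comparing the two numbers gives a rough idea of the backbone's share.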
@rvandeghen actually PyTorch is also to blame, because the cache of every op stays forever. If you add torch.cuda.empty_cache() right after certain ops, the memory usage will drop soon.
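For illustration only, a rough sketch of where that call would go in a generic training step; model, criterion and optimizer are placeholders, and the forward/loss signatures only loosely mirror this repo's train.py:

```python
import torch

def train_step(model, imgs, annotations, criterion, optimizer):
    # Ordinary step: forward, loss, backward, update.
    features, regression, classification, anchors = model(imgs)
    cls_loss, reg_loss = criterion(classification, regression, anchors, annotations)
    loss = cls_loss.mean() + reg_loss.mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # The trick: release PyTorch's cached blocks right after the heavy ops,
    # so the number reported by nvidia-smi drops. It does not lower the real
    # peak allocation and it adds overhead, which is why it is discouraged.
    torch.cuda.empty_cache()
    return loss.item()
```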
The solution is not to use PyTorch, lol. This issue will not happen on a static-graph framework like TF 1.x.
@zylo117 I have seen you mention this trick in another issue, but PyTorch does not recommend using it. Separately, I have discovered a small issue in the repo, though it might be intentional. How can I reach you to discuss it without opening a PR?
Actually, I did not think about the problem too much. I tried some other, smaller models instead of solving the problem.
@rvandeghen Yes, this trick is not nice or elegant at all and I've never used it. I mentioned it just to show what that memory is used for. And we can discuss anything about the repo here.
I also met the same problem: CUDA OOM while training D7x. I use 32 GB V100s. The command is:

```
python train.py -c 8 -p nuimages --batch_size 8 --lr 0.0005 --num_epochs 24 --data_path /cache/ --load_weights ./pretrained_models/efficientdet-d8.pth
```
@fangyixiao18 I think a batch size of 8 with EfficientDet-D7x is too big even for one V100. I suggest you debug the memory consumption starting from BS=1 and try increasing it for as long as it fits.
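As a rough sketch of that kind of probing (a generic model, square dummy inputs and a crude surrogate loss just to trigger the backward pass; nothing specific to this repo):

```python
import torch

def find_max_batch_size(model, input_size, start=1, limit=32):
    """Double the batch size until a forward + backward pass runs out of memory."""
    model = model.cuda()
    bs, last_ok = start, 0
    while bs <= limit:
        try:
            imgs = torch.randn(bs, 3, input_size, input_size).cuda()
            outputs = model(imgs)
            if torch.is_tensor(outputs):
                outputs = [outputs]
            # Crude surrogate loss: just enough to exercise the backward pass.
            loss = sum(o.float().mean() for o in outputs if torch.is_tensor(o))
            loss.backward()
            last_ok = bs
        except RuntimeError as e:
            if "out of memory" not in str(e):
                raise
            break
        finally:
            model.zero_grad(set_to_none=True)  # free gradient buffers between attempts
            torch.cuda.empty_cache()
        bs *= 2
    return last_ok
```

Given the reports above, D7/D7x may already fail at BS=1 per 32 GB card, in which case this returns 0.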
Actually, I use 8 V100 GPUs, so each GPU gets BS=1.
I am using 4 Tesla V100s with a batch size of 4 to train the D7x model, but I got a CUDA out of memory error. I also got a memory error for the D7 model. The largest model that can be trained with a single image on a 32 GB GPU is D6. I think there may be a problem. Is this memory usage normal? Please, can you share information about GPU usage during training? Note: I am trying to train the model completely, not head only.
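In case it helps to compare numbers, here is a small helper one could call every few hundred iterations to log per-GPU memory; the function name and logging format are made up, but the torch.cuda counters are standard:

```python
import torch

def log_gpu_memory(step, logger=print):
    """Log allocated / reserved / peak CUDA memory for every visible GPU."""
    gib = 1024 ** 3
    for i in range(torch.cuda.device_count()):
        logger(
            f"step {step} | gpu {i} | "
            f"allocated {torch.cuda.memory_allocated(i) / gib:.2f} GiB | "
            f"reserved {torch.cuda.memory_reserved(i) / gib:.2f} GiB | "
            f"peak {torch.cuda.max_memory_allocated(i) / gib:.2f} GiB"
        )
```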