PyTorch_YOLOv4 - yolov4.cfg and yolov4.weights at 640 image size, batch size 8, overload GPU RAM

I didn't expect this one, but the following setting overload the 11 GB GPUs;

python -m torch.distributed.launch --nproc_per_node 8 train.py \
--sync-bn \
--cfg ~/PAI/detectors/PyTorch_YOLOv4/cfg/yolov4.cfg \
--weights ~/PAI/detectors/PyTorch_YOLOv4/weights/yolov4.weights \
--data ~/PAI/scripts/config_yolov5.yaml \
--hyp ~/PAI/scripts/yolo_custom_hyp.yaml \
--epochs 300 \
--batch-size 64 \
--img-size 640 640 \
--workers 3 \
--name yolov4_b8_e300_img640_hyp_custom

RuntimeError: CUDA out of memory. 
Tried to allocate 100.00 MiB (GPU 0; 10.76 GiB total capacity; 9.72 GiB already allocated; 32.56 MiB free; 9.78 GiB reserved in total by PyTorch)

If we find the tiny weights as mentioned here https://github.com/stark-t/PAI/issues/44 then I can give that one a try. For now, I will move to the V100 GPUs and see if the same happens.

Could it be that YOLOv4 does some cashing in the GPU RAM, or is the architecture simply more demanding than YOLOv5 nano and small? I checked and I do not use the option --cache-images.

stark-t / PAI

PyTorch_YOLOv4 - yolov4.cfg and yolov4.weights at 640 image size, batch size 8, overload GPU RAM #45