stark-t / PAI

Pollination_Artificial_Intelligence

PyTorch_YOLOv4 - yolov4.cfg and yolov4.weights at 640 image size, batch size 8, overload GPU RAM #45

Closed valentinitnelav closed 2 years ago

valentinitnelav commented 2 years ago

I didn't expect this one, but the following settings overload the 11 GB GPUs:

python -m torch.distributed.launch --nproc_per_node 8 train.py \
--sync-bn \
--cfg ~/PAI/detectors/PyTorch_YOLOv4/cfg/yolov4.cfg \
--weights ~/PAI/detectors/PyTorch_YOLOv4/weights/yolov4.weights \
--data ~/PAI/scripts/config_yolov5.yaml \
--hyp ~/PAI/scripts/yolo_custom_hyp.yaml \
--epochs 300 \
--batch-size 64 \
--img-size 640 640 \
--workers 3 \
--name yolov4_b8_e300_img640_hyp_custom
RuntimeError: CUDA out of memory. Tried to allocate 100.00 MiB (GPU 0; 10.76 GiB total capacity; 9.72 GiB already allocated; 32.56 MiB free; 9.78 GiB reserved in total by PyTorch)
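The most direct knob here is probably the total batch size: 32 instead of 64 would mean 4 images per GPU and should roughly halve the activation memory. Purely as an illustration (I have not run this), the same command with only --batch-size and the run name changed:

python -m torch.distributed.launch --nproc_per_node 8 train.py \
--sync-bn \
--cfg ~/PAI/detectors/PyTorch_YOLOv4/cfg/yolov4.cfg \
--weights ~/PAI/detectors/PyTorch_YOLOv4/weights/yolov4.weights \
--data ~/PAI/scripts/config_yolov5.yaml \
--hyp ~/PAI/scripts/yolo_custom_hyp.yaml \
--epochs 300 \
--batch-size 32 \
--img-size 640 640 \
--workers 3 \
--name yolov4_b4_e300_img640_hyp_custom  # run name is just a placeholder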

If we find the tiny weights mentioned in https://github.com/stark-t/PAI/issues/44, I can give those a try. For now, I will move to the V100 GPUs and see if the same happens there.

Could it be that YOLOv4 does some caching in GPU RAM, or is the architecture simply more demanding than YOLOv5 nano and small? I checked, and I do not use the --cache-images option.
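One way to tell caching apart from plain model size would be to watch GPU memory while the job warms up: a gradual climb over the first epochs would suggest something is being cached, whereas an immediate jump close to the 11 GB limit would simply mean yolov4.cfg at 640 px does not fit with 8 images per GPU. For example, in a second terminal on the node:

watch -n 5 nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv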

valentinitnelav commented 2 years ago

I will now test the YOLOv4 pacsp-s weights, which I presume are lighter.

This seems to work for now: a job using the YOLOv4 pacsp-s weights has been running without OOM issues so far.
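The only change compared to the command above is the --cfg/--weights pair; the exact file names are in the commit below, but it is along these lines (the pacsp-s paths are my guess, not copied from the commit):

python -m torch.distributed.launch --nproc_per_node 8 train.py \
--sync-bn \
--cfg ~/PAI/detectors/PyTorch_YOLOv4/cfg/yolov4-pacsp-s.cfg \
--weights ~/PAI/detectors/PyTorch_YOLOv4/weights/yolov4-pacsp-s.weights \
--data ~/PAI/scripts/config_yolov5.yaml \
--hyp ~/PAI/scripts/yolo_custom_hyp.yaml \
--epochs 300 \
--batch-size 64 \
--img-size 640 640 \
--workers 3 \
--name yolov4_pacsp_s_b8_e300_img640_hyp_custom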

See commit 2b9cf6aff2299b5fa177150f37104120c1f400fd