pjreddie / darknet

Convolutional Neural Networks
http://pjreddie.com/darknet/

Training YOLO on COCO terminates during training (Colab) #1957

Open MichaelTUD opened 4 years ago

MichaelTUD commented 4 years ago

Hi,

I followed this tutorial (https://pjreddie.com/darknet/yolo/) to set up Darknet and YOLO for training on the COCO dataset. I installed it on Colab, and prediction with the pretrained models works.

However, the "Training YOLO on COCO" step does not work for me.

Running ./darknet detector train cfg/coco.data cfg/yolov3.cfg darknet53.conv.74 terminates early: while the network layers are being printed, the process stops with a ^C.

SULTANGOLD commented 4 years ago

I have the same problem, but with a custom dataset!

albangabillon commented 4 years ago

Same problem with a custom dataset: the run ends with this ^C.

nancyshaji commented 4 years ago

I have the same problem too. It stops after around 170 iterations, so I end up going back to the last saved weights.

albangabillon commented 4 years ago

Hi Nancy, I sorted it out. It is a memory problem. Reduce the batch size. It should be fine.
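
For anyone looking for where that setting lives: a rough sketch of the relevant part of the [net] section, assuming your cfg/yolov3.cfg uses the usual training values (check your own file, the numbers below are an assumption). As far as I understand it, Darknet processes batch / subdivisions images at a time, so either lowering batch or raising subdivisions should reduce the memory needed per step.

```
[net]
# Assumed training values; with these, 64 / 16 = 4 images are processed at a time.
batch=64
subdivisions=16

# Lower-memory alternatives to try (uncomment one pair and remove the pair above):
# batch=32
# subdivisions=16
#
# batch=64
# subdivisions=32
```

Keeping batch at 64 and raising subdivisions leaves the number of images per weight update unchanged, which may matter if you want to keep the learning-rate settings as they are.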

nancyshaji commented 4 years ago

@albangabillon Can I change the batch size partway through training?

fengzisheng commented 4 years ago

@albangabillon Can I change the batch size partway through training?

Maybe you can't. Just change it in the cfg file. I had the same problem and fixed it by changing the batch size from 64 to 32.
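
In case it helps, here is a minimal sketch of that cfg edit done from the shell (handy on Colab). It works on a scratch file with made-up contents standing in for cfg/yolov3.cfg, so the file contents and the value 32 are assumptions for illustration only.

```shell
#!/bin/sh
# Scratch file standing in for cfg/yolov3.cfg (hypothetical contents).
cfg="$(mktemp)"
cat > "$cfg" <<'EOF'
[net]
# Testing
# batch=1
# Training
batch=64
subdivisions=16
EOF

# Rewrite only the active batch line; commented-out lines start with '#'
# and therefore do not match the '^batch=' anchor.
sed 's/^batch=.*/batch=32/' "$cfg" > "$cfg.tmp" && mv "$cfg.tmp" "$cfg"

# Show the active setting after the edit.
grep '^batch=' "$cfg"
```

On a real setup the same sed expression would be pointed at cfg/yolov3.cfg before starting the training command again. To continue from a checkpoint instead of darknet53.conv.74, the last saved weights file (from the backup directory named in cfg/coco.data) can be passed as the final argument; the exact filename depends on your setup.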