ultralytics / yolov3

YOLOv3 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

Any suggestion for training tiny-yolo from scratch? #696

Closed Ringhu closed 4 years ago

Ringhu commented 4 years ago

Thanks for the contribution first! I have a question. I'm doing some research on tiny-yolo, so I need to reproduce its result (the mAP). In the README.md you mention that the COCO mAP@0.5 at size 416 is 33.0, while I only get 30.7 when I train tiny-yolo from scratch. My training command is: python3 train.py --cfg=cfg/yolov3-tiny.cfg --batch-size=64 --device=1,2 --weights= and I train with 2 RTX 2080 Ti GPUs.

The results are attached: results.txt.

Is there any suggestion for my training to increase the mAP to 33.0?

glenn-jocher commented 4 years ago

@Ringhu you need --accumulate 1:

python3 train.py --cfg cfg/yolov3-tiny.cfg --batch-size 64 --accumulate 1 --weights ''
Ringhu commented 4 years ago

@glenn-jocher Thanks for the reply. I will try it soon.

glenn-jocher commented 4 years ago

@Ringhu BTW I would also git pull a current version of the repo, as it changes often. And rather than looking at your results.txt, there should be a results.png file created after training finishes.

Ringhu commented 4 years ago

@glenn-jocher The .png file is attached. Could you please explain why setting accumulate to 1 helps?

glenn-jocher commented 4 years ago

@Ringhu this is very strange behavior in the last couple of epochs. I've never seen the validation losses drop like that. Something may be wrong with your training. Are you using an unmodified git clone?

About --accumulate: the total batch size is --batch-size multiplied by --accumulate. Your first command was effectively --batch-size 64 --accumulate 4 (4 is the argparse default), which gives a total batch size of 256, much larger than recommended. We recommend a total batch size of 64, using for example --batch 64 --accum 1 or --batch 32 --accum 2.
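The interaction between the two flags can be sketched in plain Python (illustrative only; these function names are not from the repo):

```python
# Sketch (not the repo's code): how --batch-size and --accumulate combine.
# Gradients from `accumulate` consecutive mini-batches are summed before
# each optimizer step, so the optimizer effectively trains with
# batch_size * accumulate images per update.

def effective_batch_size(batch_size: int, accumulate: int) -> int:
    """Total batch size the optimizer effectively sees per update."""
    return batch_size * accumulate

def optimizer_steps(num_batches: int, accumulate: int) -> int:
    """Number of optimizer updates performed over num_batches mini-batches."""
    steps = 0
    for i in range(num_batches):
        # loss.backward() would run here, accumulating gradients
        if (i + 1) % accumulate == 0:
            steps += 1  # optimizer.step(); optimizer.zero_grad()
    return steps

# Original command: --batch-size 64 with the default --accumulate 4
print(effective_batch_size(64, 4))  # 256, larger than the recommended 64
# Recommended combinations, both totalling 64:
print(effective_batch_size(64, 1))  # 64
print(effective_batch_size(32, 2))  # 64
```

With --accumulate 1 the optimizer steps after every mini-batch, so the total batch size equals --batch-size.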

glenn-jocher commented 4 years ago

Also use multi-scale.

Basically use everything mentioned in https://github.com/ultralytics/yolov3/issues/310
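Multi-scale training, as commonly implemented, picks a new input resolution (a multiple of the network stride) at random every few batches. A minimal sketch, assuming a base size of 416 and a stride of 32 (the function name and the 0.5x-1.5x range here are illustrative assumptions, not the repo's exact code):

```python
# Sketch of multi-scale training size selection (illustrative assumptions:
# base size 416, stride 32, sizes drawn from 0.5x to 1.5x of the base).
import random

def pick_train_size(img_size: int = 416, stride: int = 32,
                    lo: float = 0.5, hi: float = 1.5) -> int:
    """Randomly choose a training resolution that is a multiple of the stride."""
    lo_mult = int(img_size * lo) // stride  # smallest allowed multiple
    hi_mult = int(img_size * hi) // stride  # largest allowed multiple
    return random.randint(lo_mult, hi_mult) * stride

# Every sampled size aligns with the network stride (192..608 here)
sizes = {pick_train_size() for _ in range(1000)}
print(min(sizes), max(sizes))
```

Varying the resolution during training makes the detector more robust to object scale, which is one of the tricks behind the published mAP.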

Ringhu commented 4 years ago

Hi @glenn-jocher, here is my updated command: python3 se_train.py --batch-size=64 --accumulate=1 --cfg=cfg/yolov3-tiny.cfg --multi-scale --evolve --cache-images --name=baseline --device=4,5,6,7 --adam I changed the 'evolve' code to run a test after every epoch. However, there are some problems and I don't know if they are normal: 1. Training is really slow; it now takes about half an hour per epoch. 2. The mAP doesn't increase. I've trained 20 epochs so far, but the mAP stays around 0.02. The log is attached. Looking forward to your advice.

glenn-jocher commented 4 years ago

This command reproduces our mAP results when training yolov3-spp.cfg from scratch. See https://github.com/ultralytics/yolov3#reproduce-our-results

$ python3 train.py --weights '' --cfg yolov3-spp.cfg --epochs 273 --batch 16 --accum 4 --multi --pre

results

I suggest you simply clone the default repo without changes and train using the above command, swapping your cfg in of course. --cache is a good idea for smaller datasets as well.

Ringhu commented 4 years ago

Hi @glenn-jocher, I pulled the current version of the repo yesterday and trained yolov3 for 24 epochs as a test. However, the results still don't look good (screenshot attached). My command is: python3 train.py --batch-size=32 --accumulate=2 --cfg=cfg/yolov3.cfg --multi-scale --device=6,7 --name=baseline --adam --weights= --prebias I don't know if this is normal or if something is wrong with my training.

glenn-jocher commented 4 years ago

@Ringhu don't use --adam

Ringhu commented 4 years ago

Hi @glenn-jocher. Just an update on my training: I finally trained tiny-yolo to mAP@0.5 = 33.1 with this command: python3 train.py --batch-size=64 --accumulate=1 --cfg=cfg/yolov3-tiny.cfg --multi-scale --prebias The results are attached. Thank you for all your advice!

glenn-jocher commented 4 years ago

@Ringhu yeah that all looks correct! Good work :)