Closed: Ringhu closed this issue 4 years ago.
@Ringhu you need --accumulate 1:
python3 train.py --cfg cfg/yolov3-tiny.cfg --batch-size 64 --accumulate 1 --weights ''
@glenn-jocher Thanks for the reply. I will try it soon.
@Ringhu BTW I would also git pull the latest version of the repo, as it changes often. And rather than looking at your results.txt, check the results.png file that is created after training finishes.
@glenn-jocher the .png file is attached. And could you please explain why setting --accumulate to 1 helps?
@Ringhu this is very strange behavior over the last couple of epochs. I've never seen the validation losses drop like that. Something may be wrong with your training. Are you using an unmodified git clone?
About --accumulate: the total batch size is always --batch-size multiplied by --accumulate. So your first command was effectively --batch-size 64 --accumulate 4 (4 is the argparse default), which gives a total batch size of 256, much larger than recommended. We recommend a total batch size of 64, e.g. --batch 64 --accum 1, or --batch 32 --accum 2.
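To make the arithmetic concrete, here is a minimal sketch of gradient accumulation (not the repo's actual code; the function names are illustrative): the optimizer only steps every `accumulate` mini-batches, so the effective batch size is `batch_size * accumulate`.

```python
def effective_batch_size(batch_size, accumulate):
    """Total images contributing to each optimizer step."""
    return batch_size * accumulate

def train_loop(minibatch_grads, accumulate):
    """Sum gradients over `accumulate` mini-batches, then 'step'.

    `minibatch_grads` stands in for the gradient of each mini-batch;
    averaging the accumulated sum approximates one large-batch gradient.
    """
    grad_sum = 0.0
    steps = []
    for i, g in enumerate(minibatch_grads, start=1):
        grad_sum += g                      # backward() adds into .grad
        if i % accumulate == 0:
            steps.append(grad_sum / accumulate)  # optimizer.step() equivalent
            grad_sum = 0.0                 # optimizer.zero_grad() equivalent
    return steps

# --batch-size 64 with the argparse default --accumulate 4 => total 256
assert effective_batch_size(64, 4) == 256
# recommended: --batch 64 --accum 1, or --batch 32 --accum 2 => total 64
assert effective_batch_size(64, 1) == effective_batch_size(32, 2) == 64
```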
Also use --multi-scale.
Basically use everything mentioned in https://github.com/ultralytics/yolov3/issues/310
Hi @glenn-jocher, here is my updated command:
python3 se_train.py --batch-size=64 --accumulate=1 --cfg=cfg/yolov3-tiny.cfg --multi-scale --evolve --cache-images --name=baseline --device=4,5,6,7 --adam
I changed the 'evolve' code to run a test after every epoch. However, there are some problems and I don't know if they are normal:
1. The speed is really slow; it now takes about half an hour to train one epoch.
2. The mAP doesn't increase. I have trained 20 epochs so far, but the mAP stays around 0.02.
The log is right here.
Looking forward to your advice.
This command reproduces our mAP results when training yolov3-spp.cfg
from scratch. See https://github.com/ultralytics/yolov3#reproduce-our-results
$ python3 train.py --weights '' --cfg yolov3-spp.cfg --epochs 273 --batch 16 --accum 4 --multi --pre
I suggest you simply clone the default repo without changes and train using the above command, swapping in your cfg of course. --cache is a good idea for smaller datasets as well.
Hi @glenn-jocher, I git pulled the latest version of the repo yesterday and trained yolov3 for 24 epochs as a test. However, the results still don't look good.
My command is here:
python3 train.py --batch-size=32 --accumulate=2 --cfg=cfg/yolov3.cfg --multi-scale --device=6,7 --name=baseline --adam --weights= --prebias
I don't know if this is normal or if something is wrong with my training.
@Ringhu don't use --adam
Hi @glenn-jocher. Just an update on my training: I finally trained tiny-yolo to an mAP@0.5 of 33.1 with this command:
python3 train.py --batch-size=64 --accumulate=1 --cfg=cfg/yolov3-tiny.cfg --multi-scale --prebias
The results are here:
Thank you for all your advice!
@Ringhu yeah that all looks correct! Good work :)
Thanks for the contribution first! I have a question here. I'm doing some research on tiny-yolo, so I need to reproduce its result, i.e. the mAP. In the README.md you mention the COCO mAP@0.5 at size 416 is 33.0, while I only get 30.7 when I train tiny-yolo from scratch. My training command is:
python3 train.py --cfg=cfg/yolov3-tiny.cfg --batch-size=64 --device=1,2 --weights=
And I do the training with 2 RTX 2080 Ti GPUs. The results are attached: results.txt.
Is there any suggestion for my training to increase the mAP to 33.0?