@nerminsamet that's a good question! The mAPs reported at https://github.com/ultralytics/yolov3#map use the darknet-trained yolov3-spp.weights file. You can reproduce them with the code at that link.
If you train with this repo you get PyTorch weights in *.pt format. The training results are constantly improving, with the latest results coming within 1% of darknet-trained results. See https://github.com/ultralytics/yolov3/issues/310 for a more detailed discussion.
Regarding your exact question about the relationship between multi-scale training and final mAP, the correlation is debatable. Darknet training uses it by default, but I have not observed any consistent improvement when testing at the same resolution as the training img_size. When testing at other resolutions it should obviously help, so I believe it depends on your final intended use of the trained model.
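For reference, the idea behind --multi-scale is simply to vary the training resolution on the fly. Below is a minimal illustrative sketch (not this repo's exact implementation, and the size range is a hypothetical example) of picking a random stride-32 size and resizing a batch to it:

```python
import random
import torch
import torch.nn.functional as F

def random_img_size(base=416, stride=32):
    # Hypothetical range: roughly 0.5x to 1.5x of the base size, rounded to the stride.
    lo, hi = (base // 2) // stride, (base * 3 // 2) // stride
    return random.randint(lo, hi) * stride

def maybe_rescale(imgs, img_size):
    # imgs: (N, 3, H, W) float tensor already loaded at the base size.
    if imgs.shape[-1] != img_size:
        imgs = F.interpolate(imgs, size=img_size, mode='bilinear', align_corners=False)
    return imgs
```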
@nerminsamet of course, if you have the resources, I would simply train both ways to compare. If you do this let us know your results!
@glenn-jocher I am training this repo with the following configuration. Right now it is at the 168th epoch. I share my latest mAP results for coco5kval at the 167th epoch below. Once my training is done I will also share the final mAP.
Namespace(accumulate=1, batch_size=64, bucket='', cache_images=False, cfg='cfg/yolov3.cfg', data='data/coco.data', epochs=273, evolve=False, img_size=416, img_weights=False, multi_scale=False, nosave=False, notest=False, rect=False, resume=True, transfer=False)
Result of 167th epoch:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.244
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.454
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.236
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.101
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.268
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.341
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.229
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.363
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.386
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.192
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.420
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.521
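For context, the table above is the standard pycocotools COCOeval summary. A minimal sketch of producing it from a COCO-format detections JSON (the file paths below are placeholders):

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO('annotations/instances_val2014.json')  # ground-truth annotations (placeholder path)
coco_dt = coco_gt.loadRes('results.json')             # exported detections (placeholder path)

e = COCOeval(coco_gt, coco_dt, iouType='bbox')
e.evaluate()
e.accumulate()
e.summarize()  # prints the Average Precision / Average Recall table shown above
```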
@nerminsamet ah, excellent, you are committed to your training. I'll share a few tricks briefly. We typically develop based on small-dataset results, like coco_16img.data, which allows rapid prototyping since training only takes a few minutes. This is useful for sanity checks and rough ideas, but results here do not correlate fully with results on the full COCO dataset, so once we develop an idea on a small dataset we test it on 1 or 2 full COCO epochs to tweak further, and then we run 10% of full training for a full statistical comparison. We do all of this at 320 for speed, and with yolov3-spp since it bumps mAP by 1% at almost no compute expense. These runs are now reaching 45% mAP after 27 epochs at 320 (no multi-scale). You can see some studies we've done at https://github.com/ultralytics/yolov3/issues/441#issuecomment-520229791
The 10% training command we use is:
python3 train.py --weights weights/darknet53.conv.74 --img-size 320 --epochs 27 --batch-size 64 --accumulate 1
We also had a weight_decay bug in place, which we just fixed today; it seems to greatly impact performance: https://github.com/ultralytics/yolov3/issues/469
Hi @glenn-jocher, thanks for the tricks. I first tested the code on coco_16img.data and everything was OK. Now my training is over: I got 52.2 mAP, which is 3.1 behind the original results. I think multi-scale training could be one reason, so I will also train with a multi-scale setup and report the new results.
Final results.
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.313
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.522
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.325
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.142
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.343
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.434
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.275
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.429
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.450
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.251
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.490
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.601
@nerminsamet hmm, ok! Maybe it is due to multi-scale. Can you post your training results? If you have a recent version of the repo, a results.png file will appear after training that plots results.txt. If not, the plotting command is from utils.utils import *; plot_results(), which will plot any results*.txt files it finds.
Starting a new training run overwrites any existing results.txt file, so I usually rename results.txt after training to something new like results_416.txt so it doesn't get overwritten.
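Putting those two steps together, a small sketch (assuming the repo's plot_results helper behaves as described above):

```python
import shutil
from utils.utils import plot_results  # helper from this repo

shutil.copy('results.txt', 'results_416.txt')  # keep a named copy so the next run doesn't overwrite it
plot_results()                                 # plots any results*.txt files it finds, writing results.png
```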
@glenn-jocher here are the results!
@nerminsamet wow, ok this shows severe overtraining at the LR drop at 0.8 * 273 epochs.
The Classification and Confidence losses may possibly be overtraining due to their positive-weight hyperparameters, which are about 1.5 and 4.0 respectively.
I've tried removing these recently (setting them to 1.0 and passing their values into the respective loss gains, making the obj and cls gains about 40 and 80 with pw's at 1 and 1), but initial mAPs do not respond as well as with the defaults... so perhaps initial results are worse but the long term shows a net positive, I don't know.
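For anyone following along, here is a rough sketch of what those positive weights mean for the obj/cls BCE terms (the values mirror the ~1.5 / ~4.0 figures mentioned above; the shapes and gains are illustrative, not the repo's exact loss code):

```python
import torch
import torch.nn as nn

cls_pw, obj_pw = 1.5, 4.0  # positive weights discussed above (illustrative)
BCEcls = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([cls_pw]))
BCEobj = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([obj_pw]))

# e.g. objectness loss on raw (pre-sigmoid) predictions vs 0/1 targets
pred_obj = torch.randn(8, 3, 13, 13)      # dummy predictions
tgt_obj = torch.zeros_like(pred_obj)      # dummy targets
loss_obj = BCEobj(pred_obj, tgt_obj)      # pos_weight > 1 upweights positive targets
```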
@nerminsamet I think we can use the results below (320 with multi-scale) as a proxy for what you might expect. It looks like multi-scale reduces the overfitting a bit, but I believe the real culprits are the positive weights in the two BCELosses. Depending on how dirty you want to get your hands there are a few changes you could try to the loss function, and there is also the hyperparameter evolution path you could explore: https://github.com/ultralytics/yolov3/issues/392.
It feels like the training is a few steps away from darknet-level mAP, but unfortunately we just don't have the resources at the moment to explore all the alternatives we'd like to.
https://github.com/ultralytics/yolov3/issues/310#issuecomment-521115339 320 multiscale
@nerminsamet hi there! Did you get different results with --multi-scale?
@nerminsamet I now have a direct comparison of the effect of using --multi-scale. Our results show a +1.6% mAP@0.5 boost: 49.3 to 50.9 at img-size 320. It helps very much to prevent overfitting on the BCELoss terms after the LR drop; before that it does not seem to show much visible effect. Maybe a smart training strategy would be to turn on multi-scale right before the LR drop at epoch 218.
python3 train.py --arc default --weights 'weights/darknet53.conv.74' --img-size 320 --batch-size 64 --accumulate 1 --epochs 273 --multi-scale --prebias
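If anyone wants to try the "multi-scale only near the LR drop" idea above, a hedged sketch of the schedule (train_one_epoch is a hypothetical stand-in for this repo's training loop, not an actual function in it):

```python
epochs = 273
lr_drop_epoch = int(0.8 * epochs)  # 218, where the LR drop occurs

def train_one_epoch(epoch, multi_scale):
    ...  # hypothetical stand-in for the per-epoch training loop

for epoch in range(epochs):
    multi_scale = epoch >= lr_drop_epoch - 1  # switch multi-scale on just before the drop
    train_one_epoch(epoch, multi_scale=multi_scale)
```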
Hello @glenn-jocher, thank you for the great work!
Here you report achieving 55.4 mAP at size 416 with the YOLOv3 configuration (yolov3.cfg). I wonder whether the --multi-scale flag was set for that 55.4 mAP result.
If --multi-scale is not set, what mAP should we expect to achieve?
Thanks in advance.