ultralytics / yolov3

YOLOv3 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

--multi-scale flag for the reported results #472

Closed nerminsamet closed 4 years ago

nerminsamet commented 5 years ago

Hello @glenn-jocher, thank you for the great work!

Here you report that you achieved 55.4 mAP with size 416 and configuration of YOLOv3 (yolov3.cfg). I wonder whether --multi-scale flag is set true for this 55.4 mAP.

If --multi-scale is not set, what mAP should we expect to achieve?

thanks in advance.

glenn-jocher commented 5 years ago

@nerminsamet that's a good question! The mAPs reported at https://github.com/ultralytics/yolov3#map are using the darknet trained yolov3-spp.weights file. You can reproduce these with the code at the link.

If you train using this repo you get pytorch weights in a *.pt format. The training results are constantly improving, with the latest results coming within 1% of darknet trained results. See https://github.com/ultralytics/yolov3/issues/310 for a more detailed discussion.

As for your exact question about the relationship between multi-scale training and final mAP, the effect is debatable. Darknet training uses it by default, but I have not observed any improvement when testing at the same resolution as your training img_size. It may help, but I can't say I've observed anything consistent with that being the case. Testing at other resolutions it should obviously help, so I believe it depends on your final intended use of the trained model.
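For reference, darknet-style multi-scale training typically resamples the training resolution every few batches to a random multiple of the network stride. A minimal sketch of that sampling, with illustrative names and parameters (not this repo's exact API):

```python
import random

def sample_img_size(base_size=416, stride=32, scale=0.5):
    """Pick a random training resolution within +/-50% of base_size,
    rounded to a multiple of the network stride (32 px for YOLOv3)."""
    lo = int(base_size * (1 - scale)) // stride
    hi = int(base_size * (1 + scale)) // stride
    return random.randint(lo, hi) * stride

# With base_size=416 this yields grid-aligned sizes in [192, 608].
```

The stride alignment matters because YOLOv3's output grids downsample the input by 32, so any training resolution must be divisible by 32.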

glenn-jocher commented 5 years ago

@nerminsamet of course, if you have the resources, I would simply train both ways to compare. If you do this let us know your results!

nerminsamet commented 5 years ago

@glenn-jocher I am training this repo with the following configuration. Right now it is in the 168th epoch. I share my latest mAP result on coco 5k val at the 167th epoch below. Once my training is done I will also share the final mAP.

Namespace(accumulate=1, batch_size=64, bucket='', cache_images=False, cfg='cfg/yolov3.cfg', data='data/coco.data', epochs=273, evolve=False, img_size=416, img_weights=False, multi_scale=False, nosave=False, notest=False, rect=False, resume=True, transfer=False)

Result of 167th epoch:
Average Precision (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.244
Average Precision (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.454
Average Precision (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.236
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.101
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.268
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.341
Average Recall    (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.229
Average Recall    (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.363
Average Recall    (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.386
Average Recall    (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.192
Average Recall    (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.420
Average Recall    (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.521

glenn-jocher commented 5 years ago

@nerminsamet ah, excellent, you are committed to your training. I'll share a few tricks briefly.

We typically develop based on small-dataset results, like coco_16img.data, which allows rapid prototyping since training only takes a few minutes. This is useful for sanity checks and rough ideas, but results here do not correlate fully with results on the full coco dataset. So once we develop an idea on a small dataset, we test it on 1 or 2 full coco epochs to tweak further, and then we test at 10% of full training for a full statistical comparison.

We do all this at 320 for speed, and with yolov3-spp since it bumps mAP by 1% at almost no compute expense. These runs are now reaching 45% mAP after 27 epochs at 320 (no multi-scale). You can see some studies we've done at https://github.com/ultralytics/yolov3/issues/441#issuecomment-520229791

The 10% training command we use is:

python3 train.py --weights weights/darknet53.conv.74 --img-size 320 --epochs 27 --batch-size 64 --accumulate 1

There was also a weight_decay bug in place, which we just fixed today and which seems to greatly impact performance: https://github.com/ultralytics/yolov3/issues/469

nerminsamet commented 5 years ago

Hi @glenn-jocher, thanks for the tricks. I first tested the code on coco_16img.data and everything was ok. Now my training is over: I got 52.2 mAP, which is 3.1 behind the original results. I think multi-scale training could be one reason. I will also train with the multi-scale setup and let you know the new results.

Final results:
Average Precision (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.313
Average Precision (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.522
Average Precision (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.325
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.142
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.343
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.434
Average Recall    (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.275
Average Recall    (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.429
Average Recall    (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.450
Average Recall    (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.251
Average Recall    (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.490
Average Recall    (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.601

glenn-jocher commented 5 years ago

@nerminsamet hmm ok! Maybe it is due to multi-scale. Can you post your training results? If you have a recent version of the repo, a results.png file will appear after training that plots results.txt. If not, the plotting command is `from utils.utils import *; plot_results()`, which will plot any results*.txt files it finds.

Starting new training overwrites any existing results.txt files, so I usually rename a results.txt file after training to something new like results_416.txt so it doesn't get overwritten.
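As a side note, a results.txt-style log can also be inspected without plotting at all. A minimal stdlib-only sketch (this is not the repo's plot_results; the mAP column index is an assumption to adjust against the actual file layout):

```python
def best_epoch(path, map_col=10):
    """Scan a whitespace-delimited per-epoch log and return the
    (epoch index, mAP) of the row with the highest mAP column."""
    best_i, best_map = -1, float("-inf")
    with open(path) as f:
        for i, line in enumerate(f):
            cols = line.split()
            if len(cols) <= map_col:
                continue  # skip short/blank lines
            m = float(cols[map_col])
            if m > best_map:
                best_i, best_map = i, m
    return best_i, best_map
```

This is handy for quickly comparing renamed logs like results_416.txt side by side.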

nerminsamet commented 5 years ago

@glenn-jocher here is the results!

[results.png training plot]

glenn-jocher commented 5 years ago

@nerminsamet wow, ok this shows severe overtraining on the LR drop at 0.8 * 273 epochs.

The Classification and Confidence losses may be prone to overtraining because of their positive-weight hyperparameters, which are about 1.5 and 4.0.

I've tried removing these recently (setting them to 1.0 and passing their values into the respective loss gains, making the obj and cls gains 40 and 80 with pw's at 1 and 1), but initial mAPs do not respond as well as with the defaults... so perhaps initial results are worse but the long term may show a net positive, I don't know.
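To make the pos_weight effect concrete, here is the scalar binary cross-entropy that PyTorch's `BCEWithLogitsLoss(pos_weight=...)` computes, written out by hand for one prediction. The pw values (1.5 and 4.0) come from the discussion above; the point is that pw scales only the positive-target term:

```python
import math

def bce_with_logits(x, y, pw=1.0):
    """Binary cross-entropy on a logit x against target y in {0, 1},
    with pos_weight pw applied only to the positive term."""
    p = 1.0 / (1.0 + math.exp(-x))  # sigmoid
    return -(pw * y * math.log(p) + (1 - y) * math.log(1 - p))

# With pw=4, the loss on a positive target is 4x the unweighted loss,
# while the loss on a negative target is unchanged.
```

So a large pw pushes the model hard toward confident positives, which is one plausible mechanism for the post-LR-drop overtraining seen in the plots.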

glenn-jocher commented 5 years ago

@nerminsamet I think we can use the results below (320 with multi-scale) as a proxy for what you might expect. It looks like multi-scale reduces the overfitting a bit, but I believe the real culprits are the positive weights in the two BCELosses. Depending on how dirty you want to get your hands, there are a few changes you could try to the loss function, and there is also the hyperparameter evolution path you could explore: https://github.com/ultralytics/yolov3/issues/392.

It feels like the training is a few steps away from darknet level mAP, but unfortunately we just don't have the resources to explore all the alternatives we'd like to currently.

320 multi-scale results: https://github.com/ultralytics/yolov3/issues/310#issuecomment-521115339

glenn-jocher commented 5 years ago

@nerminsamet hi there! Did you get different results with --multi-scale?

glenn-jocher commented 5 years ago

@nerminsamet I have a direct comparison now of the effect of using --multi-scale. Our results show a +1.6% mAP@0.5 boost: 49.3 to 50.9 at img-size 320. It very much helps prevent overfitting on the BCELoss terms after the LR drop; before that it does not seem to show much visible effect. Maybe a smart training strategy would be to turn on multi-scale right before the LR drop at epoch 218.

python3 train.py --arc default --weights 'weights/darknet53.conv.74' --img-size 320 --batch-size 64 --accumulate 1 --epochs 273 --multi-scale --prebias
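The "turn on multi-scale right before the LR drop" idea could be sketched as an epoch-gated size sampler; this is a hypothetical illustration, not code from the repo (names, the trigger epoch, and the scale range are all assumptions):

```python
import random

def img_size_for_epoch(epoch, base=320, stride=32,
                       multiscale_start=218, scale=0.5):
    """Train at a fixed img_size until multiscale_start, then sample a
    random grid-aligned resolution within +/-50% of base each call."""
    if epoch < multiscale_start:
        return base
    lo = int(base * (1 - scale)) // stride
    hi = int(base * (1 + scale)) // stride
    return random.randint(lo, hi) * stride
```

The gate would keep early training fast and stable at the base resolution, while adding the regularizing scale jitter only for the epochs where the overfitting appears.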

[results.png training plot]