@glenn-jocher Yes, I got AP (IoU=0.5) of 50.2%. I also raised conf-thres to remove the FPs. But my question is how the original YOLO was trained; it performs a little better and more stably than my version. :-)
@Aurora33 Ah, original YOLOv3 320 reports 51.5 mAP@0.5 using the darknet testing code, and 51.8 mAP@0.5 using test.py --save-json in this repo. See https://github.com/ultralytics/yolov3#map.
@Aurora33 BTW, we've opened a new issue https://github.com/ultralytics/yolov3/issues/453 regarding overfitting on val Confidence.
@ktian08 @Aurora33 I just realized, the anchors in yolov3.cfg are pre-optimized for 416-size training, so you two may be getting subpar results due to this. I believe the darknet 320 results come from training 416 with multiscale, and then testing at 320/416/608.
I'm updating the training pipeline to suggest kmeans anchors automatically before training starts, which will help everyone, including custom-dataset users, and also us on COCO. Since we are training primarily on 320 images, we may be getting a subpar mAP due to the anchors being 416-optimized.
I'm not sure how multi-scale factors into all this; I need to think about it a bit more. Multi-scale has not actually been shown to improve mAP, by the way: in my comparisons on 10% of the training data it has no effect, good or bad. Also beware that default training already includes zoom in and out via the affine transform; the current j-series hyps apply a random uniform scale of 'scale': 0.1059 # image scale (+/- gain) to each image regardless of --multi-scale usage.
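As a purely illustrative aside on the zoom augmentation mentioned above, a random affine scale can be sketched as below, assuming OpenCV; the helper name and gray border value are hypothetical, and the repo's real augmentation also transforms the label boxes:

```python
import random
import cv2
import numpy as np

def random_scale(img, scale_gain=0.1059):
    """Zoom an image in or out by a random factor within +/- scale_gain (hypothetical helper)."""
    h, w = img.shape[:2]
    s = random.uniform(1 - scale_gain, 1 + scale_gain)       # e.g. 0.894 .. 1.106
    M = cv2.getRotationMatrix2D((w / 2, h / 2), 0, s)         # affine matrix: no rotation, scale s
    return cv2.warpAffine(img, M, (w, h), borderValue=(114, 114, 114))  # gray padding when zoomed out

out = random_scale(np.zeros((320, 320, 3), dtype=np.uint8))   # 320x320 output, randomly zoomed
```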
Until the automatic scanning is in place, you can manually do a kmeans search with your exact img-size and anchor count like this (for COCO):
from utils.utils import *; kmeans_targets(path='../coco/trainvalno5k.txt', n=9, img_size=320)
Reading labels (117263 found, 0 missing, 0 empty for 117263 images): 100%|██████████| 117263/117263 [00:13<00:00, 8422.13it/s]
kmeans anchors (n=9, img_size=320, IoU=0.00/0.18/0.57-min/mean/best): 10,11, 24,29, 66,37, 38,69, 64,129, 125,82, 249,97, 131,202, 270,219
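For intuition, the kmeans anchor search is just clustering label widths and heights at the training resolution; kmeans_targets() above is the utility to actually use. Below is a hedged sketch of the underlying idea using scipy (the helper name and the stand-in data are hypothetical, not the repo's implementation):

```python
import numpy as np
from scipy.cluster.vq import kmeans

def kmeans_anchors(wh_normalized, n=9, img_size=320):
    """Cluster normalized label (w, h) pairs into n anchor shapes at img_size (hypothetical helper)."""
    wh = wh_normalized * img_size                      # scale label sizes to the training resolution
    centroids, _ = kmeans(wh.astype(np.float32), n)    # n cluster centers = candidate anchors
    return centroids[np.argsort(centroids.prod(1))]    # sort anchors small -> large by area

wh = np.random.rand(1000, 2)                           # stand-in for real normalized label w,h pairs
print(kmeans_anchors(wh, n=9, img_size=320).round(1))
```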
Hi @glenn-jocher
Thank you for your great work!
What is bugging me is that it seems very difficult to replicate the training results of the original darknet weights. The YOLOv3 paper does not give many details about hyperparameter tuning, and from examining the Darknet source code it seems not much scaling is applied to each term of the loss function (or its derivative). Do you know of anyone who has successfully trained from the pretrained backbone (darknet53.conv.74) and obtained results similar to the original implementation, in any framework?
My worry is that although YOLOv3 surpasses other NN architectures in the speed/AP tradeoff, those other architectures are much easier to train to their published results, while YOLOv3 is very hard to replicate. Any thoughts on that?
Thank you again!
Dear @glenn-jocher, I have some questions about the best training procedure. 1/ Can you explain how to get the 50.2% mAP using the evolve flag? It is not very straightforward. 2/ Do we need to train 273 epochs to get the best results? I thought 68 were enough. 3/ It seems the best checkpoint is not always the best: sometimes the last checkpoint achieves better results at other resolutions, so it is interesting to check the last checkpoint's results after training.
Thanks a lot!
@ahmedtalbi
python3 train.py
@wuhy08 yes, it is difficult to replicate the training results of the original darknet weights. Training was nonlinear in their case, i.e. they created a backbone, used it to initialize training, changed it during training, etc., so training normally on Darknet will not reproduce the same mAP either, I believe.
Yes, you are right that darknet does not seem to apply much effort to loss balancing and hyperparameter tuning.
The backbone does not seem to matter much; in the comparison we have here it actually produced worse results (50.2 mAP vs 50.5, see the results above in this issue).
If you train with the default settings this repo should be within 1% of original darknet: https://github.com/ultralytics/yolov3/issues/310#issuecomment-518448296
See this for a testing example. Testing with the default settings (--batch-size 32, --img-size 416) works fine on most cards with at least 10 GB of memory.
https://colab.research.google.com/drive/1G8T-VFxQkjDe4idzN8F-hbIBqkkkQnxw#scrollTo=0v0RFtO-WG9o
@Aurora33 the mAPs reported in https://github.com/ultralytics/yolov3#map are using the original darknet weights files. We are still trying to determine the correct loss function and optimal hyperparameters for training in pytorch. There are a few issues open on this, such as #205 and #12. A couple things of note:
- The plotted mAPs are at 0.1 conf_thres (for speed during training). If you run test.py directly it will run mAP at 0.001 conf_thres, which will produce a higher mAP.
- Your LR scheduler may or may not have applied here, depending on how you set your --epochs argument in the argparser.
- Darknet training uses multi_scale by default, with scaling from 50% to 150% of your default size.
- Darknet training also involves several steps I believe, including training on other datasets and altering layers. You can read about this more in the YOLOv2 and YOLOv3 papers: https://pjreddie.com/publications/
- This implementation lacks the 0.7 ignore threshold of the original darknet, which is on our TODO list but not yet implemented (a rough sketch of the idea is below).
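For anyone curious about that missing piece: the ignore threshold means predictions whose best IoU with any ground-truth box exceeds 0.7 are simply excluded from the no-object objectness loss rather than penalized as background. A hedged sketch of just that idea (the function and tensor names are hypothetical, not code from this repo or darknet):

```python
import torch

def noobj_mask(iou_pred_vs_gt, ignore_thres=0.7):
    """Mask of predictions that should contribute to the no-object loss (hypothetical helper).
    iou_pred_vs_gt: (num_preds, num_gt) IoU matrix for one image."""
    best_iou, _ = iou_pred_vs_gt.max(dim=1)  # best overlap of each prediction with any GT box
    return best_iou < ignore_thres           # high-IoU predictions are ignored, not penalized as background

iou = torch.tensor([[0.10, 0.05],
                    [0.80, 0.20],
                    [0.30, 0.75]])
print(noobj_mask(iou))  # tensor([ True, False, False])
```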
@ktian08 ah I see. I forgot to mention that you should use the --save-json flag with test.py, as the official COCO mAP is usually about 1% higher than what the repo mAP code reports. You could also try best.pt instead of last.pt:
python3 test.py --weights weights/best.pt --img-size 320 --save-json
Hello, I used the train.py you provided to train my model, and the P curve and F1 curve I got fell from a high value to less than 0.1 in the last step of training each time. How can I solve this problem?
@Aurora33 oh very interesting. @ktian08 trained with --multi-scale and did not use the darknet53.conv.74 backbone to get his results.
Since this model is trained with different hyperparameters it will have a different optimal --conf-thres that you'll want to apply to it. As you can see, all of the confidences are higher than with the default weights, so you may want to raise your conf-thres above the default setting in detect.py.
@YOULANCHAI yes this is normal. The last epoch tests at 0.001 conf-thresh (better mAP), vs 0.01 for all other epochs (faster).
@ktian08 @Aurora33 we've tuned hyperparameters and instituted a mosaic dataloader (see readme) which now produces (no backbone, multiscale) results of 53.3mAP@320 and 57.5mAP@416 using the current default settings. This produces better results than darknet at 320 and 416, but not 608 (probably need to use larger img-size for that).
The training command to achieve this is:
$ python3 train.py --data data/coco.data --img-size 416 --batch-size 16 --accumulate 4 --multi-scale --prebias
The results tested at 416 are:
$ python3 test.py --img-size 416 --save-json --weights weights/best.pt
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.371
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.575
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.393
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.165
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.408
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.527
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.309
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.480
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.501
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.275
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.542
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.683
@glenn-jocher What do you mean by mosaic dataloader? Do you mind pasting a link? I'm continuing on @ktian08 's work
@louistang5 yes, multiple images are loaded at once and combined into a single mosaic training image.
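Roughly, the dataloader stitches several images (and their labels) into one larger training image. A minimal illustrative sketch, assuming four same-size numpy images and ignoring the label bookkeeping (this is not the repo's actual implementation):

```python
import numpy as np

def make_mosaic(imgs, out_size=640):
    """Place up to 4 images into the quadrants of one out_size x out_size canvas (labels omitted)."""
    s = out_size // 2
    mosaic = np.full((out_size, out_size, 3), 114, dtype=np.uint8)  # gray canvas
    for i, img in enumerate(imgs[:4]):
        r, c = divmod(i, 2)                                         # quadrant row/col
        patch = img[:s, :s]                                         # crop each image to fit its quadrant
        mosaic[r * s:r * s + patch.shape[0], c * s:c * s + patch.shape[1]] = patch
    return mosaic

imgs = [np.random.randint(0, 255, (320, 320, 3), dtype=np.uint8) for _ in range(4)]
print(make_mosaic(imgs).shape)  # (640, 640, 3)
```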
Hi @glenn-jocher ,
Thank you for your great work! I've been following this repo since last year, and I am glad that you've been able to reproduce and exceed the results from the original authors and AlexeyAB. I'd like to share my training results using your repo as follows:
Training command (on 4 RTX 2080 Ti):
python train.py --data data/coco.data --img-size 320 --epochs 273 --batch-size 64 --accumulate 1 --multi-scale
Results from training:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.34046
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.53354
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.35533
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.13441
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.37608
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.50283
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.29061
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.44660
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.46511
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.21699
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.51715
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.66345
273 epochs completed in 94.134 hours.
Results from testing best.pt on img-size 416:
python test.py --save-json --img-size 416 --weights weights/best.pt
Namespace(batch_size=16, cfg='cfg/yolov3-spp.cfg', conf_thres=0.001, data='data/coco.data', device='', img_size=416, iou_thres=0.5, nms_thres=0.5, save_json=True, weights='weights/best.pt')
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.36592
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.56643
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.38656
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.18145
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.39801
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.51010
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.30655
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.47708
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.49770
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.29048
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.53917
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.66408
I guess if I repeat the same training but with img-size 416 and then test at 608, the results will be higher. I will keep you posted soon; the training command is below:
python train.py --data data/coco.data --img-size 416 --epochs 273 --batch-size 64 --accumulate 1 --multi-scale
Regards,
@thoang3 awesome!! That's a nice setup. I just finished training 416 --multi-scale on COCO again and now I get these results: 320 and 416 improve over darknet but 608 lags for some reason. It may have to do with the resizing that test.py runs; I was thinking maybe I should only shrink images, but not expand them, when they are loaded for inference during testing. Oh, also, I found that using last.pt produces better results than best.pt.
python train.py --data data/coco.data --img-size 416 --epochs 273 --batch-size 32 --accumulate 2 --multi-scale --prebias
| | 320 | 416 | 608 |
|---|---|---|---|
| YOLOv3-SPP this repo last68.pt | 0.539 | 0.587 | 0.601 |
| YOLOv3-SPP this repo last67.pt | 0.538 | 0.579 | 0.594 |
| YOLOv3-SPP this repo (last49.pt) | 0.537 | 0.577 | 0.591 |
| YOLOv3-SPP darknet (yolov3-spp.weights) | 0.523 | 0.568 | 0.607 |
YOLOv3-SPP 416 this repo last68.pt
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.382
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.587
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.402
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.175
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.422
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.543
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.316
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.492
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.512
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.278
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.557
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.695
YOLOv3-SPP 416 darknet (yolov3-spp.weights)
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.337
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.568
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.350
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.152
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.359
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.496
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.279
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.432
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.460
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.257
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.494
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.623
Hi @glenn-jocher ,
Finally I've got the training results! Phew!!!! I didn't expect it'd take 10 days for this. Test results after training (416):
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.37487
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.57700
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.39644
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.17072
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.40540
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.53444
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.31117
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.48396
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.50363
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.27170
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.54213
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.69389
273 epochs completed in 240.003 hours.
python test.py --save-json --img-size 608 --weights weights/last.pt
Namespace(batch_size=16, cfg='cfg/yolov3-spp.cfg', conf_thres=0.001, data='data/coco.data', device='', img_size=608, iou_thres=0.5, nms_thres=0.5, save_json=True, weights='weights/last.pt')
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.38385
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.58747
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.40926
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.21912
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.42343
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.48429
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.31655
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.50559
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.52866
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.35205
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.57015
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.64792
@thoang3 ahh yes, these look identical to the results I saw a couple of weeks ago! Since then I realized that the obj hyperparameter needs to be scaled by img-size when training, since it was evolved at 320. A fix has been applied for this, so mAP should increase a bit as well. With the model you have now, you should see better mAP@0.5:0.95 across all resolutions compared to darknet, and better mAP@0.5 at 320 and 416, but not 608.
One last change you can do to increase your mAP@0.5:0.95 is to increase your testing nms_thresh from 0.5 to 0.6 or 0.7.
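As a small illustration of why raising the NMS IoU threshold helps mAP@0.5:0.95: a higher threshold keeps more overlapping boxes. The toy example below uses torchvision's NMS op rather than this repo's own NMS code, purely to show the effect of the threshold:

```python
import torch
from torchvision.ops import nms

boxes = torch.tensor([[0., 0., 10., 10.],     # box A
                      [1., 1., 11., 11.],     # box B, IoU ~0.68 with A
                      [20., 20., 30., 30.]])  # box C, no overlap
scores = torch.tensor([0.9, 0.8, 0.7])
print(nms(boxes, scores, iou_threshold=0.5))  # keeps A and C, suppresses B
print(nms(boxes, scores, iou_threshold=0.7))  # keeps all three boxes
```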
@glenn-jocher Thank you for the info! I just wanted to test the training to verify that it is stable and will potentially lead to the desired mAP. I think with these tests everybody can now feel highly confident using your repo for their own projects.
My only concern now is the training time! Like I mentioned above, accuracy aside, I didn't expect it would take 10 days (with 4 RTX 2080 Ti GPUs, at 416x416) to train 273 epochs (believe it or not, I wanted to stop a few times, especially when I saw no accuracy improvement for tens of epochs, haha, but now we know it might be due to the error you have just fixed). Last time I trained at 320 but with 1 GPU, and it took only around 4 days. Based on my observation, the GPUs are never fully utilized and only fluctuate around 30-60% (perhaps that means we can increase the batch size?). I have never trained full COCO on AlexeyAB's darknet, but I have the feeling it would be faster to train full COCO at 416x416 using his repo. Let me know your thoughts on this!
@thoang3 yes, it's always a long training time on full COCO. That said, your speeds are much slower than mine: I trained in about 5 days using a V100 on GCP. It's very important to install NVIDIA Apex for mixed-precision training (it's used automatically if installed), as this will almost double your speed, and of course to use the largest batch size possible (up to --batch-size 64 --accumulate 1).
If Apex is installed correctly you will see this message at the start of training and testing:
Using CUDA Apex device0 _CudaDeviceProperties(name='Tesla T4', total_memory=15079MB)
If it is not found or installed incorrectly it will display this:
Using CUDA device0 _CudaDeviceProperties(name='Tesla T4', total_memory=15079MB)
https://github.com/ultralytics/yolov3/blob/fa7c517ece0719a03d0746db40a79ebd6c8ad3e1/train.py#L12-L17
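For reference, a minimal Apex mixed-precision step looks roughly like the toy sketch below (this is not the repo's training loop; it requires Apex and a CUDA device):

```python
import torch
import torch.nn as nn
from apex import amp  # NVIDIA Apex must be installed

model = nn.Conv2d(3, 16, 3).cuda()                                   # toy model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
model, optimizer = amp.initialize(model, optimizer, opt_level='O1')  # enable mixed precision

x = torch.randn(2, 3, 64, 64).cuda()
loss = model(x).mean()                                               # toy loss
with amp.scale_loss(loss, optimizer) as scaled_loss:                 # scale loss to avoid fp16 underflow
    scaled_loss.backward()
optimizer.step()
```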
Hey @glenn-jocher, I am running into
AttributeError: 'DistributedDataParallel' object has no attribute 'class_weights'
when I give the --img-weights option.
One more thing: based on your recommendation I installed NVIDIA Apex, and it's showing 'Apex device0' for one of my GPUs but not for the other. Is this expected?
Using CUDA Apex device0 _CudaDeviceProperties(name='GeForce GTX 1080 Ti', total_memory=11175MB) device1 _CudaDeviceProperties(name='GeForce GTX 1080 Ti', total_memory=11178MB)
@shahidammer --img-weights is not recommended currently; it was found to lead to early overtraining, though perhaps this might be less of an issue now with the mosaic loader in place.
The Apex install is correct, it only shows on the first line.
@shahidammer the --img-weights error has been fixed in https://github.com/ultralytics/yolov3/commit/e58f0a68b6325e93d9ce98f66bcc3abb4b75a04e, so you should be good to go to use that option now.
Thank you @glenn-jocher.
Hi, thanks for your code. I trained YOLOv3 for 50 epochs at image size 320; here are my results:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.282
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.470
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.292
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.095
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.299
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.433
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.248
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.384
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.401
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.158
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.440
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.590
What is the original size of your images?
What tricks did you use to improve the mAP from 57.7 to 59.2?
@miracle-fmh better --nms_thres, better hyperparameters, slight increase in default --multi-scale.
Hi @glenn-jocher. Do the hyperparameters have universality, or do they depend on the network you trained? I want to train a newly-designed YOLO architecture with your code and, of course, I need to train from scratch.
How were you able to plot these graphs? I am sorry, I'm pretty new to YOLO and would be grateful for help. Currently I am only able to plot mAP and loss, and I would also like to plot the graphs you plotted above. I am using darknet to train my custom dataset on YOLOv3, on Windows.
@Ringhu hyperparameters are optimized for COCO, which covers a broad range of most object detection problems. You can always optimize your own hyperparameters on your own custom problem also. See https://github.com/ultralytics/yolov3/issues/392
@priyankasinghvi once you train, your results are saved to results.txt and plotted automatically as results.png. See https://docs.ultralytics.com/yolov5/tutorials/train_custom_data
Hi @glenn-jocher , I have currently trained on AlexyAB's repo. on my own custom dataset. using darknet.exe detector train .......data files here ....cfg files here . I do not have the ultralytics repo. Any idea on how to generate a confusion matrix? or IoU plots?
@priyankasinghvi a confusion matrix is typically only generated for classification tasks. YOLOv3 is an object detection task. I don't know what you mean by IOU plots. The commands to get started training here are very simple, you can use your same exact labelled data you trained on darknet with. See https://docs.ultralytics.com/yolov5/tutorials/train_custom_data
@glenn-jocher thank you. That just opened my eyes. Like a fool I was looking at producing a conf matrix. But i forgot to recall the basics. Thank you again!
Hi @glenn-jocher, a quick question: in the training command you posted above you used image size 416, yet you have results for both image sizes 320 and 416. Did you use the same trained weights and run testing at different image sizes? In my case I'm only interested in image size 320. If I train from scratch with image size 320, will I be able to reproduce the same mAP you have here (I mean 53.3 mAP for 320)?
@louistang5 see https://github.com/ultralytics/yolov3#reproduce-our-results
To reproduce our results you should use our training command, otherwise your results will obviously differ.
Updated results:
Nice. I am currently re-training at 640. 100 more epochs to go.
@glenn-jocher I have two major questions
v3 (mse loss, Normalizer: (iou: 0.750000, cls: 1.000000) Region 94 Avg (IOU: -nan, GIOU: -nan), Class: -nan, Obj: -nan, No Obj: 0.506638, .5R: -nan, .75R: -nan, count: 0
v3 (mse loss, Normalizer: (iou: 0.750000, cls: 1.000000) Region 106 Avg (IOU: 0.345226, GIOU: 0.263841), Class: 0.294111, Obj: 0.583763, No Obj: 0.539357, .5R: 0.200000, .75R: 0.000000, count: 5
This is how I train:
!./darknet detector train data/trainer.data cfg/yolov3.cfg darknet53.conv.74 | tee backup/yolo-malaria.txt
And this is how I test:
!./darknet detector test data/trainer.data cfg/yolov3.cfg backup/yolov3_final.weights -thresh 0.1 -iou_thresh 0.3 data/img/plasmodium.jpg
@feulhak your output and questions are all related to alexeyab/darknet. This is ultralytics/yolov3, a completely different repo. To train on this repo see https://docs.ultralytics.com/yolov5/tutorials/train_custom_data
@glenn-jocher Thanks for the code. May I ask why you set the conf_thres so low? I usually evaluate at a threshold of 0.5.
@tienthegainz mAP is the area under the P-R curve. To get the full curve you must go all the way down to zero confidence.
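A toy illustration of that point (this is not the repo's or COCO's exact AP code, which uses interpolated precision): cutting detections at a high confidence threshold removes the tail of the P-R curve and lowers the measured AP.

```python
import numpy as np

def average_precision(confidences, is_tp, num_gt):
    """Un-interpolated area under the precision-recall curve for one class (toy version)."""
    order = np.argsort(-confidences)       # rank detections from high to low confidence
    tp = np.cumsum(is_tp[order])
    fp = np.cumsum(~is_tp[order])
    recall = tp / num_gt
    precision = tp / (tp + fp)
    return np.trapz(precision, recall)     # integrate precision over recall

conf = np.array([0.9, 0.8, 0.6, 0.3, 0.05])
is_tp = np.array([True, True, False, True, True])
print(average_precision(conf, is_tp, num_gt=5))          # ~0.50, using every detection
print(average_precision(conf[:3], is_tp[:3], num_gt=5))  # ~0.20, as if a 0.5 conf_thres cut the rest
```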
Latest results, 42.8 mAP@0.5:0.95. Training plots:
Hi Glenn,
Thanks very much for creating this repo. I learned a lot from here.
I trained on my own data. With only one class the result is very good: GIoU loss is 0.4 and mAP@0.5 is 0.96. But when I trained with 2 classes the result is quite bad: after 95 epochs, GIoU loss is 2.5 and mAP@0.5 is 0, and the GIoU loss does not seem to be decreasing. I used the existing hyperparameters, with multi-scale set to true and rectangular training set to False. My dataset is quite small; each class has about 400 images.
Could you give me some advice about which part I should change to increase my mAP?
@Jelly123456 oh, that's really interesting. Is the 2-class dataset composed of the 1-class dataset plus another class? Can you check whether you can train the second class as its own dataset as well?
If both 1-class datasets train well but the combined 2-class dataset trains poorly, that would be very informative. I have not run an experiment like this myself.
The best way to show your results is to use your results.png file created after training.
@glenn-jocher
Is the 2-class dataset composed of the 1-class dataset plus another class? => Yes.
Can you see if you can train the second class as its own dataset as well? => I trained just now and the result is quite good. With just 10 epochs I can get 99% mAP@0.5.
The result of just one class.
The result of two classes:
The two classes of my dataset are cruise and container. cruise:
container:
@Jelly123456 thanks, I see the example images. Can you train 1-class: cruise, then 1-class: container, and show both the results.png and test_batch0.png files for each training, please?
Also, we've made changes to the burnin recently that should avoid the spikes you are seeing early on in the validation losses. You can git clone again, or git pull from inside the yolov3 folder, to get these updates.
@Jelly123456 after seeing your results and a few others I realized the high class loss on low-class count datasets was directly linked to the fact that we had tuned to coco (with 80 classes). I've introduced a balancing mechanism which should fix this part of your problem in c7f93bae403ed9cf9bd50319a29485643d2438ad
import torch
import torch.nn as nn

c = nn.BCELoss()  # default reduction='mean'
input = torch.tensor([0.001, 0.001, 0.001, 0.001, 0.8])  # four negative and 1 positive sample
target = torch.tensor([0., 0., 0., 0., 1.])
for i in range(4):  # drop negatives one at a time to mimic different class counts
    loss = c(input[i:], target[i:]) * len(input[i:])  # mean BCE * element count = summed BCE
    print(loss)  # loss stays nearly constant regardless of how many negatives are included
tensor(0.22715)
tensor(0.22614)
tensor(0.22514)
tensor(0.22414)
@glenn-jocher, the new code works very well and solved my problem. Thanks very much.
Hi, thanks for sharing your work! What is your configuration for training yolov3.cfg to get 55% mAP? We tried 100 epochs but only got about 35% mAP, which doesn't change much anymore, and the test loss starts to diverge a little. Why do you give such a high loss gain to the confidence loss? Thanks in advance for your reply.