ultralytics / yolov3

YOLOv3 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

Train YOLOv3-SPP from scratch to 62.6 mAP@0.5 #310

Closed ghost closed 4 years ago

ghost commented 5 years ago

Hi, thanks for sharing your work! I would like to know what configuration you used to train yolov3.cfg to 55% mAP. We trained for 100 epochs but only reached about 35% mAP, which is no longer changing much, and the test loss is starting to diverge a little. Also, why do you give such a high loss gain to the confidence loss? Thanks in advance for your reply. results

ghost commented 5 years ago

@glenn-jocher Yes, I got AP (IoU=0.5) 50.2%. I also raised conf-thres to remove the FPs... But my question is how the original YOLO did it. It performs a little better and is more stable than my version. :-)

glenn-jocher commented 5 years ago

@Aurora33 Ah, original YOLOv3 320 reports 51.5 mAP@0.5 using the darknet testing code, and 51.8 mAP@0.5 using detect.py --save-json in this repo. See https://github.com/ultralytics/yolov3#map.

glenn-jocher commented 5 years ago

@Aurora33 BTW, we've opened up a new issue https://github.com/ultralytics/yolov3/issues/453 regarding overfitting on val Confidence.

glenn-jocher commented 5 years ago

@ktian08 @Aurora33 I just realized, the anchors in yolov3.cfg are pre-optimized for 416-size training, so you two may be getting subpar results due to this. I believe the darknet 320 results come from training 416 with multiscale, and then testing at 320/416/608.

I'm updating the training pipeline to suggest kmeans anchors automatically before training starts, which will help everyone, including custom data set users, but also us for COCO. Since we are training primarily on 320 images, we may be getting a subpar mAP due to the anchors being 416-optimized.

I'm not sure how multi-scale factors into all this, I need to think about it a bit more. multi-scale has not actually been shown to improve mAP by the way. In my comparisons to 10% training it has no effect, good or bad, and beware that default training already includes zoom in and out using the affine transform. The current j series hyps specify a random uniform scale of 'scale': 0.1059, # image scale (+/- gain) to each image regardless of --multi-scale usage.

Until the automatic scanning is in place, you can manually do a kmeans search with your exact img-size and anchor count like this (for coco):

from utils.utils import *; kmeans_targets(path='../coco/trainvalno5k.txt', n=9, img_size=320)
Reading labels (117263 found, 0 missing, 0 empty for 117263 images): 100%|██████████| 117263/117263 [00:13<00:00, 8422.13it/s]
kmeans anchors (n=9, img_size=320, IoU=0.00/0.18/0.57-min/mean/best): 10,11,  24,29,  66,37,  38,69,  64,129,  125,82,  249,97,  131,202,  270,219
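
For readers new to the idea: anchor k-means clusters the (width, height) of every label box, scaled to the training img-size, and uses the cluster centres as anchors. Below is a minimal sketch in plain Python. It assumes Euclidean distance and a deterministic initialisation, and `kmeans_wh` is a hypothetical helper for illustration, not the repo's `kmeans_targets` (whose internals are not shown in this thread). The box sizes are made up.

```python
def kmeans_wh(wh, n, iters=50):
    """Cluster (width, height) pairs into n anchors with plain Euclidean k-means."""
    by_area = sorted(wh, key=lambda b: b[0] * b[1])
    # deterministic init: evenly spaced picks from the area-sorted boxes
    centers = [by_area[int((i + 0.5) * len(by_area) / n)] for i in range(n)]
    for _ in range(iters):
        groups = [[] for _ in range(n)]
        for w, h in wh:
            # assign each box to its nearest centre
            j = min(range(n), key=lambda k: (w - centers[k][0]) ** 2 + (h - centers[k][1]) ** 2)
            groups[j].append((w, h))
        new_centers = []
        for g, c in zip(groups, centers):
            if g:
                new_centers.append((sum(w for w, _ in g) / len(g), sum(h for _, h in g) / len(g)))
            else:
                new_centers.append(c)  # keep the centre of an empty cluster unchanged
        if new_centers == centers:
            break  # assignments are stable
        centers = new_centers
    return sorted(centers, key=lambda c: c[0] * c[1])  # small to large by area


# toy label sizes in pixels at the chosen img-size (made-up values)
boxes = [(10, 12), (11, 10), (12, 11), (60, 40), (64, 38), (62, 42),
         (200, 180), (210, 190), (205, 185)]
anchors = kmeans_wh(boxes, n=3)
print(anchors)
```

On real data the result depends on img-size because the label sizes are scaled before clustering, which is why anchors tuned at 416 may not suit 320 training.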
wuhy08 commented 4 years ago

Hi @glenn-jocher

Thank you for your great work!

What is bugging me is that it seems very difficult to replicate the training results of the original darknet weights. The YOLOv3 paper does not give much detail about hyperparameter tuning, and from examining the Darknet source code, it seems that not much scaling is applied to each term of the loss function (or its derivative). Do you know of anyone who has successfully trained from the pretrained backbone (Darknet53.conv.74) and obtained results similar to the original implementation, in any framework?

My worry is that although YOLOv3 surpasses other NN architectures in the speed/AP tradeoff, those architectures are much easier to train to their published results, whereas YOLOv3 is very hard to replicate. Any thoughts on that?

Thank you again!

ahmedtalbi commented 4 years ago

Dear @glenn-jocher, I have some questions about the best training procedure.

  1. Can you please explain how to get the 50.2% mAP using the evolve flag? It is not very straightforward.
  2. Do we need to train for 273 epochs to get the best results? I thought 68 were enough.
  3. It seems that the best checkpoint is not always the best. Sometimes the last checkpoint achieves better results at other resolutions, so it is worth checking the last checkpoint after training.

Thanks a lot!

glenn-jocher commented 4 years ago

@ahmedtalbi

  1. The hyps have already been evolved, so just train normally: python3 train.py
  2. Original darknet trained to 273 epochs.
  3. best.pt or last.pt can both be tested after training.
glenn-jocher commented 4 years ago

@wuhy08 yes it is difficult to replicate training results of original darknet weights. Training was nonlinear in their case, i.e. they created a backbone, used it to initialize training, changed this during training etc, so training normally on Darknet will not reproduce the same mAP either I believe.

Yes you are right that darknet does not seem to apply effort to loss balancing and hyperparameter tuning.

Backbone does not seem to matter much, and in the comparison we have here actually produced worse results (50.2 mAP vs 50.5, see above results in this issue).

If you train with the default settings this repo should be within 1% of original darknet: https://github.com/ultralytics/yolov3/issues/310#issuecomment-518448296

glenn-jocher commented 4 years ago

See this for a testing example. Testing with the default settings (--batch-size 32, --img-size 416) works fine on most cards with at least 10 GB of memory. https://colab.research.google.com/drive/1G8T-VFxQkjDe4idzN8F-hbIBqkkkQnxw#scrollTo=0v0RFtO-WG9o

YOULANCHAI commented 4 years ago

@Aurora33 the mAPs reported in https://github.com/ultralytics/yolov3#map are using the original darknet weights files. We are still trying to determine the correct loss function and optimal hyperparameters for training in pytorch. There are a few issues open on this, such as #205 and #12. A couple things of note:

  • The plotted mAPs are at 0.1 conf_thres (for speed during training). If you run test.py directly it will run mAP at 0.001 conf_thres, which will produce a higher mAP.
  • Your LR scheduler may or may not have applied here, depending on how you set your number of epochs argument in the argparser --epochs.
  • Darknet training uses multi_scale by default, with scaling from 50% to 150% of your default size.
  • Darknet training also involves several steps I believe, including training on other datasets and altering layers. You can read about this more in the YOLOv2 and YOLOv3 papers: https://pjreddie.com/publications/
  • This implementation lacks the 0.7 ignore threshold in the original darknet, which is on our TODO list but not yet implemented.

@ktian08 ah I see. I forgot to mention that you should use the --save-json flag with test.py, as the official COCO mAP is usually about 1% higher than what the repo mAP code reports. You could try best.pt also instead of last.pt:

python3 test.py --weights weights/best.pt --img-size 320 --save-json

Hello, I used the train.py you provided to train my model, and each time the P and F1 curves fell from a high value to less than 0.1 in the last step of training. How can I solve this problem?

YOULANCHAI commented 4 years ago

@Aurora33 oh very interesting. @ktian08 trained with --multi-scale and did not use the darknet53.conv.74 backbone to get his results.

Since this model is trained with different hyperparameters it will have a different --conf-thres that you'll want to apply to it. As you can see all of the confidences are higher than with the default weights, so you may want to raise your conf-thres above the default setting in detect.py.

glenn-jocher commented 4 years ago

@YOULANCHAI yes this is normal. The last epoch tests at 0.001 conf-thresh (better mAP), vs 0.01 for all other epochs (faster).
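
A toy illustration of why the plotted P and F1 can collapse when the threshold drops, assuming they are computed on everything that survives conf-thres. Lowering the threshold admits a long tail of low-confidence detections, most of them false positives, so precision falls even though recall rises slightly. The detections and the helper name `precision_recall_f1` are made up for this sketch.

```python
def precision_recall_f1(dets, n_gt, conf_thres):
    """dets: list of (confidence, is_true_positive); n_gt: number of ground-truth boxes."""
    kept = [is_tp for conf, is_tp in dets if conf >= conf_thres]
    tp = sum(kept)
    p = tp / len(kept) if kept else 0.0
    r = tp / n_gt
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


# made-up detections: a few confident hits plus a long tail of low-confidence boxes
dets = [(0.9, True), (0.8, True), (0.6, True), (0.3, False), (0.05, True), (0.005, True)]
dets += [(0.005, False)] * 44

p_hi, r_hi, f1_hi = precision_recall_f1(dets, n_gt=5, conf_thres=0.01)
p_lo, r_lo, f1_lo = precision_recall_f1(dets, n_gt=5, conf_thres=0.001)
print(f"conf_thres=0.01:  P={p_hi:.2f} R={r_hi:.2f} F1={f1_hi:.2f}")
print(f"conf_thres=0.001: P={p_lo:.2f} R={r_lo:.2f} F1={f1_lo:.2f}")
```

In this toy case precision drops from 0.80 to 0.10 while recall rises from 0.80 to 1.00, so the drop in the curves reflects the threshold change rather than a worse model.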

glenn-jocher commented 4 years ago

@ktian08 @Aurora33 we've tuned hyperparameters and instituted a mosaic dataloader (see readme) which now produces (no backbone, multiscale) results of 53.3mAP@320 and 57.5mAP@416 using the current default settings. This produces better results than darknet at 320 and 416, but not 608 (probably need to use larger img-size for that).

The training command to achieve this is:

$ python3 train.py --data data/coco.data --img-size 416 --batch-size 16 --accumulate 4 --multi-scale --prebias

The results tested at 416 are:

$ python3 test.py --img-size 416 --save-json --weights weights/best.pt
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.371
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.575
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.393
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.165
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.408
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.527
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.309
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.480
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.501
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.275
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.542
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.683

results
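
A note on the --batch-size and --accumulate pairs that appear in this thread (16 and 4, 32 and 2, 64 and 1). Assuming --accumulate N means gradients from N batches are accumulated before each optimizer step, which the thread does not spell out, all three pairs give the same effective batch size. The helper below is hypothetical and only does the arithmetic.

```python
def effective_batch_size(batch_size, accumulate):
    """Images contributing to each optimizer step when gradients are accumulated."""
    return batch_size * accumulate


# (--batch-size, --accumulate) pairs quoted in this thread
pairs = [(16, 4), (32, 2), (64, 1)]
sizes = [effective_batch_size(bs, acc) for bs, acc in pairs]
print(sizes)
```

Under that assumption, a smaller --batch-size with a larger --accumulate trades GPU memory for more forward/backward passes per optimizer step.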

shuaitang5 commented 4 years ago

@glenn-jocher What do you mean by mosaic dataloader? Do you mind pasting a link? I'm continuing on @ktian08 's work

glenn-jocher commented 4 years ago

@louistang5 yes, multiple images are loaded at once in a mosaic:
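
A simplified sketch of the mosaic idea: four images are tiled into one training image and their boxes are shifted to match. It assumes a fixed 2x2 grid of equal-size images and pixel-coordinate boxes. The repo's loader may use a random centre, cropping and further augmentation, none of which is shown, and `mosaic4` is a hypothetical name.

```python
def mosaic4(images, labels, s):
    """Tile four s x s images into one 2s x 2s image and shift their boxes.

    images: four 2-D lists (rows of pixel values)
    labels: per-image lists of (cls, x1, y1, x2, y2) in pixels
    """
    canvas = [[0] * (2 * s) for _ in range(2 * s)]
    shifted = []
    # (dx, dy) offsets: top-left, top-right, bottom-left, bottom-right
    offsets = [(0, 0), (s, 0), (0, s), (s, s)]
    for img, boxes, (dx, dy) in zip(images, labels, offsets):
        for y in range(s):
            canvas[dy + y][dx:dx + s] = img[y]  # paste one image row
        shifted += [(c, x1 + dx, y1 + dy, x2 + dx, y2 + dy) for c, x1, y1, x2, y2 in boxes]
    return canvas, shifted


s = 2
imgs = [[[k] * s for _ in range(s)] for k in (1, 2, 3, 4)]  # four constant-valued images
lbls = [[(0, 0, 0, 1, 1)], [], [(1, 0, 0, 2, 2)], []]
canvas, boxes = mosaic4(imgs, lbls, s)
for row in canvas:
    print(row)
print(boxes)
```

Each training sample then contains objects from four source images at once, which increases the number and variety of labels seen per batch.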

thoang3 commented 4 years ago

Hi @glenn-jocher ,

Thank you for your great work! I've been following this repo since last year, and I am glad that you've been able to reproduce and exceed the results from the original authors and AlexeyAB. I'd like to share my training results using your repo, as follows:

Training command (on 4 RTX 2080 Ti):

python train.py --data data/coco.data --img-size 320 --epochs 273 --batch-size 64 --accumulate 1 --multi-scale

Results from training:

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.34046
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.53354
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.35533
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.13441
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.37608
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.50283
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.29061
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.44660
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.46511
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.21699
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.51715
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.66345
 273 epochs completed in 94.134 hours.

Results from testing best.pt on img-size 416:

python test.py --save-json --img-size 416 --weights weights/best.pt
Namespace(batch_size=16, cfg='cfg/yolov3-spp.cfg', conf_thres=0.001, data='data/coco.data', device='', img_size=416, iou_thres=0.5, nms_thres=0.5, save_json=True, weights='weights/best.pt')
Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.36592
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.56643
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.38656
Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.18145
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.39801
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.51010
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.30655
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.47708
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.49770
Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.29048
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.53917
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.66408

I guess if I repeat the same training but with img-size=416 and then test on 608 then the results will be higher. Will keep you posted soon with the training command below:

python train.py --data data/coco.data --img-size 416 --epochs 273 --batch-size 64 --accumulate 1 --multi-scale

Regards,

glenn-jocher commented 4 years ago

@thoang3 awesome!! That's a nice setup. I just finished training 416 --multi-scale on COCO again and now I get these results. 320 and 416 improve over darknet but 608 lags for some reason. It may have to do with the resizing that test.py runs, I was thinking maybe I should only shrink images but not expand them when they are loaded for inference during testing. Oh also, I found that using last.pt produces better results than best.pt.

python train.py --data data/coco.data --img-size 416 --epochs 273 --batch-size 32 --accumulate 2 --multi-scale --prebias
Weights                                   320    416    608
YOLOv3-SPP this repo (last68.pt)          0.539  0.587  0.601
YOLOv3-SPP this repo (last67.pt)          0.538  0.579  0.594
YOLOv3-SPP this repo (last49.pt)          0.537  0.577  0.591
YOLOv3-SPP darknet (yolov3-spp.weights)   0.523  0.568  0.607

YOLOv3-SPP 416 this repo last68.pt

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.382
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.587
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.402
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.175
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.422
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.543
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.316
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.492
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.512
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.278
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.557
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.695

YOLOv3-SPP 416 darknet (yolov3-spp.weights)

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.337
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.568
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.350
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.152
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.359
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.496
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.279
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.432
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.460
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.257
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.494
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.623
thoang3 commented 4 years ago

Hi @glenn-jocher ,

Finally I've got the training results! Phew!!!! I didn't expect it'd take 10 days for this. Test results after training (416):

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.37487
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.57700
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.39644
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.17072
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.40540
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.53444
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.31117
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.48396
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.50363
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.27170
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.54213
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.69389
273 epochs completed in 240.003 hours.
python test.py --save-json --img-size 608 --weights weights/last.pt
Namespace(batch_size=16, cfg='cfg/yolov3-spp.cfg', conf_thres=0.001, data='data/coco.data', device='', img_size=608, iou_thres=0.5, nms_thres=0.5, save_json=True, weights='weights/last.pt')
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.38385
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.58747
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.40926
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.21912
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.42343
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.48429
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.31655
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.50559
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.52866
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.35205
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.57015
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.64792
glenn-jocher commented 4 years ago

@thoang3 ahh yes, these look identical to the results I saw a couple weeks ago! Since then I realized that obj hyperparameter needs to be scaled by img-size when training since it was evolved at 320. A fix has been applied for this now, so mAP should increase a bit now as well. Using this model you have now, you should see better mAP@0.5:0.95 across all resolutions compared to darknet, and better mAP@0.5 at 320 and 416, but not 608.

One last change you can do to increase your mAP@0.5:0.95 is to increase your testing nms_thresh from 0.5 to 0.6 or 0.7.
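
To make the effect of the NMS threshold concrete, here is a generic greedy NMS sketch. It is not necessarily the variant test.py uses, and the boxes are made up. A candidate is suppressed only when its IoU with an already kept, higher-confidence box exceeds the threshold, so raising the threshold from 0.5 to 0.7 keeps more overlapping candidates.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(dets, nms_thres):
    """Greedy NMS over a list of (confidence, box), highest confidence first."""
    kept = []
    for conf, box in sorted(dets, key=lambda d: d[0], reverse=True):
        # keep the box unless it overlaps an already kept box too much
        if all(iou(box, k[1]) <= nms_thres for k in kept):
            kept.append((conf, box))
    return kept


# the first two boxes overlap with IoU = 80 / 120, about 0.67
dets = [(0.9, (0, 0, 10, 10)), (0.8, (2, 0, 12, 10)), (0.7, (50, 50, 60, 60))]
kept_05 = nms(dets, 0.5)
kept_07 = nms(dets, 0.7)
print(len(kept_05), len(kept_07))
```

With the threshold at 0.5 the second box is suppressed, while at 0.7 it survives.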

thoang3 commented 4 years ago

@glenn-jocher Thank you for your info! I just wanted to test the training to verify that training is stable and potentially will lead to the desired mAP. I think with these tests, everybody now could feel highly confident to use your repo for their own projects.

My only concern now is the training time! Like I mentioned above, put accuracy aside, I didn't expect it'd take 10 days (with 4 RTX 2080Ti GPUs, on 416x416) to train 273 epochs (believe it or not I wanted to stop a few times, especially when I saw there's no accuracy improvement for tens of epochs haha, but now we know it might be due to the error you have just fixed). Last time I trained on 320 but with 1 GPU, and it took only around 4 days. Based on my observation, it seems like the GPUs are never fully utilized, but only fluctuate around 30-60% (perhaps that means we can increase batch size?). I have never trained full COCO on AlexeyAB darknet, but I have the feeling it would be faster to train full COCO on 416x416 using his repo. Let me know your thought on this!
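
For reference, the two training logs posted earlier in this thread (94.134 hours and 240.003 hours, both for 273 epochs) convert to days and minutes per epoch as follows. This is arithmetic on the logged figures only, and the helper name is hypothetical.

```python
def summarize(hours, epochs):
    """Return (days, minutes per epoch) for a logged training run."""
    return hours / 24, hours / epochs * 60


runs = {
    "img-size 320": 94.134,   # hours, from the first training log
    "img-size 416": 240.003,  # hours, from the second training log
}
for name, hours in runs.items():
    days, min_per_epoch = summarize(hours, 273)
    print(f"{name}: {days:.1f} days, {min_per_epoch:.1f} min/epoch")
```

That gives about 3.9 days (20.7 min/epoch) for the 320 run and 10.0 days (52.7 min/epoch) for the 416 run, matching the "around 4 days" and "10 days" figures quoted above.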

glenn-jocher commented 4 years ago

@thoang3 yes, it's always a long training time on full COCO. That said, your speeds are much slower than mine. I trained in about 5 days using a V100 on GCP. It's very important to install Nvidia Apex for mixed precision training though (it's automatically used if installed), as this will almost double your speed, and of course to use the largest batch size you can (up to --batch-size 64 --accumulate 1 if possible).

If Apex is installed correctly you will see this message at the start of training and testing: Using CUDA Apex device0 _CudaDeviceProperties(name='Tesla T4', total_memory=15079MB)

If it is not found or installed incorrectly it will display this: Using CUDA device0 _CudaDeviceProperties(name='Tesla T4', total_memory=15079MB)

https://github.com/ultralytics/yolov3/blob/fa7c517ece0719a03d0746db40a79ebd6c8ad3e1/train.py#L12-L17
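
Since Apex is "automatically used if installed", the linked lines presumably guard the import. Below is a sketch of that kind of optional-dependency guard. The exact code at that commit is not reproduced in this thread, so treat the variable name and the printed message as assumptions.

```python
# Optional-dependency pattern: fall back to full precision when Apex is missing
mixed_precision = True
try:
    from apex import amp  # NVIDIA Apex, https://github.com/NVIDIA/apex
except ImportError:
    mixed_precision = False  # Apex is not installed or not importable

print("mixed precision available:", mixed_precision)
```

With a guard like this, training runs either way, and the startup message is the quickest way to confirm which path was taken.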

shahidammer commented 4 years ago

Hey @glenn-jocher I am running into

AttributeError: 'DistributedDataParallel' object has no attribute 'class_weights'

when I pass the --img-weights option.

err

One more thing: based on your recommendation I installed NVIDIA Apex. It shows Apex for device0 but not for my other GPU. Is this expected?

Using CUDA Apex device0 _CudaDeviceProperties(name='GeForce GTX 1080 Ti', total_memory=11175MB) device1 _CudaDeviceProperties(name='GeForce GTX 1080 Ti', total_memory=11178MB)

glenn-jocher commented 4 years ago

@shahidammer --img-weights is not recommended currently, it was found to lead to early overtraining. Though perhaps this might be less of an issue now with the mosaic loader in place.

The apex install is correct, it only shows on the first line.

glenn-jocher commented 4 years ago

@shahidammer the --img-weights error has been fixed in https://github.com/ultralytics/yolov3/commit/e58f0a68b6325e93d9ce98f66bcc3abb4b75a04e, so you should be good to go to use that option now.

shahidammer commented 4 years ago

Thank you @glenn-jocher.

cjnjuwhy commented 4 years ago

Hi, thanks for your code. I tried 50 epochs of yolov3 with image size 320. Here are my results:

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.282
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.470
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.292
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.095
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.299
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.433
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.248
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.384
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.401
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.158
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.440
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.590
FranciscoReveriano commented 4 years ago

What is the original size of your images?

miracle-fmh commented 4 years ago

What tricks did you use to improve the mAP from 57.7 to 59.2?

glenn-jocher commented 4 years ago

@miracle-fmh better --nms_thres, better hyperparameters, slight increase in default --multi-scale.

Ringhu commented 4 years ago

Hi @glenn-jocher. Are the hyperparameters universal, or do they depend on the network being trained? I ask because I want to train a newly designed YOLO architecture with your code, and of course I need to train it from scratch.

priyankasinghvi commented 4 years ago

How were you able to plot these graphs? Sorry, I am pretty new to YOLO and would be grateful for any help. Currently I can only plot mAP and loss, and I would like to plot graphs like the ones above. I am using darknet on Windows to train yolov3 on my custom dataset.

glenn-jocher commented 4 years ago

@Ringhu hyperparameters are optimized for COCO, which covers a broad range of most object detection problems. You can always optimize your own hyperparameters on your own custom problem also. See https://github.com/ultralytics/yolov3/issues/392

@priyankasinghvi once you train, your results are saved to results.txt and plotted automatically as results.png. See https://docs.ultralytics.com/yolov5/tutorials/train_custom_data

priyankasinghvi commented 4 years ago

Hi @glenn-jocher, I trained on AlexeyAB's repo with my own custom dataset, using darknet.exe detector train .......data files here ....cfg files here. I do not have the ultralytics repo. Any idea how to generate a confusion matrix or IoU plots?

glenn-jocher commented 4 years ago

@priyankasinghvi a confusion matrix is typically only generated for classification tasks. YOLOv3 is an object detection task. I don't know what you mean by IOU plots. The commands to get started training here are very simple, you can use your same exact labelled data you trained on darknet with. See https://docs.ultralytics.com/yolov5/tutorials/train_custom_data

priyankasinghvi commented 4 years ago

@glenn-jocher thank you. That just opened my eyes. Like a fool I was looking at producing a conf matrix. But i forgot to recall the basics. Thank you again!

shuaitang5 commented 4 years ago

Hi @glenn-jocher, a quick question. In the training command above you used image size 416, but you report results for both 320 and 416. Did you use the same trained weights and run testing at different image sizes? In my case, I'm only interested in image size 320. If I train from scratch at image size 320, will I be able to reproduce the mAP you have here (53.3 mAP at 320)?

glenn-jocher commented 4 years ago

@louistang5 see https://github.com/ultralytics/yolov3#reproduce-our-results

To reproduce our results you should use our training command, otherwise your results will obviously differ.

glenn-jocher commented 4 years ago

Updated results: results

FranciscoReveriano commented 4 years ago

Nice. I am currently re-training at 640. 100 more epochs to go.

ghost commented 4 years ago

@glenn-jocher I have two major questions

  1. I am having trouble understanding the output of YOLOv3 training. Can you give me a general walkthrough of it? Here is the part of the output I would like explained:

v3 (mse loss, Normalizer: (iou: 0.750000, cls: 1.000000) Region 94 Avg (IOU: -nan, GIOU: -nan), Class: -nan, Obj: -nan, No Obj: 0.506638, .5R: -nan, .75R: -nan, count: 0
v3 (mse loss, Normalizer: (iou: 0.750000, cls: 1.000000) Region 106 Avg (IOU: 0.345226, GIOU: 0.263841), Class: 0.294111, Obj: 0.583763, No Obj: 0.539357, .5R: 0.200000, .75R: 0.000000, count: 5

  2. My other major question is whether I can change the confidence threshold and IoU threshold values. I am using https://github.com/AlexeyAB/darknet. When I test, do I have to use the same confidence threshold and IoU threshold that I used for training?

This is how I train and test. Training:

!./darknet detector train data/trainer.data cfg/yolov3.cfg darknet53.conv.74 | tee backup/yolo-malaria.txt

Testing:

!./darknet detector test data/trainer.data cfg/yolov3.cfg backup/yolov3_final.weights -thresh 0.1 -iou_thresh 0.3 data/img/plasmodium.jpg

glenn-jocher commented 4 years ago

@feulhak your output and questions are all related to alexeyab/darknet. This is ultralytics/yolov3, a completely different repo. To train on this repo see https://docs.ultralytics.com/yolov5/tutorials/train_custom_data

tienthegainz commented 4 years ago

@glenn-jocher Thanks for the code. May I ask why you set conf_thres so low? I usually run evaluation at a threshold of 0.5.

glenn-jocher commented 4 years ago

@tienthegainz mAP is the area under the P-R curve. To get the full curve you must go all the way down to zero confidence.
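
A toy sketch of that point: AP computed as the area under the precision-recall curve, once over nearly all detections and once with everything below 0.5 confidence discarded. It uses a plain rectangle sum with no interpolation, so it is simpler than the COCO metric. The detections and the helper name `average_precision` are made up.

```python
def average_precision(dets, n_gt, conf_thres=0.0):
    """Area under the P-R curve, summed as rectangles over recall steps.

    dets: list of (confidence, is_true_positive); n_gt: number of ground-truth boxes.
    """
    kept = sorted((d for d in dets if d[0] >= conf_thres), key=lambda d: d[0], reverse=True)
    tp = 0
    ap = 0.0
    for rank, (_, is_tp) in enumerate(kept, start=1):
        if is_tp:
            tp += 1
            ap += (tp / rank) * (1 / n_gt)  # precision at this point times the recall step
    return ap


dets = [(0.9, True), (0.7, True), (0.6, False), (0.4, True), (0.2, False), (0.05, True)]
ap_low = average_precision(dets, n_gt=4, conf_thres=0.001)
ap_high = average_precision(dets, n_gt=4, conf_thres=0.5)
print(f"AP at conf_thres=0.001: {ap_low:.3f}")
print(f"AP at conf_thres=0.5:   {ap_high:.3f}")
```

Cutting at 0.5 truncates the curve at 50% recall, so the area stops at 0.500, while the low threshold recovers the remaining true positives and reaches about 0.854.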

glenn-jocher commented 4 years ago

Latest results, 42.8 mAP@0.5:0.95. Training plots: results

Jelly123456 commented 4 years ago

Hi Glenn,

Thanks very much for creating this repo. I learned a lot from here.

I trained my own data. With only one class, the result is very good. GIOU is 0.4 and mAP@0.5 is 0.96. But when I trained with 2 classes, the result is quite bad. After running 95 epochs, GIOU is 2.5 and mAP@0.5 is 0, and it seems the GIoU is not decreasing. I used the existing hyper-parameters and the multi-scale is set to true and rectangular training is set to False. My dataset is quite small, each class has about 400 images.

Could you give me some advice about which part should I change to increase my mAP value?

glenn-jocher commented 4 years ago

@Jelly123456 oh that's really interesting. Is the 2-class dataset composed of the 1-class dataset plus another class? Can you see if you can train the second class as its own dataset as well?

If both 1-class datasets train well, but when combined into a 2-class dataset train poorly, that would be very informative. I have not run any experiment like this myself.

The best way to show your results is to use your results.png file created after training.

Jelly123456 commented 4 years ago

@glenn-jocher

Is the 2 class dataset composed of the 1-class dataset plus another class? => Yes.

Can you see if you can train the second class as it's own dataset as well? => I trained just now and the result is quite good. With just 10 epochs, I can get 99% mAP@0.5.

The result of just one class. image

The result of two classes: image

The two classes of my dataset are cruise and container. cruise: image

container: image

glenn-jocher commented 4 years ago

@Jelly123456 thanks, I see the example images. Can you train 1-class: cruise, and then 1-class: container, and show both the results.png and test_batch0.png files for each training please?

Also, we've made changes to the burnin recently that should avoid the spikes you are seeing early on in the validation losses. You can git clone again, or git pull from inside the yolov3 folder to get these updates.

glenn-jocher commented 4 years ago

@Jelly123456 after seeing your results and a few others I realized the high class loss on low-class count datasets was directly linked to the fact that we had tuned to coco (with 80 classes). I've introduced a balancing mechanism which should fix this part of your problem in c7f93bae403ed9cf9bd50319a29485643d2438ad

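    import torch
    import torch.nn as nn

    torch.set_printoptions(precision=5)  # print five decimals
    c = nn.BCELoss()  # binary cross-entropy, averaged over elements by default
    input = torch.tensor([0.001, 0.001, 0.001, 0.001, 0.8])  # four negative and 1 positive sample
    target = torch.tensor([0., 0., 0., 0., 1.])
    for i in range(4):  # test different class counts
        loss = c(input[i:], target[i:]) * len(input[i:])  # scale the mean loss by the class count
        print(loss)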

tensor(0.22715)
tensor(0.22614)
tensor(0.22514)
tensor(0.22414)
Jelly123456 commented 4 years ago

@glenn-jocher, the new code works very well and solved my problem. Thanks very much.