ultralytics / yolov3

YOLOv3 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

Different mAP with your post #199

Closed yanlongbinluck closed 5 years ago

yanlongbinluck commented 5 years ago

Hi, thanks for your excellent repo. I cloned your repo and trained it on COCO2014 with the default settings, except that I set batch_size to 32 (2 GPUs). After training for 78 epochs the mAP is still 0.38, not the 0.5~0.6 in your post. Why? (screenshot attached)

And when I set batch_size to 16 and train for 20 epochs (1 GPU), the mAP is about 0.35, not the 0.45~0.5 in your post. (screenshot attached)

Could you give me some suggestions? Thank you.

glenn-jocher commented 5 years ago

@yanlongbinluck yes, there are a few reasons. First though, git pull to align your repo with the current version; I can see you are several commits behind.

  1. Darknet trains for 500200 batches at 64 images per batch, which works out to 273 epochs. It also uses several tricks to reach its reported mAP, including an LR scheduler and multi-scale training, which you have not used (see the sketch after this list). I suggest you do some research on the topic.

  2. The mAP computed during training is at conf_thres 0.1 for speed. To get the proper test mAP you need to run test.py --weights weights/best.pt, which will compute mAP at conf_thres 0.001. This usually increases the reported mAP by about 5%.

  3. The plots shown in the README, which you are presumably referring to (you did not specify a source), are very old and used incorrect mAP computations. I've removed them and replaced them with the custom training results from the wiki: https://docs.ultralytics.com/yolov5/tutorials/train_custom_data

  4. The optimal loss function formulation for this repo is not yet known. The default darknet loss does not produce very good results, at least for the first few epochs we have tested. You are free to tune hyperparameters and adjust the loss function to search for improvements as well; we could use all the help we can get in terms of GPU time. The main differences between this repo and darknet currently are CE vs BCE loss for classification, and the exact constants used to weight the xy, wh, cls and conf loss terms. These are found here: https://github.com/ultralytics/yolov3/blob/6cb3c61320cb6508e4dc5be813c43f4155378a58/utils/utils.py#L274-L288
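
For illustration, here is a minimal sketch of the multi-scale and LR-schedule tricks mentioned in point 1. It assumes a generic PyTorch model, dataloader and loss function, and the milestone and scale values are placeholders rather than this repo's exact hyperparameters:

```python
import random

import torch
import torch.nn.functional as F

def train_multiscale(model, dataloader, compute_loss, device='cuda'):
    """Darknet-style training tricks: LR step schedule + multi-scale input resizing.

    `model`, `dataloader` and `compute_loss` are assumed placeholders; the LR
    milestones and scale range are illustrative, not the repo's exact settings.
    """
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9, weight_decay=5e-4)
    # darknet drops the LR by 10x at fixed batch counts late in its 500200-batch schedule
    scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[400000, 450000], gamma=0.1)

    img_size = 416
    for i, (imgs, targets) in enumerate(dataloader):
        # multi-scale: every 10 batches pick a new resolution in 320..608 (multiples of 32)
        if i % 10 == 0:
            img_size = random.choice(range(320, 640, 32))
        imgs = F.interpolate(imgs.to(device), size=(img_size, img_size),
                             mode='bilinear', align_corners=False)

        loss = compute_loss(model(imgs), targets.to(device))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        scheduler.step()
```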

glenn-jocher commented 5 years ago

@dsp6414 I suggest you follow our GCP quickstart if you want reproducible results in a well-understood Linux environment; everything works as expected there once your environment is set up correctly. https://docs.ultralytics.com/yolov5/environments/google_cloud_quickstart_tutorial/

glenn-jocher commented 5 years ago

@dsp6414 we just posted an update that saves JPGs showing the first batch of training and test data. The two files are saved as train_batch0.jpg and test_batch0.jpg, and you should see them appear after the first epoch. A rough sketch of the idea is below.
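
(The actual saving helper in the repo may differ; the idea is roughly the sketch below, using torchvision and a generic dataloader as placeholders.)

```python
import torchvision

def save_first_batch(dataloader, fname='train_batch0.jpg'):
    # Grab the first batch and save it as a single image mosaic for visual inspection.
    imgs, _targets = next(iter(dataloader))           # imgs: (N, 3, H, W), values in [0, 1]
    grid = torchvision.utils.make_grid(imgs, nrow=4)  # tile the batch into one grid image
    torchvision.utils.save_image(grid, fname)
```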

Can you git pull to get the latest updates and then upload your two jpgs here?

yanlongbinluck commented 5 years ago

@glenn-jocher OK, thanks for your reply. I will test the current version of the repo. Have you ever achieved the mAP shown in your README, as below? How many epochs did it take? Do you use an LR scheduler and multi-scale training? I am eager to reproduce your mAP. (screenshot attached)

yanlongbinluck commented 5 years ago

@dsp6414 I think you had better run experiments on the COCO dataset to check your Linux environment. My environment is Ubuntu 18.04 and it works fine.

glenn-jocher commented 5 years ago

@yanlongbinluck ah, these values are obtained by running test.py on yolov3.weights or yolov3.pt (they are the same weights in darknet and PyTorch formats). For example, run: python3 test.py --save-json --img-size 608 --batch-size 16 (the JSON detections are then scored with pycocotools; see the sketch after the log below).

glenn-jocher commented 5 years ago

@yanlongbinluck you should get these results when testing with the official weights. Training from scratch is a challenge though, and involves many steps to get close to the published darknet results, as I mentioned before: multi-scale training, LR changes, batch size 64, 500200 batches, augmentation, etc.

python3 test.py --save-json --img-size 608 --batch-size 16
Namespace(batch_size=16, cfg='cfg/yolov3-spp.cfg', conf_thres=0.001, data_cfg='data/coco.data', img_size=608, iou_thres=0.5, nms_thres=0.5, save_json=True, weights='weights/yolov3-spp.weights')

Using CUDA device0 _CudaDeviceProperties(name='Tesla V100-SXM2-16GB', total_memory=16130MB)
               Class    Images   Targets         P         R       mAP        F1
Computing mAP: 100%|█████████████████████████████████████| 313/313 [06:40<00:00,  1.24s/it]
                 all     5e+03  3.58e+04      0.12      0.81     0.611     0.203

              person     5e+03  1.09e+04     0.165       0.9     0.767     0.278
             bicycle     5e+03       316    0.0741     0.832     0.584     0.136
                 car     5e+03  1.67e+03    0.0814     0.897     0.699     0.149
          motorcycle     5e+03       391     0.166     0.852     0.722     0.278
            airplane     5e+03       131     0.192     0.931      0.88     0.319
                 bus     5e+03       261     0.208      0.87     0.823     0.336
               train     5e+03       212     0.173     0.892      0.82     0.289
               truck     5e+03       352     0.105     0.665     0.523     0.181
                boat     5e+03       475     0.096     0.792     0.521     0.171
       traffic light     5e+03       516    0.0521       0.8     0.557    0.0978
        fire hydrant     5e+03        83     0.184     0.928     0.884     0.307
           stop sign     5e+03        84    0.0931     0.893     0.826     0.169
       parking meter     5e+03        59    0.0727     0.695     0.619     0.132
               bench     5e+03       473    0.0365     0.702     0.391    0.0693
                bird     5e+03       469    0.0875     0.689     0.527     0.155
                 cat     5e+03       195     0.301     0.872      0.78     0.448
                 dog     5e+03       223     0.256     0.879     0.826     0.397
               horse     5e+03       305     0.175     0.931      0.86     0.294
               sheep     5e+03       321     0.249     0.841     0.728     0.384
                 cow     5e+03       384     0.186     0.831     0.731     0.305
            elephant     5e+03       284     0.253     0.972     0.922     0.401
                bear     5e+03        53       0.4     0.906     0.861     0.555
               zebra     5e+03       277     0.251     0.946     0.875     0.397
             giraffe     5e+03       170      0.24     0.929     0.894     0.382
            backpack     5e+03       384    0.0512     0.755     0.428     0.096
            umbrella     5e+03       392     0.105     0.875     0.659     0.188
             handbag     5e+03       483    0.0294     0.737     0.322    0.0565
                 tie     5e+03       297    0.0681     0.848     0.606     0.126
            suitcase     5e+03       310     0.154     0.913     0.696     0.263
             frisbee     5e+03       109     0.189     0.908     0.862     0.313
                skis     5e+03       282    0.0667     0.762     0.451     0.123
           snowboard     5e+03        92     0.104     0.804     0.555     0.185
         sports ball     5e+03       236    0.0822     0.763     0.673     0.148
                kite     5e+03       399     0.181      0.83     0.608     0.297
        baseball bat     5e+03       125     0.083     0.736     0.559     0.149
      baseball glove     5e+03       139    0.0754     0.806     0.649     0.138
          skateboard     5e+03       218     0.118     0.867     0.785     0.208
           surfboard     5e+03       266    0.0927     0.812     0.661     0.166
       tennis racket     5e+03       183     0.141     0.869     0.753     0.243
              bottle     5e+03       966    0.0767     0.823     0.534      0.14
          wine glass     5e+03       366     0.113     0.779     0.585     0.198
                 cup     5e+03       897    0.0928     0.837     0.599     0.167
                fork     5e+03       234    0.0659     0.731       0.5     0.121
               knife     5e+03       291    0.0492     0.684     0.358    0.0919
               spoon     5e+03       253    0.0426     0.755     0.324    0.0806
                bowl     5e+03       620       0.1     0.894     0.573     0.181
              banana     5e+03       371    0.0876     0.695     0.336     0.156
               apple     5e+03       158    0.0521     0.734     0.238    0.0973
            sandwich     5e+03       160      0.12     0.781      0.52     0.208
              orange     5e+03       189    0.0601     0.667     0.286      0.11
            broccoli     5e+03       332       0.1     0.783     0.387     0.178
              carrot     5e+03       346    0.0633     0.673     0.298     0.116
             hot dog     5e+03       164     0.145     0.598     0.458     0.234
               pizza     5e+03       224     0.111     0.804     0.659     0.195
               donut     5e+03       237     0.148     0.802     0.637      0.25
                cake     5e+03       241     0.105     0.734     0.552     0.184
               chair     5e+03  1.62e+03    0.0703     0.757     0.473     0.129
               couch     5e+03       236     0.129     0.788     0.611     0.221
        potted plant     5e+03       431    0.0571     0.824      0.49     0.107
                 bed     5e+03       195     0.157     0.836     0.717     0.265
        dining table     5e+03       634    0.0659     0.828     0.511     0.122
              toilet     5e+03       179      0.24     0.944     0.836     0.383
                  tv     5e+03       257      0.13     0.946     0.825     0.229
              laptop     5e+03       237      0.19     0.886     0.774     0.313
               mouse     5e+03        95    0.0893     0.895     0.742     0.162
              remote     5e+03       241    0.0687     0.834     0.582     0.127
            keyboard     5e+03       117    0.0879     0.906     0.755      0.16
          cell phone     5e+03       291    0.0425     0.742     0.475    0.0803
           microwave     5e+03        88     0.226      0.92     0.823     0.362
                oven     5e+03       142    0.0816     0.845     0.561     0.149
             toaster     5e+03        11    0.0899     0.727     0.412      0.16
                sink     5e+03       211    0.0732     0.853     0.616     0.135
        refrigerator     5e+03       107    0.0932     0.935     0.786     0.169
                book     5e+03  1.08e+03    0.0593     0.654       0.2     0.109
               clock     5e+03       292    0.0817     0.877     0.752     0.149
                vase     5e+03       353    0.0988     0.841     0.589     0.177
            scissors     5e+03        56    0.0552     0.732     0.438     0.103
          teddy bear     5e+03       245     0.156     0.853     0.671     0.264
          hair drier     5e+03        11    0.0488     0.182     0.152    0.0769
          toothbrush     5e+03        77     0.047     0.727     0.334    0.0883

loading annotations into memory...
Done (t=5.42s)
creating index...
index created!
Loading and preparing results...
DONE (t=2.93s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=43.42s).
Accumulating evaluation results...
DONE (t=5.81s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.366
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.607
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.386
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.207
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.391
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.485
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.296
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.464
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.494
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.331
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.517
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.618
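
For reference, the AP/AR summary at the end of that log comes from pycocotools. It is roughly equivalent to the sketch below, where the annotations path and results.json filename are illustrative rather than the exact names test.py uses:

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Illustrative paths: COCO ground-truth annotations and the detections written by --save-json.
coco_gt = COCO('annotations/instances_val2014.json')
coco_dt = coco_gt.loadRes('results.json')

coco_eval = COCOeval(coco_gt, coco_dt, 'bbox')          # evaluate bounding boxes
coco_eval.params.imgIds = sorted(coco_gt.getImgIds())   # score all images present in the GT set
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()                                   # prints the AP/AR table shown above
```
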
yanlongbinluck commented 5 years ago

@glenn-jocher I got these results with the official weights, thank you very much. I thought the results in the README were from training from scratch; I misunderstood, haha. If you can reach the darknet results with your train.py from scratch, I think your repo will be the best PyTorch YOLOv3 on GitHub. So far I have not found any PyTorch YOLOv3 repo that trains from scratch on COCO and reaches the official results. Come on, guys! By the way, your code is intuitive and easy to understand, not too nested. Very good.

glenn-jocher commented 5 years ago

@yanlongbinluck yes, training is the final frontier. Detection is verified to work correctly (on images, video, webcam and the iPhone app), testing is now verified to work correctly as well, so all that is left is training.

We use the exact darknet loss function; the only differences are:

  1. We use CE for cls instead of BCE, because this produces better results.
  2. We weight the loss terms separately, whereas darknet appears to use the same weighting for all of them. A rough sketch of both differences is below.
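
To make these two differences concrete, here is a minimal sketch using standard PyTorch loss primitives. The term weights, tensor names and the use of MSE for the box terms are illustrative assumptions; the actual constants live in utils/utils.py (linked above):

```python
import torch.nn as nn

# Illustrative per-term weights; the real constants are defined in utils/utils.py.
K_XY, K_WH, K_CONF, K_CLS = 8.0, 4.0, 64.0, 32.0

mse = nn.MSELoss()            # box regression terms
bce = nn.BCEWithLogitsLoss()  # objectness term
ce  = nn.CrossEntropyLoss()   # classification term (darknet uses BCE here instead)

def yolo_loss(pred_xy, pred_wh, pred_conf, pred_cls,
              tgt_xy, tgt_wh, tgt_conf, tgt_cls):
    """Weighted sum of the four YOLO loss terms.

    pred_cls: (N, num_classes) logits; tgt_cls: (N,) integer class indices.
    """
    lxy   = K_XY   * mse(pred_xy, tgt_xy)      # box centre loss
    lwh   = K_WH   * mse(pred_wh, tgt_wh)      # box size loss
    lconf = K_CONF * bce(pred_conf, tgt_conf)  # objectness loss
    lcls  = K_CLS  * ce(pred_cls, tgt_cls)     # classification loss (CE, not BCE)
    return lxy + lwh + lconf + lcls
```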