Hmm, I'm not familiar with pycocotools. Is that the official COCO mAP code? Inference is essentially identical between this repo and darknet (training differences abound, though...), so mAP on the official weights should also be the same, though test.py computes mAP slightly differently than the official COCO code.
I noticed your local version is a bit out of date with the current repo. The current test.py conf_thres is 0.30, which shows improved results compared to the 0.50 you are using. 0.20 also works well, by the way; I'm not sure exactly where the sweet spot is, so you could tune this if you have time (a quick sweep sketch follows below).
https://github.com/ultralytics/yolov3/blob/2dd2564c4e1ed9c03c8b59a921b1c0db2124be15/test.py#L136
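If anyone wants to sweep the threshold rather than guess, here is a quick sketch (the --conf-thres flag matches the commands used later in this thread; the value list is arbitrary):

```python
import subprocess

# Sweep a few candidate confidence thresholds and let test.py report mAP for each.
for t in (0.20, 0.25, 0.30, 0.35):
    subprocess.run(['python3', 'test.py', '--conf-thres', str(t)], check=True)
```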
Yeah, I'm running https://github.com/cocodataset/cocoapi, so I didn't touch anything there. I've already tested conf_thres with 0.001, 0.005, 0.05, 0.4, and 0.5. I'll try 0.30 or 0.20 later as you suggested, but I doubt it will make a huge impact on the score.
Ah, it sounds like you tried several values. I think < 0.10 is too low and > 0.30 is too high. You should get a pretty big improvement going from 0.5 to 0.3, perhaps 0.10 better mAP (i.e. from 0.40 to 0.50 mAP).
I modified eval_map.py and datasets.py to match the more recent style of the repo. Here are the results. The reason I would try thresholds below 0.10 is that when we build the precision-recall curve, we can include all probability thresholds, starting from a score of 0 (see the sketch after the results below).
Thresh: 0.001  mAP@0.5: 0.388
Thresh: 0.005  mAP@0.5: 0.376
Thresh: 0.05   mAP@0.5: 0.425
Thresh: 0.3    mAP@0.5: 0.423
Thresh: 0.4    mAP@0.5: 0.411
Thresh: 0.5    mAP@0.5: 0.398
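A minimal sketch of the point above, assuming hypothetical per-detection inputs (this is not the repo's or pycocotools' exact code): every detection's confidence acts as an implicit threshold when the precision-recall curve is built, so a high conf_thres truncates the low-confidence tail of the curve and caps recall.

```python
import numpy as np

def average_precision(tp, conf, n_gt):
    """tp: 1/0 true-positive flags at IoU 0.5, conf: detection confidences, n_gt: ground-truth count."""
    order = np.argsort(-np.asarray(conf))            # rank detections by confidence, highest first
    tp = np.asarray(tp, dtype=float)[order]
    cum_tp, cum_fp = np.cumsum(tp), np.cumsum(1.0 - tp)
    recall = cum_tp / max(n_gt, 1)                   # grows as lower-confidence detections are admitted
    precision = cum_tp / (cum_tp + cum_fp)
    return np.trapz(precision, recall)               # area under the precision-recall curve

# Dropping the lowest-confidence detections can only shrink the recall range covered by the curve.
print(average_precision([1, 0, 1, 1], [0.9, 0.8, 0.3, 0.05], n_gt=5))
```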
Console log:
(fastai) root@4c990753b224:/deep_learning/ultralytics-yolov3# python eval_map.py --weights weights/yolov3.weights --conf-thres 0.001
Namespace(batch_size=32, cfg='cfg/yolov3.cfg', conf_thres=0.001, data_config='cfg/coco.data', img_size=416, iou_thres=0.5, n_cpus=0, nms_thres=0.45, weights='weights/yolov3.weights')
Using device: "cuda:0"
Compute mAP...
loading annotations into memory...
Done (t=0.15s)
creating index...
index created!
Loading and preparing results...
DONE (t=3.10s)
creating index...
index created!
Images: 5000
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=87.17s).
Accumulating evaluation results...
DONE (t=8.16s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.187
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.338
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.185
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.044
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.165
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.333
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.229
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.368
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.418
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.182
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.422
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.525
(fastai) root@4c990753b224:/deep_learning/ultralytics-yolov3# python eval_map.py --weights weights/yolov3.weights --conf-thres 0.005
Namespace(batch_size=32, cfg='cfg/yolov3.cfg', conf_thres=0.005, data_config='cfg/coco.data', img_size=416, iou_thres=0.5, n_cpus=0, nms_thres=0.45, weights='weights/yolov3.weights')
Using device: "cuda:0"
Compute mAP...
loading annotations into memory...
Done (t=0.15s)
creating index...
index created!
Loading and preparing results...
DONE (t=1.45s)
creating index...
index created!
Images: 5000
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=55.59s).
Accumulating evaluation results...
DONE (t=5.09s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.206
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.376
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.202
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.051
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.184
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.356
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.239
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.371
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.407
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.172
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.406
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.516
(fastai) root@4c990753b224:/deep_learning/ultralytics-yolov3# python eval_map.py --weights weights/yolov3.weights --conf-thres 0.05
Namespace(batch_size=32, cfg='cfg/yolov3.cfg', conf_thres=0.05, data_config='cfg/coco.data', img_size=416, iou_thres=0.5, n_cpus=0, nms_thres=0.45, weights='weights/yolov3.weights')
Using device: "cuda:0"
Compute mAP...
loading annotations into memory...
Done (t=0.15s)
creating index...
index created!
Loading and preparing results...
DONE (t=0.42s)
creating index...
index created!
Images: 5000
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=26.84s).
Accumulating evaluation results...
DONE (t=2.96s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.232
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.425
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.227
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.058
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.214
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.374
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.243
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.355
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.371
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.133
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.361
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.487
(fastai) root@4c990753b224:/deep_learning/ultralytics-yolov3# python eval_map.py --weights weights/yolov3.weights --conf-thres 0.3
Namespace(batch_size=32, cfg='cfg/yolov3.cfg', conf_thres=0.3, data_config='cfg/coco.data', img_size=416, iou_thres=0.5, n_cpus=0, nms_thres=0.45, weights='weights/yolov3.weights')
Using device: "cuda:0"
Compute mAP...
loading annotations into memory...
Done (t=0.15s)
creating index...
index created!
Loading and preparing results...
DONE (t=0.16s)
creating index...
index created!
Images: 5000
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=16.91s).
Accumulating evaluation results...
DONE (t=2.35s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.238
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.423
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.241
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.058
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.217
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.363
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.230
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.312
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.315
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.089
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.292
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.438
(fastai) root@4c990753b224:/deep_learning/ultralytics-yolov3# python eval_map.py --weights weights/yolov3.weights --conf-thres 0.4
Namespace(batch_size=32, cfg='cfg/yolov3.cfg', conf_thres=0.4, data_config='cfg/coco.data', img_size=416, iou_thres=0.5, n_cpus=0, nms_thres=0.45, weights='weights/yolov3.weights')
Using device: "cuda:0"
Compute mAP...
loading annotations into memory...
Done (t=0.15s)
creating index...
index created!
Loading and preparing results...
DONE (t=0.15s)
creating index...
index created!
Images: 5000
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=15.75s).
Accumulating evaluation results...
DONE (t=2.28s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.235
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.411
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.241
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.056
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.212
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.357
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.225
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.299
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.302
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.079
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.276
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.426
(fastai) root@4c990753b224:/deep_learning/ultralytics-yolov3# python eval_map.py --weights weights/yolov3.weights --conf-thres 0.5
Namespace(batch_size=32, cfg='cfg/yolov3.cfg', conf_thres=0.5, data_config='cfg/coco.data', img_size=416, iou_thres=0.5, n_cpus=0, nms_thres=0.45, weights='weights/yolov3.weights')
Using device: "cuda:0"
Compute mAP...
loading annotations into memory...
Done (t=0.15s)
creating index...
index created!
Loading and preparing results...
DONE (t=0.13s)
creating index...
index created!
Images: 5000
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=14.42s).
Accumulating evaluation results...
DONE (t=2.13s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.231
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.398
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.239
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.051
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.206
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.352
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.220
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.288
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.289
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.068
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.261
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.415
(fastai) root@4c990753b224:/deep_learning/ultralytics-yolov3#
Hmmm, well then I don't understand the discrepancy. The last official COCO SDK results from test.py were by @nirbenz in https://github.com/ultralytics/yolov3/issues/2#issuecomment-434751531, showing 0.543 mAP@0.5 at 416 pixels. Nothing significant should have changed for inference in the repo since then. I'm not sure what to say, other than to try to ask @nirbenz for a PR for his SDK export code.
Recent results from detect.py also look exactly the same as darknet's, i.e. https://github.com/ultralytics/yolov3/issues/16#issuecomment-449569166
Understood. I will continue testing and see if I did something wrong with eval_map. I'll let you know if I find something.
Hi, I see your repo's mAP is 0.547 using pycocotools. Can you tell me how to resolve this? I've run into the same issue.
It looks like it would be beneficial for test.py to output a JSON file in the format that https://github.com/cocodataset/cocoapi wants, so we could generate mAP directly from cocoapi. I think the relevant JSON format is here: https://github.com/cocodataset/cocoapi/blob/master/results/instances_val2014_fakebbox100_results.json. Do any of you have ready-made code for a PR that already does this?
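For reference, a hedged sketch of that results format (the field names are the standard cocoapi bbox results keys; the example detections dict is hypothetical, not the repo's data structure):

```python
import json

# Hypothetical example detections: {image_id: [(x1, y1, x2, y2, score, coco91_class_id), ...]}
detections = {42: [(10.0, 20.0, 110.0, 220.0, 0.87, 1)]}

results = []
for img_id, dets in detections.items():
    for x1, y1, x2, y2, score, cls_id in dets:
        results.append({
            "image_id": int(img_id),
            "category_id": int(cls_id),          # must be the original 91-class COCO category id
            "bbox": [x1, y1, x2 - x1, y2 - y1],  # [x, y, width, height], xy = top-left corner
            "score": float(score),
        })

with open("results.json", "w") as f:
    json.dump(results, f)
```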
The NMS scheme in the original darknet repo is different from this repo's; I suggest taking a look at the NMS code where they differ. A while back I tried to make those changes in this repo and was able to push the mAP to 0.49, but then I got pulled onto something else.
I could make a simple PR; the code is pretty straightforward, as shown in eval_map.py above. However, you will still need to generate the ground-truth json for the 5k dataset, as well as for any other custom dataset, if you want to use the COCO API properly. I don't know how you would want that to be included in the code.
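For anyone who needs the ground-truth side, a minimal sketch of building a subset annotations file for the 5k split (file paths and the 5k.txt filename parsing are assumptions based on this repo's layout; the json keys are the standard COCO instances format):

```python
import json

# Assumed paths: full val2014 annotations and the repo-style 5k image list.
ann = json.load(open('annotations/instances_val2014.json'))
keep_ids = {int(line.split('_')[-1].split('.')[0]) for line in open('data/5k.txt')}

subset = {
    'info': ann.get('info', {}),
    'licenses': ann.get('licenses', []),
    'categories': ann['categories'],
    'images': [im for im in ann['images'] if im['id'] in keep_ids],
    'annotations': [a for a in ann['annotations'] if a['image_id'] in keep_ids],
}

with open('coco_valid.json', 'w') as f:
    json.dump(subset, f)
```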
@ydixon Actually, you could write the image IDs from "5k.txt" into a list and use that to filter the ground-truth labels in the default cocoeval code. I could upload it if anyone needs it.
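That filtering approach looks like the simplest route. A minimal sketch, assuming the repo's 5k.txt naming and standard pycocotools calls (paths are assumptions):

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Image ids for the 5k split, parsed from the file names in 5k.txt.
img_ids = [int(line.split('_')[-1].split('.')[0]) for line in open('data/5k.txt')]

cocoGt = COCO('annotations/instances_val2014.json')   # full val2014 ground truth
cocoDt = cocoGt.loadRes('results.json')               # detections in cocoapi bbox format
cocoEval = COCOeval(cocoGt, cocoDt, 'bbox')
cocoEval.params.imgIds = img_ids                      # evaluate only the 5k images
cocoEval.evaluate()
cocoEval.accumulate()
cocoEval.summarize()
```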
@ydixon @okanlv @AndOneDay, I updated test.py with a --save-json argument, which outputs a COCO json and evaluates it using pycocotools. There are a few adjustments to the data going into the json: boxes are written as xywh, but xy is the top-left corner, not the center (image origin is top left), and class indices are mapped to the 91 COCO category ids with coco80_to_coco91_class().
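A tiny sketch of the box-format adjustment just described (values are hypothetical): network-style center xywh converted to the top-left xywh that COCO expects. The class-id remapping itself is handled by the repo's coco80_to_coco91_class().

```python
def center_to_topleft_xywh(cx, cy, w, h):
    # COCO json boxes are [x, y, width, height] with xy at the top-left corner, origin top left.
    return [cx - w / 2, cy - h / 2, w, h]

print(center_to_topleft_xywh(60.0, 120.0, 100.0, 200.0))  # -> [10.0, 20.0, 100.0, 200.0]
```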
.Code to compile the json dict: https://github.com/ultralytics/yolov3/blob/eb6a4b5b84f177697693a4de4e98ca4c2539cc11/test.py#L67-L81
Code to evaluate the json with pycocotools: https://github.com/ultralytics/yolov3/blob/eb6a4b5b84f177697693a4de4e98ca4c2539cc11/test.py#L141-L157
Output mAP is low using yolov3.weights though, so it may not be constructing the json correctly, or the test.py hyperparameters may not be properly aligned with darknet.
sudo rm -rf yolov3 && git clone https://github.com/ultralytics/yolov3
sudo rm -rf cocoapi && git clone https://github.com/cocodataset/cocoapi && cd cocoapi/PythonAPI && make && cd ../.. && cp -r cocoapi/PythonAPI/pycocotools yolov3
cd yolov3 && python3 test.py --save-json
...
Image Total P R mAP
5000 5000 0.633 0.598 0.589
...
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.271
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.460
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.285
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.106
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.295
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.415
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.236
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.317
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.320
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.123
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.343
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.492
In the nms function, comment out and replace this line:
# v = ((pred[:, 4] > conf_thres) & (class_prob > .4)) # TODO examine arbitrary 0.4 thres here
v = (pred[:, 4] * class_prob > conf_thres)
Then run the test with conf_thres set to 0.005; you might want to rename the argument to something else, I think.
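To make the difference explicit, here is a self-contained sketch with hypothetical tensors (not the repo's actual nms() code): darknet thresholds on the product of objectness and class probability, rather than gating each term separately.

```python
import torch

pred = torch.tensor([[0., 0., 10., 10., 0.60],    # columns 0-3: box, column 4: objectness
                     [0., 0., 10., 10., 0.90]])
class_prob = torch.tensor([0.30, 0.35])            # best class probability per box
conf_thres = 0.25

old_keep = (pred[:, 4] > conf_thres) & (class_prob > 0.4)   # original two-part gate
new_keep = pred[:, 4] * class_prob > conf_thres             # combined-score gate, as in darknet
print(old_keep, new_keep)   # tensor([False, False]) tensor([False,  True])
```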
DONE (t=4.46s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.308
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.549
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.313
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.143
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.339
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.447
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.266
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.396
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.415
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.223
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.452
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.570
Please also let me know how your model performs under the COCO API. There are some interesting design choices in this repo that differ from the original darknet, and I would really like to know how well they do under those changes.
Oh, I thought cocoDt.imgIds would automatically select the overlapping set. When it didn't work as expected, I ended up creating the ground-truth annotations myself. :D
@ydixon @okanlv @AndOneDay it worked: pycocotools mAP is 0.550 (416) and 0.579 (608) with yolov3.weights in the latest commit!! I simply applied the changes @ydixon recommended. Unfortunately, performance swapped between the pycocotools mAP and our own in-house mAP, which now shows about 0.40; I will investigate, and also run this on our scratch-trained model.
sudo rm -rf yolov3 && git clone https://github.com/ultralytics/yolov3
sudo rm -rf cocoapi && git clone https://github.com/cocodataset/cocoapi && cd cocoapi/PythonAPI && make && cd ../.. && cp -r cocoapi/PythonAPI/pycocotools yolov3
cd yolov3
...
python3 test.py --save-json --conf-thres 0.005
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.308
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.550
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.313
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.143
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.339
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.448
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.266
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.398
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.417
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.226
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.456
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.572
...
python3 test.py --save-json --conf-thres 0.005 --img-size 608 --batch-size 16
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.328
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.579
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.341
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.196
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.359
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.425
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.279
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.423
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.444
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.293
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.472
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.557
@glenn-jocher Thanks! Now I'm more incentivized to try the unique anchor loss layer approach. (GPU resources are expensive!)
@ydixon yeah of course! I'm left a bit disenchanted by the mAP metric now. The lower I set conf_thres, the better the test.py results: if I set conf_thres = 0.001, then pycocotools mAP rises to 0.554 at 416. But real-world results require higher thresholds, around 0.5, so mAP appears to be a poor metric for real-world usability. Anyway, yes, it is great to finally be able to output official pycocotools results directly!
python3 detect.py --conf_thres 0.005
Have you found the reason for this behavior? My intuition is that the lower the confidence threshold, the higher the false positives, so precision would be lower... so it's a little confusing.
@simaiden the original problems referenced in this issue have been corrected. mAP is correctly reported now, along with P and R.
But what about the lower mAP with a high confidence threshold? That happens when I use the COCO API; do you mean that with this repo it doesn't happen?
Thanks!
@simaiden I don't understand. mAP should be computed at 0.001 or 0.0001 confidence threshold. Everything is working correctly in this repo in regards to mAP computation.
Thanks for your reply.
OK, so that is the way mAP is calculated, but do you have any clue why this happens? Why does mAP decrease when I increase the confidence threshold? I'm not talking about this repo in particular, but in general; sorry if my question is not about the repo itself.
@simaiden search online, we can't help you with this.
@glenn-jocher Apologies for bringing up the same topic again. I've noticed there are a lot of threads about mAP and I have read them, but none of them has code to test, so I modified your detect/test code a little to run pycocotools.
Added load_images_v2 in datasets.py. Run eval_map.py, and the output is compared against the ground-truth file coco_valid.json. The ground-truth file has been tested against results generated from the darknet repo, with matching mAP. If you want to generate it yourself, you can go here. I am only running the model with the official yolov3 weights. Any ideas on improving the score?