Open nirbenz opened 6 years ago
How do I measure the mAP?
Using the COCO API.
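For context on what the COCO API reports: the headline mAP averages AP over the IoU thresholds 0.50:0.95, and every detection is matched to ground truth by intersection-over-union. A minimal stdlib sketch of that IoU overlap (boxes in COCO's [x, y, w, h] convention; this is an illustration, not the pycocotools implementation):

```python
# Minimal illustration of the IoU overlap that COCO evaluation thresholds on
# when matching detections to ground truth. Boxes are [x, y, w, h], COCO style.

def iou(a, b):
    """Intersection-over-union of two [x, y, w, h] boxes."""
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))  # intersection width
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))  # intersection height
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

print(iou([0, 0, 10, 10], [0, 0, 10, 10]))  # identical boxes -> 1.0
print(iou([0, 0, 10, 10], [5, 0, 10, 10]))  # half-shifted boxes overlap by 1/3
```

AP-50 counts a detection as correct when this value is at least 0.5; the stricter averaged metric is why the "AP" column in the paper is so much lower than "AP-50".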
@nirbenz could you please help me clear up some confusion?
@nirbenz here is the result from the paper.
You can see the AP is 33 and AP-50 is 57.9 (608x608). If your result is 31 and 55 at the same resolution, that seems good.
2% difference in mAP is rather large, and I was wondering if this is an issue with the Keras implementation (vs original Darknet); i.e., Keras loses accuracy compared to an identical Darknet-based model. Since this is the original model from the paper, losing 2% is rather strange!
Are you running the default 416x416 setup, or did you modify it to run at 608x608? And how many epochs did you train the model for?
The inference result of the pretrained model on a given image is certainly the same as Darknet's.
For clarification, I am using pjreddie's converted model (darknet to keras) and have yet to train a model myself. The results I wrote above are for 608/416/320 resolutions, respectively.
@qqwweee - I am using your inference code as-is, so I find this surprising. Is it possible that even the original (Darknet) model doesn't achieve the results reported in the paper?
Thanks!
@nirbenz Are you running on COCO test2017? My result at 416 resolution on test2017 is:
overall performance
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.271
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.457
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.290
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.105
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.284
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.399
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.236
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.318
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.321
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.124
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.333
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.483
mAP-50=0.457 is also lower than the 55.3 mentioned in the paper. I am now trying to run at 608 resolution.
It makes no sense to test on the 2017 test set, since the original paper/model uses the circa-2014 train/val split (the 2014 train and val sets are joined for training, holding out a random 5k subset of val for evaluation). Using the 5k subset from the original paper, I get this for 416x416:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.299
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.528
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.306
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.128
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.327
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.449
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.261
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.378
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.385
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.179
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.418
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.562
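The 5k-subset recipe described above can be sketched in plain Python. This is a hedged, stdlib-only illustration (for exact comparability with the paper you would use its actual held-out image ids, not a fresh random draw):

```python
import random

def subsample_coco(coco, k=5000, seed=0):
    """Restrict a loaded COCO annotation dict to a reproducible random
    k-image subset -- the '5k subset' evaluation recipe.
    Note: the paper's split uses specific image ids; a fresh random draw
    like this only approximates it."""
    rng = random.Random(seed)
    keep = {img["id"] for img in rng.sample(coco["images"], k)}
    return {
        **coco,
        "images": [i for i in coco["images"] if i["id"] in keep],
        "annotations": [a for a in coco["annotations"] if a["image_id"] in keep],
    }

# Toy demonstration: 10 fake images with 2 annotations each, keep 3 images.
toy = {
    "images": [{"id": i} for i in range(10)],
    "annotations": [{"id": i, "image_id": i % 10} for i in range(20)],
}
small = subsample_coco(toy, k=3)
print(len(small["images"]), len(small["annotations"]))  # -> 3 6
```

The same function applied to a loaded instances_val2014.json would produce a 5k evaluation set whose complement can be merged into the training split.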
I'm facing the same problem. I'm working with a custom dataset of ~32k images, and after conversion the performance (mAP) is far lower than the original model's; even after fine-tuning the converted Keras model I can't come close to the original precision. Conspicuously, certain objects are still found precisely while others aren't found at all.
@AlphaRalph To be clear, you trained your custom dataset using both the original Darknet implementation and this one, and found this one has a lower mAP?
I trained in original Darknet with a good mAP, and after conversion (no further training in Keras) the performance was significantly lower. Certain classes of objects weren't found at all after the conversion.
@AlphaRalph So you mean the same image gives totally different results in Darknet inference and in Keras inference? I think that is a big problem. Could you explain the details?
Yes, you're definitely right! What a pity; this project/repo is so great, but the inference performance in Keras can't keep up with the original Darknet. Meanwhile I tried a few different approaches, like fine-tuning yolo.h5 without freezing the conv layers, but after two epochs the loss started increasing heavily. Fine-tuning yolo.h5 with freezing didn't work out either, since the loss didn't really decrease any further. I fear it may have something to do with the special trick they do in Darknet, where they split the image up into a 13x13 grid.
If you’re training from scratch, how are the layers initialized? Having them initialized wrong will cause the gradient to diverge rapidly. I think Darknet also included some rather specific elements in the training and inference for layer normalization. I didn’t go back through the Keras model to see if all those elements were in place.
Those differences would absolutely cause the given model to behave differently when training.
Across the pipeline (preprocessing -> network computation -> postprocessing), I found preprocessing differs most. In detail: Darknet's resize vs. PIL's resize, and a pad value of 0.5 vs. 128/255. Each difference is small, but the unfortunate result is a lower mAP.
And there's also the issue of letterboxing, which in Darknet happens under the hood. Those differences can be eliminated by running YOLOv3 over darknet's original code and by running most of the preprocessing code in Python beforehand. I'll try that and report.
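For readers unfamiliar with the term: letterboxing scales the image to fit the network input while preserving its aspect ratio, then pads the remainder with a constant gray (128/255 in this repo, 0.5 in Darknet's normalized space, per the comment above). The geometry can be sketched in plain Python (function name and the square 416x416 default are just illustrative):

```python
def letterbox_geometry(img_w, img_h, net_w=416, net_h=416):
    """Compute the resized dimensions and padding offsets used when
    letterboxing an image into a fixed network input while preserving
    aspect ratio. The leftover border is filled with a constant gray."""
    scale = min(net_w / img_w, net_h / img_h)      # shrink to fit both axes
    new_w, new_h = int(img_w * scale), int(img_h * scale)
    pad_x, pad_y = (net_w - new_w) // 2, (net_h - new_h) // 2
    return new_w, new_h, pad_x, pad_y

print(letterbox_geometry(640, 480))  # -> (416, 312, 0, 52): gray bars top/bottom
```

Small discrepancies in this step (rounding, interpolation filter, pad value) shift every pixel the network sees, which is why two "identical" implementations can diverge in mAP.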
@nirbenz, have you finished your test? Could you please report your result?
Nope, I actually haven't gotten around to it. I can confirm that I am getting comparable results on both framework implementations, though (Darknet and Keras for YOLOv3). Keras is still a bit lower, but since it's a non-native implementation I tend to be forgiving (although it'd be wonderful if no differences appeared at all, as I have experienced when converting Caffe models to MXNet, for instance). From my experience, fine-tuning on the target framework usually eliminates all differences; not all frameworks share under-the-hood implementations, and this can sometimes cause differences. I haven't tried performing the preprocessing in Python for Darknet, though, as it proved slightly less straightforward than I thought.
Did anyone else get about the same results as I did for the COCO-17 test set?
@nirbenz what kind of split are you using for train / test in the table you reported above?
How is your result? I made a test script based on yolo.detect_image to generate the JSON file and tested on COCO val2017 using cocoapi. The mAP is below 0.1! I suspect the problem is in the dataset choice or the test script.
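An mAP below 0.1 usually points to a systematic mismatch in the results file rather than a bad model. One frequent culprit is writing the model's contiguous 0-79 class index straight into category_id: COCO's category ids are non-contiguous (they run up to 90), so most detections land on the wrong class. A hedged, stdlib-only sketch of building the results entries (function and argument names are illustrative; the (top, left, bottom, right) box order is what this repo's yolo.detect_image is assumed to produce):

```python
def to_coco_results(image_id, boxes, scores, classes, cat_ids):
    """Convert one image's detections into COCO results-format entries.

    boxes   : list of (top, left, bottom, right) pixel coordinates
    cat_ids : mapping from the model's contiguous 0..79 class index to
              COCO's non-contiguous category ids -- skipping this remap
              is a common cause of near-zero mAP.
    """
    results = []
    for (top, left, bottom, right), score, cls in zip(boxes, scores, classes):
        results.append({
            "image_id": image_id,
            "category_id": cat_ids[cls],
            # COCO bboxes are [x, y, width, height], not corner pairs
            "bbox": [left, top, right - left, bottom - top],
            "score": float(score),
        })
    return results

# Toy usage: one detection of class index 0, remapped to COCO id 1 ("person").
print(to_coco_results(42, [(10, 20, 50, 80)], [0.9], [0], {0: 1}))
```

Dumping the concatenated entries with json.dump produces a file that cocoapi's loadRes accepts; it's also worth double-checking that image_id matches the ids in the annotation file rather than, say, a filename-derived index.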
https://github.com/YunYang1994/tensorflow-yolov3 hope it helps you
Does your project reach the same mAP as the paper?
@sanmianjiao @YunYang1994 @707346129 @katerynaCh this YOLOv3 tutorial may help you: https://github.com/ultralytics/yolov3/wiki/Train-Custom-Data
The accompanying repository works on macOS, Windows, and Linux; includes multi-GPU and multithreading support; and performs inference on images, videos, webcams, and an iOS app. It also tests to slightly higher mAPs than Darknet, including on the latest YOLOv3-SPP.weights (60.7 COCO mAP), and offers the ability to train custom datasets from scratch to Darknet performance, all using PyTorch :)
https://github.com/ultralytics/yolov3
Hi guys!
I am currently getting 31/30/27 (mAP) and 55/53/49.5 (mAP-50) with this implementation at 608/416/320 resolutions, respectively, which is a bit lower than what the paper claims. I was wondering if anyone else has experienced this and might have some intuition about what's causing the drop?
Thanks! Nir