longcw / yolo2-pytorch

YOLOv2 in PyTorch

mAP #1

Open GOATmessi8 opened 7 years ago

GOATmessi8 commented 7 years ago

Have you ever evaluated the converted trained model on VOC2007? I tried your code and got 71.9 mAP, while the original is 76.8. Then I found a tiny error in the test code; after fixing it the result went up to 72.8 mAP, which is still not enough...

longcw commented 7 years ago

Yes, I got the same result. You can make a pull request so I can fix the bug. I have no idea why the mAP of my implementation is low. Did you try the darknet implementation by the original author?

GOATmessi8 commented 7 years ago

Not yet, but I found an issue in darkflow; it seems the conversion to tensorflow also causes some difference: https://github.com/thtrieu/darkflow/issues/25 I will make a pull request if I figure out the training part. Maybe that could solve the problem...

longcw commented 7 years ago

I implemented the loss function following darknet, and the training process works now. I trained it on the VOC2007 trainval set and got ~71.86 mAP~ ~50 mAP on the test set. Maybe you can find some other causes of the low mAP with the help of the darknet source code.

terrychenism commented 7 years ago

@longcw Thank you for sharing the code. I have tested the converted darknet model, which got ~72 mAP. Then I trained on the VOC07 trainval set for 160 epochs (using only your GitHub code), which only got ~50 mAP. Did you successfully train the yolo2 detector?

longcw commented 7 years ago

Thank you for your comment. I tested the trained model and got the same result, ~50 mAP. There are still some bugs in the training code. I am sorry about this.

crazylyf commented 7 years ago

In the test phase, two parameters are inconsistent with the original darknet:

As @ruinmessi reported, before correcting those parameters the mAP on VOC2007-test is 71.9. Correcting the first parameter improves it slightly to 72.2, and correcting the iou_thresh boosts it further to 73.6. The tensorflow version of yolo (darkflow) seems to suffer from the same problem, and an issue in that project pointed out some possible reasons. Maybe the same reasons also apply to this project?

crazylyf commented 7 years ago

@ruinmessi Which error in the test code did you fix?

GOATmessi8 commented 7 years ago

@longcw @crazylyf Sorry for being away for so long. I boosted the mAP to 74.3 by changing the NMS order like this, whereas this project does the NMS in a function called postprocess. This is with the exact parameters you mentioned.

crazylyf commented 7 years ago

Why is your mAP 0.7 higher if we are using the same parameters? Am I missing something?

GOATmessi8 commented 7 years ago

NMS should be applied before thresholding.

longcw commented 7 years ago

@ruinmessi Thank you for pointing out this problem.

GOATmessi8 commented 7 years ago

@longcw I am curious about how you converted the original weights to the h5 file. Could you please share some details or scripts?

longcw commented 7 years ago

@ruinmessi I used darkflow to load the original weights from the binary weights file.

rdfong commented 7 years ago

Is there any update on the training issue?

jxgu1016 commented 7 years ago

@ruinmessi Does the order of NMS and thresholding affect the results? I don't think so. Can anyone prove me wrong?
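One way to probe this question is a small experiment. The sketch below is not the project's `postprocess` function or darknet's NMS; it assumes plain greedy NMS with a single score per box, and the thresholds (0.3 for the score, 0.45 for IoU) and helper names are hypothetical values chosen only for illustration.

```python
import random

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union

def greedy_nms(dets, iou_thresh):
    """dets: list of (score, box). Keep boxes in descending score order,
    dropping any box that overlaps an already kept box too much."""
    kept = []
    for score, box in sorted(dets, key=lambda d: d[0], reverse=True):
        if all(iou(box, k[1]) <= iou_thresh for k in kept):
            kept.append((score, box))
    return kept

def threshold(dets, score_thresh):
    """Drop detections whose score is below score_thresh."""
    return [d for d in dets if d[0] >= score_thresh]

def random_dets(rng, n):
    """Generate n random (score, box) pairs inside a small canvas."""
    dets = []
    for _ in range(n):
        x, y = rng.uniform(0, 50), rng.uniform(0, 50)
        w, h = rng.uniform(5, 30), rng.uniform(5, 30)
        dets.append((rng.random(), (x, y, x + w, y + h)))
    return dets

rng = random.Random(0)
same = True
for _ in range(200):
    dets = random_dets(rng, 30)
    thresh_then_nms = greedy_nms(threshold(dets, 0.3), 0.45)
    nms_then_thresh = threshold(greedy_nms(dets, 0.45), 0.3)
    same = same and (thresh_then_nms == nms_then_thresh)
print("orders agree on all trials:", same)
```

In this simplified setting a box above the score threshold can only be suppressed by a higher-scoring box, which is also above the threshold, so the two orders keep the same boxes. If the two orders give different mAP in practice, the difference would have to come from something this sketch leaves out, for example per-class scores on each box or a different suppression rule.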

rdfong commented 7 years ago

Perhaps the weights of the convolutional layers need to be held fixed while training on the VOC datasets?

rdfong commented 7 years ago

In darknet19_448.cfg from the darknet project, batch size is 128, not 16 as it is in the config files here. Unfortunately I do not have the resources to test with a full batch size of 128. With 16 though I can confirm that I only get ~50 mAP. Can someone else try to confirm whether or not changing the batch size makes a difference? It's the only parameter I can find that differs between the two projects.

cory8249 commented 7 years ago

I slightly changed this code (following the original YOLO training procedure), trained for 160 epochs on VOC07+12, and tested on VOC07-test. mAP evaluated at 416 x 416 resolution:

0.6334, batch size 16 (trained by me)
0.6446, batch size 32 (trained by me)
0.7221, batch size 64 (tested directly with the weights provided by @longcw, yolo-voc.weights.h5)
0.768, batch size 64 (claimed by the paper, not trained by me)

Revising this code seems necessary if you want to train with such a large batch size (64). It needs to work on multiple GPUs, or split a large batch into smaller ones that fit into a single GPU's memory.
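Splitting a large batch into smaller pieces is usually done by accumulating gradients over the pieces before taking one optimizer step. The sketch below is not code from this repository; it uses a toy linear model in plain Python, and the names `grad_mse`, `accumulated_grad`, and `subdivisions` are hypothetical. It only illustrates that averaging the gradients of equal-sized micro-batches reproduces the full-batch gradient of a mean loss.

```python
import math

def grad_mse(w, b, batch):
    """Gradient of the mean squared error of y ~= w*x + b over `batch`."""
    n = len(batch)
    gw = sum(2 * (w * x + b - y) * x for x, y in batch) / n
    gb = sum(2 * (w * x + b - y) for x, y in batch) / n
    return gw, gb

def accumulated_grad(w, b, batch, subdivisions):
    """Split `batch` into equal micro-batches and average their gradients."""
    assert len(batch) % subdivisions == 0
    size = len(batch) // subdivisions
    gw = gb = 0.0
    for i in range(subdivisions):
        micro = batch[i * size:(i + 1) * size]
        mgw, mgb = grad_mse(w, b, micro)
        # Scale each micro-batch gradient so the sum is the full-batch mean.
        gw += mgw / subdivisions
        gb += mgb / subdivisions
    return gw, gb

# Toy data: 64 samples, processed either at once or as 4 micro-batches of 16.
data = [(float(x), 3.0 * x + 1.0) for x in range(64)]
full = grad_mse(0.5, 0.0, data)
accum = accumulated_grad(0.5, 0.0, data, subdivisions=4)
print(math.isclose(full[0], accum[0]), math.isclose(full[1], accum[1]))
```

One caveat: this equivalence holds for the gradient of a mean loss, but layers that depend on batch statistics, such as batch normalization, still see only the micro-batch, so accumulation may not fully match true large-batch training.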

I think there is still a mismatch somewhere, so the mAP drops considerably.

JesseYang commented 7 years ago

I have implemented YOLOv2 in tensorflow, but I can only achieve an mAP of about 0.60 on VOC07-test (trained on VOC07+12 train+val), with all the tricks in Table 2 of the paper implemented except "hi-res detector". @cory8249 Could you kindly share your code that achieves 0.768 mAP? Thanks!!

cory8249 commented 7 years ago

@JesseYang Sorry for the misunderstanding; the 0.768 mAP model was not trained by me. I just mentioned it as a reference.

JesseYang commented 7 years ago

@cory8249 I see. Thanks!

cory8249 commented 7 years ago

I fixed the IoU bug and trained on VOC0712 trainval. I get mAP = 0.6825 (still increasing slowly). https://github.com/cory8249/yolo2-pytorch/blob/master/darknet.py#L120

JesseYang commented 7 years ago

@cory8249 Have you fixed another issue when you got the 0.6825 mAP?

cory8249 commented 7 years ago

@JesseYang I think I've fixed these exp() / sig() bugs in my experiment.
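The thread does not say exactly what the exp() and sig() bug was. For reference, the YOLOv2 paper decodes a box from the raw outputs (tx, ty, tw, th) by applying a sigmoid to the center offsets and an exponential to the size terms. The sketch below follows that parameterization; the function name `decode_box` and the sample values are hypothetical and not taken from this repository.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def decode_box(tx, ty, tw, th, cx, cy, pw, ph):
    """Decode raw network outputs into a box, in grid-cell units.

    (cx, cy) is the top-left corner of the grid cell that predicts the box,
    (pw, ph) is the anchor (prior) width and height.
    """
    bx = sigmoid(tx) + cx    # center x stays inside the predicting cell
    by = sigmoid(ty) + cy    # center y stays inside the predicting cell
    bw = pw * math.exp(tw)   # width scales the anchor, always positive
    bh = ph * math.exp(th)   # height scales the anchor, always positive
    return bx, by, bw, bh

# Zero outputs give a box centered in the cell with exactly the anchor size.
box = decode_box(0.0, 0.0, 0.0, 0.0, cx=3, cy=5, pw=2.0, ph=4.0)
print(box)  # (3.5, 5.5, 2.0, 4.0)
```

Mixing up which terms get the sigmoid and which get the exponential, in either the forward pass or the loss targets, is one plausible way such a bug could arise, though that is an assumption rather than something the thread confirms.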

cory8249 commented 7 years ago

I also found something interesting:

ver.A = pytorch anaconda prebuilt version (cp36)
ver.B = pytorch built from source using native python (python35)

In the training phase, ver.A is 2x slower than ver.B (1 sec/batch vs. 0.5 sec/batch).
In the test phase, ver.A is 1.5x slower than ver.B (16 ms/img vs. 11 ms/img).

Does anyone else have the same problem?

cory8249 commented 7 years ago

I've trained a model with mAP = 0.71 by fixing the bug in #23

gauss-clb commented 6 years ago

Has anyone tried to train YOLOv1 on Pascal VOC (2007+2012 trainval) and surpassed 60% mAP on the 2007 test set?

xuzijian commented 6 years ago

After modifying the code as mentioned here, my mAP went up to 72.1% with 416*416 input.

wahrheit-git commented 6 years ago

@xuzijian what mAP do you get with VOC(2007 trainval) after the changes?

xuzijian commented 6 years ago

@kk1153 I haven't trained models with only the VOC07 dataset.

Liu0329 commented 6 years ago

@cory8249 @xuzijian @JesseYang, I used the latest master code on 07+12 trainval with batchsize=32 on pytorch 0.4, and got mAP=0.663. But when I test yolo-voc.weights.h5, the mAP is 0.677, which is much worse than the mAP=0.722 mentioned above. Did I miss something? Since this topic has been discussed for so long, can anyone provide a good result with a clear repo to follow? Thanks!

DW1HH commented 5 years ago

@Liu0329 me too