GOATmessi8 opened this issue 7 years ago
Yes, I got the same result. Feel free to make a pull request to fix the bug. I have no idea why the mAP of my implementation is low. Did you try the darknet implementation by the author?
Not yet, but I found an issue in darkflow: it seems the conversion to TensorFlow also causes some differences. https://github.com/thtrieu/darkflow/issues/25 I will make a pull request if I figure out the training part. Maybe that could solve the problem...
I implemented the loss function following darknet, and the training now works. I trained it on the VOC2007 trainval set and got a ~~71.86 mAP~~ ~50 mAP on the test set. Maybe you can find some other causes of the low mAP with the help of the darknet source code.
@longcw Thank you for sharing the code. I have tested the converted darknet model, which got ~72 mAP. Then I trained on the VOC07 trainval set for 160 epochs (using your GitHub code unchanged), which only got ~50 mAP. Did you successfully train the YOLOv2 detector?
Thank you for your comment. I tested the trained model and got the same result, ~50 mAP. There are still some bugs in the training code. I am sorry for this.
For the test phase, two parameters are inconsistent with the original darknet:

- the `thresh` parameter for bbox filtering is 0.001 in darknet, while it is 0.01 in `test.py`;
- the `iou_thresh` for NMS is 0.5 in darknet, while it is 0.3 in this project.

For the train phase, the `thresh` in `cfgs/config.py` should be 0.24 instead of 0.3. As @ruinmessi reported, before correcting those parameters the mAP on VOC2007-test is 71.9. Correcting the first parameter improves it slightly to 72.2, and correcting `iou_thresh` further boosts it to 73.6.
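Summarizing those corrections as a config sketch (the variable names here are illustrative; the actual names in `cfgs/config.py` and `test.py` may differ):

```python
# Hypothetical consolidated settings matching the original darknet defaults.
thresh = 0.001        # test-time score threshold for bbox filtering (was 0.01)
iou_thresh = 0.5      # test-time NMS IoU threshold (was 0.3)
train_thresh = 0.24   # training-time threshold in cfgs/config.py (was 0.3)
```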
The TensorFlow version of YOLO (darkflow) seems to suffer from the same problem, and an issue in that project pointed out some possible reasons. Maybe those reasons also apply to this project?
@ruinmessi What error in test code have you fixed?
@longcw @crazylyf Sorry for being away for a long time. I boosted the mAP to 74.3 by changing the NMS order like this, while this project does the NMS in a function called `postprocess`, with the exact parameters you mentioned.
Why is your mAP 0.7 higher if we are using the same parameters? Am I missing something?
The NMS should be applied before thresholding.
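A minimal NumPy sketch of that ordering (boxes as (x1, y1, x2, y2); this is an illustration of the idea, not the project's actual `postprocess`):

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Standard greedy NMS; returns indices of kept boxes."""
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]  # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # IoU of the top box against all remaining boxes
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_thresh]
    return keep

def postprocess(boxes, scores, iou_thresh=0.5, score_thresh=0.001):
    # NMS first, then score thresholding -- the order reported in this thread.
    keep = nms(boxes, scores, iou_thresh)
    return [i for i in keep if scores[i] >= score_thresh]
```

Doing the score cut after NMS means overlapping duplicates are suppressed before any detection is discarded on score, which matters with a very low `thresh` such as 0.001.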
@ruinmessi Thank you for pointing out this problem.
@longcw I am curious about how to convert the original weights to h5 file, could you please show me some details or scripts?
@ruinmessi I use darkflow to load original weights from the binary weights file.
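For reference, the darknet `.weights` binary is just a short int32 header followed by raw float32 parameters in layer order. A minimal reader sketch (the header/BN layout below follows my understanding of darknet's weight loading for YOLOv2-era files, where the header is four int32s; treat it as an assumption, not the darkflow code):

```python
import numpy as np

def read_header(fp):
    # major, minor, revision, then an images-seen counter (four int32s in
    # YOLOv2-era files; newer darknet versions store "seen" as int64).
    return np.fromfile(fp, dtype=np.int32, count=4)

def read_conv_block(fp, out_ch, in_ch, ksize, batch_norm=True):
    """Read one convolutional layer from an open darknet .weights file.

    Darknet stores, per conv layer: biases (the BN beta when batch_norm),
    then BN scales, rolling mean, rolling variance (if batch_norm),
    then the kernel as (out_ch, in_ch, ksize, ksize), all float32.
    """
    p = {'bias': np.fromfile(fp, dtype=np.float32, count=out_ch)}
    if batch_norm:
        p['scale'] = np.fromfile(fp, dtype=np.float32, count=out_ch)
        p['mean'] = np.fromfile(fp, dtype=np.float32, count=out_ch)
        p['var'] = np.fromfile(fp, dtype=np.float32, count=out_ch)
    n = out_ch * in_ch * ksize * ksize
    p['kernel'] = np.fromfile(fp, dtype=np.float32, count=n).reshape(
        out_ch, in_ch, ksize, ksize)
    return p
```

Once the arrays are loaded, writing them out as an h5 file is just a matter of storing each array under a layer name (e.g. with h5py).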
Is there any update on the training issue?
@ruinmessi Does the order of NMS and thresholding affect the results? I don't think so. Can anyone prove me wrong?
Perhaps the weights of the convolutional layers need to be held fixed while training on the VOC datasets?
In darknet19_448.cfg from the darknet project, batch size is 128, not 16 as it is in the config files here. Unfortunately I do not have the resources to test with a full batch size of 128. With 16 though I can confirm that I only get ~50 mAP. Can someone else try to confirm whether or not changing the batch size makes a difference? It's the only parameter I can find that differs between the two projects.
I slightly changed this code (following the original YOLO training procedure), trained 160 epochs on VOC07+12, tested on VOC07-test, and evaluated mAP at 416 x 416 resolution:

- 0.6334, batch size 16 (trained by me)
- 0.6446, batch size 32 (trained by me)
- 0.7221, batch size 64 (directly tested using the weights provided by @longcw, yolo-voc.weights.h5)
- 0.768, batch size 64 (claimed by the paper, not trained by me)
Revising this code seems necessary if you want to train with such a large batch size (64): it needs multi-GPU support (or splitting a large batch into smaller ones that fit into single-GPU memory).
I think there is still something mismatched, so the mAP drops considerably.
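One single-GPU alternative to multi-GPU training is gradient accumulation: run several micro-batches, accumulate their gradients weighted by size, and step once. This works because the gradient of a mean loss over a batch of 64 equals the size-weighted average of the micro-batch gradients. A NumPy check of that equivalence on a linear least-squares model (purely illustrative, not this repo's training loop):

```python
import numpy as np

def grad_mse(w, X, y):
    # Gradient of mean squared error for the linear model y_hat = X @ w.
    return 2.0 * X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 5))
y = rng.normal(size=64)
w = rng.normal(size=5)

# Gradient over the full batch of 64 at once.
full = grad_mse(w, X, y)

# Four micro-batches of 16, accumulated (weighted by size), then averaged.
acc = np.zeros_like(w)
for i in range(0, 64, 16):
    acc += grad_mse(w, X[i:i+16], y[i:i+16]) * 16
acc /= 64
```

In a PyTorch loop the same idea is several `backward()` calls (which accumulate into `.grad`) followed by one optimizer step; losses must be scaled by micro-batch size so the effective gradient matches the large batch.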
I have implemented YOLOv2 in TensorFlow, but I can only achieve an mAP of about 0.60 on VOC07-test (trained on VOC07+12 trainval), with all the tricks except "hi-res detector" in Table 2 of the paper implemented. @cory8249 Could you kindly share your code which achieves 0.768 mAP? Thanks!!
@JesseYang Sorry for the misunderstanding: the 0.768 mAP model is not trained by me. I just mentioned it as a reference.
@cory8249 I see. Thanks!
I fixed the IoU bug and trained on VOC0712 trainval. I get mAP = 0.6825 (still increasing slowly). https://github.com/cory8249/yolo2-pytorch/blob/master/darknet.py#L120
@cory8249 Have you fixed another issue when you got the 0.6825 mAP?
@JesseYang I think I've fixed the exp()/sigmoid() bugs in my experiment.
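For anyone hitting the same bug: in YOLOv2 the raw network outputs (tx, ty, tw, th) must be decoded with a sigmoid on the center offsets and an exp on the size terms, as in the paper (bx = sigmoid(tx) + cx, by = sigmoid(ty) + cy, bw = pw * exp(tw), bh = ph * exp(th)). A NumPy sketch of the decoding, with cell coordinates and anchor sizes in grid units:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_box(t, cell_xy, anchor_wh):
    """Decode one YOLOv2 prediction to (center_x, center_y, w, h) in grid units.

    t         -- raw network outputs (tx, ty, tw, th)
    cell_xy   -- (cx, cy), top-left corner of the grid cell
    anchor_wh -- (pw, ph), the prior (anchor) width and height
    """
    tx, ty, tw, th = t
    cx, cy = cell_xy
    pw, ph = anchor_wh
    return np.array([sigmoid(tx) + cx,   # center constrained inside the cell
                     sigmoid(ty) + cy,
                     pw * np.exp(tw),    # size is a multiplicative offset
                     ph * np.exp(th)])
```

Applying exp where a sigmoid belongs (or vice versa) produces systematically wrong IoUs during training, which matches the mAP drop discussed above.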
I also found something interesting:
- ver.A = PyTorch Anaconda prebuilt version (cp36)
- ver.B = PyTorch built from source using native Python (python35)

In the training phase ver.A is 2x slower than ver.B (1 sec/batch vs. 0.5 sec/batch). In the test phase ver.A is 1.5x slower than ver.B (16 ms/img vs. 11 ms/img).
Does anyone have this same problem ?
I've trained a model with mAP = 0.71 by fixing the bug in #23.
Has anyone tried to train YOLOv1 on Pascal VOC (2007+2012 trainval) and surpassed 60% mAP on the 2007 test set?
After modifying the code mentioned here, my mAP goes up to 72.1% with 416*416 input.
@xuzijian what mAP do you get with VOC(2007 trainval) after the changes?
@kk1153 I haven't trained models with only the VOC07 dataset.
@cory8249 @xuzijian @JesseYang, I used the latest master code on 07+12 trainval with batch size 32 on PyTorch 0.4, and got mAP = 0.663. But when I test yolo-voc.weights.h5, the mAP is 0.677, which is much worse than the mAP = 0.722 mentioned above. Did I miss something? Since this topic has been discussed for a long time, can anyone provide a good result with a clear repo to follow? Thanks!
@Liu0329 me too
Have you ever evaluated the converted pretrained model on VOC2007? I tried your code and got 71.9 mAP while the original is 76.8. Then I found a tiny error in the test code; after fixing it, the result goes up to 72.8 mAP, still not enough...