weiliu89 / caffe

Caffe: a fast open framework for deep learning.
http://caffe.berkeleyvision.org/

SSD TensorFlow - reproducing mAP #456

Open balancap opened 7 years ago

balancap commented 7 years ago

Hello,

I have been implementing your SSD paper in TensorFlow (if you want to have a look: https://github.com/balancap/SSD-Tensorflow). Before playing around with different architectures, I am trying to reproduce your results on Pascal VOC 2007. I have a few questions about the post-processing used in evaluation. I tried to understand the pipeline from your code, but I am not completely sure about it (I used weights from http://www.cs.unc.edu/~wliu/projects/SSD/models_VGGNet_VOC0712_SSD_300x300_ft.tar.gz):

  1. For the anchor sizes, I guess you use the ratios and sizes described in the file score_ssd_pascal_ft.py?
  2. How do you resize the image for evaluation? Do you keep the h/w ratio and pad, or distort the image to 300x300?
  3. If I understand the pipeline, it works as follows: select the top_k detections, apply NMS, and keep keep_top_k. How is the confidence_threshold variable used in the evaluation? Do you filter out some boxes after NMS based on their scores?
  4. Do you clip the bounding boxes to the image size?
  5. If I am right, you're using the Pascal VOC 2007 mAP algorithm?

I am sorry if this sounds like a long inquiry! I am not too far from reproducing your mAP score (I got around 0.76), but I think it would be great to have a TensorFlow implementation that matches your performance!

Thanks for your help! Paul

weiliu89 commented 7 years ago

Thanks Paul for porting it to TF.

  1. Yes.
  2. I distort the image to 300x300. There is a WARP option in the resize_param.
  3. confidence_threshold (0.01) is used to filter out most of the detections before selecting the top_k detections for NMS.
  4. No.
  5. Yes. The 11-point algorithm in bbox_util.cpp.

0.76 is decent. I think this specific model should give 0.81 mAP.
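Putting those answers together, the evaluation-time post-processing can be sketched in NumPy roughly as follows. This is my own simplification shown for a single class; the function names are mine, not from the Caffe code, and the defaults besides confidence_threshold = 0.01 (nms_threshold = 0.45, top_k = 400, keep_top_k = 200) are assumptions based on the usual SSD settings:

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, each [ymin, xmin, ymax, xmax]."""
    ymin = np.maximum(box[0], boxes[:, 0])
    xmin = np.maximum(box[1], boxes[:, 1])
    ymax = np.minimum(box[2], boxes[:, 2])
    xmax = np.minimum(box[3], boxes[:, 3])
    inter = np.maximum(ymax - ymin, 0.0) * np.maximum(xmax - xmin, 0.0)
    area1 = (box[2] - box[0]) * (box[3] - box[1])
    area2 = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area1 + area2 - inter)

def post_process(scores, boxes, confidence_threshold=0.01,
                 top_k=400, nms_threshold=0.45, keep_top_k=200):
    # 1. Drop most detections with a low confidence threshold.
    keep = scores > confidence_threshold
    scores, boxes = scores[keep], boxes[keep]
    # 2. Keep the top_k highest-scoring detections.
    order = np.argsort(-scores)[:top_k]
    scores, boxes = scores[order], boxes[order]
    # 3. Greedy NMS: repeatedly keep the best box, drop overlapping ones.
    selected = []
    while len(scores) > 0:
        selected.append((scores[0], boxes[0]))
        mask = iou(boxes[0], boxes[1:]) <= nms_threshold
        scores, boxes = scores[1:][mask], boxes[1:][mask]
    # 4. Keep at most keep_top_k detections after NMS (no box clipping).
    return selected[:keep_top_k]
```

Note that, per the answers above, boxes are not clipped to the image and the image itself is warped (not padded) to 300x300 before the network runs.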

balancap commented 7 years ago

Thanks for your help! Yes, I am trying to reach 0.81 mAP, which is the score I got testing your code directly. It may end up a bit different, but I hope to be around 0.8.

One last question: I have seen that you're using the Fast NMS implementation. What default value of the eta parameter are you using? I could not find it in the python script. Thanks again!

weiliu89 commented 7 years ago

Check out here. No adaptive by default.
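For reference, eta makes the NMS overlap threshold adaptive: with eta = 1 the threshold never changes and the procedure is plain greedy NMS, while eta < 1 decays the threshold as boxes are kept (as I read bbox_util.cpp, the decay only applies while the threshold is above 0.5). A rough self-contained sketch of that loop, not the exact Caffe code:

```python
def iou(a, b):
    """IoU of two boxes (xmin, ymin, xmax, ymax)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def nms_fast(detections, nms_threshold=0.45, eta=1.0):
    """Greedy NMS with an adaptive threshold, sketched after ApplyNMSFast.

    detections: list of (score, box) pairs sorted by descending score.
    With eta = 1.0 this reduces to standard greedy NMS."""
    adaptive_threshold = nms_threshold
    kept = []
    for score, box in detections:
        if all(iou(box, k) <= adaptive_threshold for _, k in kept):
            kept.append((score, box))
            # With eta < 1, tighten the threshold after each kept box,
            # but only while it is still above 0.5.
            if eta < 1.0 and adaptive_threshold > 0.5:
                adaptive_threshold *= eta
    return kept
```

With the usual nms_threshold of 0.45 the decay condition never fires anyway, so the eta value has no effect in that configuration.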

balancap commented 7 years ago

Cool, thanks again :) So without any adaptation, it is completely equivalent to the original NMS algorithm, I guess. The mAP I got after fixing a few bugs is around 0.77. I still need to figure out why it is not closer to 0.8. Is there a way, with your implementation, to get the full precision-recall curve?

weiliu89 commented 7 years ago

You can get this information out or plot it out.

balancap commented 7 years ago

I just noticed that you are calculating the AP per class, and then averaging over all the classes (https://github.com/weiliu89/caffe/blob/ssd/src/caffe/solver.cpp#L519-L547). I was computing a very crude AP over all classes mixed together! I'll implement your approach and see if I get closer results :)
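To spell out the per-class evaluation discussed above: for each class, sort its detections by score, mark true/false positives against the ground truth, build a precision-recall curve, compute the VOC 2007 11-point AP, and finally average the APs over classes. A minimal sketch of the last two steps (it assumes the per-class precision/recall arrays are already computed):

```python
import numpy as np

def voc07_ap(recall, precision):
    """VOC 2007 11-point interpolated average precision."""
    ap = 0.0
    for t in np.linspace(0.0, 1.0, 11):
        # Max precision over all points with recall >= t (0 if none).
        mask = recall >= t
        p = np.max(precision[mask]) if mask.any() else 0.0
        ap += p / 11.0
    return ap

def mean_ap(per_class_pr):
    """mAP: average the AP of each class, not one AP over pooled detections."""
    return sum(voc07_ap(r, p) for r, p in per_class_pr) / len(per_class_pr)
```

Pooling all classes into one AP, as described in the comment above, generally gives a different (and usually lower) number than this per-class average.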

balancap commented 7 years ago

Hello,

Finally managed to reproduce your results! I had a few remaining bugs in my L2-normalization layer and in the padding correction in TF. Everything seems consistent now :)
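For context, the layer in question is SSD's Normalize layer on conv4_3: each spatial position's feature vector is L2-normalized along the channel axis and then multiplied by a learned per-channel scale (initialized to 20 in the paper). Its forward pass, sketched in NumPy:

```python
import numpy as np

def l2_normalize_layer(x, scale, eps=1e-12):
    """SSD-style Normalize layer, sketched in NumPy.

    x: feature map of shape (H, W, C); scale: per-channel scale of shape
    (C,), typically initialized to 20 for conv4_3 in SSD."""
    # L2 norm over the channel axis at each spatial position.
    norm = np.sqrt(np.sum(x * x, axis=-1, keepdims=True)) + eps
    return (x / norm) * scale
```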

weiliu89 commented 7 years ago

@balancap Thanks Paul for the wonderful work! Is the TF code able to train and get similar results? Or does it only support evaluating a converted Caffe model?

balancap commented 7 years ago

I have a basic training script for the Pascal VOC datasets, but it is not working as well as yours for now! In particular, I need to improve the data augmentation, which seems quite crucial.

weiliu89 commented 7 years ago

I see. Thanks! I think most of the augmentation is already implemented in TF.

balancap commented 7 years ago

Yes, and I am using most of them! I got some decent results when I played a bit with the KITTI dataset.

I need to look a bit closer: despite the hard negative mining, I still had too many false positives at the end. It may have to do with the scaling and cropping parameters of the data augmentation.
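For reference, the SSD-style patch sampling the thread keeps returning to chooses a random minimum-IoU constraint per sample (the paper uses 0.1, 0.3, 0.5, 0.7, 0.9, plus the whole original image and fully random patches). A simplified, self-contained sketch of that loop; the scale range and trial count here are my assumptions loosely based on the Caffe batch_sampler config, and aspect-ratio constraints are omitted:

```python
import random

def jaccard(a, b):
    """IoU of two boxes (xmin, ymin, xmax, ymax) in relative coordinates."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def sample_patch(gt_boxes, max_trials=50):
    """Pick a random min-IoU constraint and sample a crop satisfying it.

    'whole' keeps the original image; None accepts any random patch;
    a float requires IoU >= that value with some ground-truth box."""
    min_iou = random.choice(['whole', None, 0.1, 0.3, 0.5, 0.7, 0.9])
    if min_iou == 'whole':
        return (0.0, 0.0, 1.0, 1.0)
    for _ in range(max_trials):
        # Patch side lengths in [0.3, 1] of the image (a simplification).
        w, h = random.uniform(0.3, 1.0), random.uniform(0.3, 1.0)
        x = random.uniform(0.0, 1.0 - w)
        y = random.uniform(0.0, 1.0 - h)
        # Clamp to guard against floating-point spill past the image edge.
        patch = (x, y, min(x + w, 1.0), min(y + h, 1.0))
        if min_iou is None or any(jaccard(patch, b) >= min_iou
                                  for b in gt_boxes):
            return patch
    return (0.0, 0.0, 1.0, 1.0)  # fall back to the whole image
```

Patches sampled with low min-IoU constraints yield many near-background crops, which is one plausible knob behind the false-positive behaviour described above.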

ankghost0912 commented 7 years ago

Hi there,

I'm trying to dig into the source code of Caffe SSD. @balancap mentioned score_ssd_pascal_ft.py. I searched the repository and couldn't find that file. Has the file been moved somewhere, or is it merged with the score_ssd_*.py files?

Thanks.

albanie commented 7 years ago

score_ssd_pascal_ft.py is included in the fine-tuned model archives, rather than in the main code repository.