weiliu89 / caffe

Caffe: a fast open framework for deep learning.
http://caffe.berkeleyvision.org/

about SSD's data augmentation #243

Open quietsmile opened 8 years ago

quietsmile commented 8 years ago

Hi weiliu, first of all, thanks so much for the great SSD work. In your ECCV presentation, you mentioned that the mAP of SSD512 could be improved from 77% to 80% simply by adding some random expansion. Could you release the version that reproduces the 80% mAP result? Second question: why does SSD need so much more data augmentation than Faster R-CNN? Thanks very much.
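
Just to make sure I understand the trick correctly: by "random expansion" I mean something like the minimal NumPy sketch below (paste the image at a random spot on a larger mean-filled canvas, then do the usual random crop afterwards). The function and parameter names here are mine for illustration, not the ones in this repo:

```python
import numpy as np

def random_expand(image, boxes, max_ratio=4.0, mean=(104, 117, 123)):
    """Place a 3-channel HWC image at a random position on a canvas up to
    max_ratio times larger, filled with the per-channel (BGR) mean.
    Boxes are absolute-pixel [xmin, ymin, xmax, ymax] rows."""
    h, w, c = image.shape
    ratio = np.random.uniform(1.0, max_ratio)
    canvas_h, canvas_w = int(h * ratio), int(w * ratio)
    top = np.random.randint(0, canvas_h - h + 1)
    left = np.random.randint(0, canvas_w - w + 1)
    canvas = np.empty((canvas_h, canvas_w, c), dtype=image.dtype)
    canvas[...] = mean  # mean padding, so the added border is "neutral"
    canvas[top:top + h, left:left + w] = image
    # shift the ground-truth boxes into canvas coordinates
    boxes = boxes + np.array([left, top, left, top], dtype=boxes.dtype)
    return canvas, boxes
```

Is that roughly what the 80% version does?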

quietsmile commented 8 years ago

For the second question, I found the suggestion in your paper that Faster R-CNN's RoI pooling might make it more robust. But I personally think that RPN (without post-classification) would not be improved much by the data augmentation used in SSD. Also, why do we need to change the aspect ratio of the image? Why not train only in a multi-scale way, keeping the aspect ratio fixed? Thanks.

weiliu89 commented 8 years ago

You mean RPN would be improved by data augmentation? It is likely, but maybe the Faster R-CNN authors haven't tried much data augmentation (I think they have used multi-scale training). It would be interesting to see what Faster R-CNN's result would be if it did the same augmentation tricks as SSD. Both YOLO and SSD need a lot of data augmentation to train the detector because there is no RoI pooling or image cropping as was done in the R-CNN series of methods, which I think is very helpful for the classification part.

I think SSD's default box tiling is better than RPN's anchor box tiling, as was also demonstrated in the experiments, and I believe SSD's localization (bbox) is better, but it may be more difficult for SSD to classify objects correctly. Tiling the default boxes so that they cover the ground truth objects better, or aligning them more closely with the receptive field of each neuron, would probably help increase performance. These are open research questions.
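
To make the tiling part concrete: in this repo the tiling is produced by the PriorBox layer, but a simplified stand-alone sketch following the scale formula from the paper, s_k = s_min + (s_max - s_min)(k - 1)/(m - 1), looks like this. It assumes square feature maps and one shared aspect-ratio set across layers, which the real layer does not require:

```python
import itertools
import math

def default_boxes(feature_map_sizes, s_min=0.2, s_max=0.9,
                  aspect_ratios=(1.0, 2.0, 0.5)):
    """Generate (cx, cy, w, h) default boxes in [0, 1] coordinates.
    Per-layer scales follow the paper's linear rule; each cell also
    gets one extra ratio-1 box of scale sqrt(s_k * s_{k+1})."""
    m = len(feature_map_sizes)
    scales = [s_min + (s_max - s_min) * k / (m - 1) for k in range(m)]
    scales.append(1.0)  # stand-in for s_{m+1}, used by the last layer's extra box
    boxes = []
    for k, f in enumerate(feature_map_sizes):
        for i, j in itertools.product(range(f), repeat=2):
            cx, cy = (j + 0.5) / f, (i + 0.5) / f  # box center at the cell center
            for ar in aspect_ratios:
                boxes.append((cx, cy,
                              scales[k] * math.sqrt(ar),
                              scales[k] / math.sqrt(ar)))
            s_extra = math.sqrt(scales[k] * scales[k + 1])
            boxes.append((cx, cy, s_extra, s_extra))
    return boxes

# e.g. the six SSD300 feature maps: 38, 19, 10, 5, 3, 1
print(len(default_boxes([38, 19, 10, 5, 3, 1])))
```

How well this (cx, cy, w, h) grid covers the ground truth boxes is exactly the coverage question above: better coverage makes the matching and classification easier.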

Changing the aspect ratio can ease training a bit and also makes training faster (since I can do batch processing). I have a version that keeps the aspect ratio; it is slightly better (1-2%) than the warped one, but 1-2x slower. I will consider releasing that bit of code later.
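
In case it helps in the meantime, here is a minimal sketch of the two resize options, assuming OpenCV and 3-channel BGR images. The mean-padding strategy in keep_ratio_resize is just one common choice for the keep-ratio variant, not necessarily the exact approach of the unreleased code:

```python
import numpy as np
import cv2  # assumes OpenCV is installed

def warp_resize(image, size=300):
    """SSD-style warp: stretch to size x size, ignoring the aspect ratio.
    Every image in a batch ends up the same shape, so batching is trivial."""
    return cv2.resize(image, (size, size))

def keep_ratio_resize(image, size=300, mean=(104, 117, 123)):
    """Aspect-preserving alternative: scale the longer side to `size` and
    pad the remainder with the per-channel mean, so objects are undistorted."""
    h, w = image.shape[:2]
    scale = size / max(h, w)
    resized = cv2.resize(image, (int(round(w * scale)), int(round(h * scale))))
    canvas = np.empty((size, size, 3), dtype=image.dtype)
    canvas[...] = mean
    canvas[:resized.shape[0], :resized.shape[1]] = resized
    return canvas
```

As noted above, warping makes batching trivial because every image has the same shape; a keep-ratio pipeline either pads (as sketched here, spending compute on mean pixels) or must handle variable-sized inputs, which is presumably where the 1-2x slowdown comes from.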