Run SSD on HD image for people detection - Githubissues

weiliu89 / caffe

Caffe: a fast open framework for deep learning.

http://caffe.berkeleyvision.org/

Run SSD on HD image for people detection #168

Open squidszyd opened 8 years ago

squidszyd commented 8 years ago

Hi, I've recently been working on detecting relatively small, single-class objects (around 120 x 80) in huge images (3840 x 2160). Should I modify the default boxes' aspect ratios to fit the objects I want to detect? Or should I divide the image into smaller pieces to increase the relative size of the objects?
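For illustration, the second option (splitting the image into pieces) could be sketched as below. This is only a rough sketch, not something from the SSD code: the 512-pixel tile size, the 128-pixel overlap, and the helper names `tile_origins`, `make_tiles` and `to_full_image` are all assumptions made for the example. The overlap is chosen to be at least as large as the object (120 pixels wide), so every object lies completely inside at least one tile. In a 512-pixel tile a 120-pixel object spans roughly 23% of the width, compared with about 3% of the full 3840-pixel frame.

```python
def tile_origins(length, tile, overlap):
    """Start offsets of tiles covering [0, length); neighbours share >= overlap pixels."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    if tile >= length:
        return [0]  # a single tile already spans the whole axis
    stride = tile - overlap
    origins = list(range(0, length - tile, stride))
    origins.append(length - tile)  # last tile sits flush with the far edge
    return origins


def make_tiles(width, height, tile=512, overlap=128):
    """Return (xmin, ymin, xmax, ymax) crop boxes in full-image pixels.

    tile=512 and overlap=128 are illustrative defaults, not values from SSD.
    """
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in tile_origins(height, tile, overlap)
        for x in tile_origins(width, tile, overlap)
    ]


def to_full_image(box, tile_box):
    """Shift a box detected inside a tile back into full-image coordinates."""
    xmin, ymin, xmax, ymax = box
    x0, y0 = tile_box[0], tile_box[1]
    return (xmin + x0, ymin + y0, xmax + x0, ymax + y0)


tiles = make_tiles(3840, 2160)
print(len(tiles), "tiles; first:", tiles[0], "last:", tiles[-1])
```

Because neighbouring tiles overlap, the same object can be detected more than once, so the per-tile detections would presumably need to be merged (for example with non-maximum suppression) after being mapped back with `to_full_image`.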

Josca commented 8 years ago

I am also interested in detection of relatively small objects.

aurotripathy commented 8 years ago

@weiliu89 Could you please give us your thoughts on this question about one-class detection? Let's say: (1) We have the bounding boxes for a single class in the training set. (2) The objects are simple geometries (rectangles) but oriented every which way (the object, not the bounding box). (3) Dimensions could be as low as 15x15 pixels in a 400x400 image.

Can SSD work as-is if we have only one class (i.e., presence/absence of that class)?

ronnie-tian commented 8 years ago

As indicated in the paper, you may want to try smaller default boxes and scales, as described in Section 3.4. But I think if the object is too small, SSD may still not be able to detect it. Using even lower conv layers might give better performance on smaller objects, but the number of default boxes could increase dramatically, which may cause problems during training.
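To make the scale argument concrete, here is a small sketch of the linear scale rule given in the SSD paper, s_k = s_min + (s_max - s_min) / (m - 1) * (k - 1), with the paper's example values s_min = 0.2 and s_max = 0.9. The function names are made up for this example, and the training scripts in the repository may configure box sizes differently, so treat the numbers as illustrative only.

```python
import math


def default_box_scales(num_maps, s_min=0.2, s_max=0.9):
    """Scales (relative to the input size) spaced linearly over the feature maps."""
    if num_maps == 1:
        return [s_min]
    step = (s_max - s_min) / (num_maps - 1)
    return [s_min + step * k for k in range(num_maps)]


def default_box_size(scale, aspect_ratio):
    """Relative (width, height) of a default box: w = s*sqrt(a), h = s/sqrt(a)."""
    root = math.sqrt(aspect_ratio)
    return scale * root, scale / root


scales = default_box_scales(6)
print([round(s, 2) for s in scales])

# A 15x15 object in a 400x400 image, as in the question above.
object_scale = 15 / 400
print("object scale:", object_scale, "smallest default scale:", scales[0])
```

With these settings the smallest default box covers 20% of the image side, while the 15x15 object covers under 4%, which is why a smaller s_min (or boxes on lower layers) is suggested.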

aurotripathy commented 8 years ago

@shawn-tian, thank you. That is very helpful. Are you willing to give your thoughts on the one-class detection question I raised? I need to train and detect for the presence and location of only one class of objects (not 20 classes, as is the case with Pascal VOC).

squidszyd commented 8 years ago

@aurotripathy I'm also working on detecting a single class of objects. What I did was crop objects in all kinds of orientations and assign them the same class ID. This introduces intra-class diversity, which I think is feasible for training. After that, set the class num to 2 (one of them for background) and start training the net. I have the same issue with detecting small objects: the net is hard to converge and the loss is always greater than 1.

aurotripathy commented 8 years ago

@squidszyd Thanks. In your training dataset, did you have labeled data for both the classes (object and background)?

squidszyd commented 8 years ago

@aurotripathy No. But one approach is to give some hard negative background a new class ID and train them as if they are "objects" (I did this when training Faster R-CNN). Or you may have to modify the code in the "default box generating layer" to mine background samples the way you want.

aurotripathy commented 8 years ago

You make an important point but I'm not grasping it. This is the first time I'm doing a detection application. Can you please elaborate on your approach "to give some hard negative background a new class ID and train them as if they are 'objects'"? Can this approach be applied to SSD as well?

I think you are referring to the approach here.

Is background another 'implicit' class for SSD as well?

squidszyd commented 8 years ago

The class num is set to 21 for detecting 20 classes of objects in VOC2007, which means there is a class ID for background. You can check this file and see that the ID for background is set to 0 (line 3). By saying "to give some hard ... as if they are 'objects'", I mean adding a new class ID for 'hard negative examples'. 'Hard negative examples' are samples extracted from the background that are hard to distinguish from real objects (like false alarms). For example, in my case: background.id = 0, people.id = 1, hard-negative-examples.id = 2. As you can see, 'hard negative examples' and 'background' do not have the same class ID, but they are both background. At test time, you just output the bounding boxes of the 'people' class and ignore those of 'hard negative examples'. I only did this when training Faster R-CNN, but I think it is also feasible for SSD.
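The test-time filtering described above might look roughly like this. It is a sketch under assumptions: the class IDs follow the example in this comment, but the detection tuple layout `(class_id, score, xmin, ymin, xmax, ymax)`, the `min_score` threshold, and the `keep_people` helper are hypothetical and not taken from the SSD or Faster R-CNN code.

```python
# Class IDs as in the example above.
BACKGROUND_ID = 0
PEOPLE_ID = 1
HARD_NEGATIVE_ID = 2  # trained as a class, but never reported


def keep_people(detections, min_score=0.5):
    """Keep only confident 'people' boxes; drop hard-negative detections.

    Each detection is assumed to be a tuple
    (class_id, score, xmin, ymin, xmax, ymax); this layout is hypothetical.
    """
    return [
        det for det in detections
        if det[0] == PEOPLE_ID and det[1] >= min_score
    ]


detections = [
    (PEOPLE_ID, 0.91, 100, 200, 220, 280),
    (HARD_NEGATIVE_ID, 0.88, 400, 50, 520, 130),  # false-alarm lookalike
    (PEOPLE_ID, 0.30, 10, 10, 130, 90),           # below the score threshold
]
print(keep_people(detections))
```

The hard-negative class only absorbs the lookalike detections during training and inference; nothing labeled with it is shown to the user.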

aurotripathy commented 8 years ago

@squidszyd Thank you for explaining. I'm very new to SSD (and detection) and still learning.