hunterlew opened this issue 7 years ago
Object detection is a little more complicated to set up than object classification. As a result it's going to require more coding than I can fit in a single response, but here are some hints that will hopefully get you started :)
There are a couple of approaches.
The first way is to transform your dataset to match the folder structure of the Pascal VOC dataset. This is useful for getting something up and running in a short space of time. I did this once to try out something quickly for face detection - as an example, there's a demo script showing how you can go about converting the wider face dataset into the Pascal layout here (you would need to change this to match your own data layout).
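To give a feel for what that conversion involves, here is a minimal sketch (in Python, not the demo script linked above) that builds one Pascal VOC style annotation from a generic box record. The function name and the record layout are hypothetical; you would adapt them to your own data:

```python
# Sketch: emit a Pascal VOC style annotation XML for one image.
# make_voc_annotation and the (name, xmin, ymin, xmax, ymax) tuple
# layout are illustrative, not part of any existing tool.
import xml.etree.ElementTree as ET

def make_voc_annotation(filename, width, height, objects):
    """objects: list of (class_name, xmin, ymin, xmax, ymax) tuples."""
    ann = ET.Element('annotation')
    ET.SubElement(ann, 'folder').text = 'VOC2007'
    ET.SubElement(ann, 'filename').text = filename
    size = ET.SubElement(ann, 'size')
    ET.SubElement(size, 'width').text = str(width)
    ET.SubElement(size, 'height').text = str(height)
    ET.SubElement(size, 'depth').text = '3'
    ET.SubElement(ann, 'segmented').text = '0'
    for name, xmin, ymin, xmax, ymax in objects:
        obj = ET.SubElement(ann, 'object')
        ET.SubElement(obj, 'name').text = name
        ET.SubElement(obj, 'pose').text = 'Unspecified'
        ET.SubElement(obj, 'truncated').text = '0'
        ET.SubElement(obj, 'difficult').text = '0'
        box = ET.SubElement(obj, 'bndbox')
        for tag, val in zip(('xmin', 'ymin', 'xmax', 'ymax'),
                            (xmin, ymin, xmax, ymax)):
            ET.SubElement(box, tag).text = str(val)
    return ET.tostring(ann, encoding='unicode')

# One pedestrian box from a KITTI-style frame, rounded to integers
xml_str = make_voc_annotation('000001.jpg', 1224, 370,
                              [('Pedestrian', 712, 143, 811, 308)])
```

You would call this once per image and write the result into `Annotations/`, alongside the usual `JPEGImages/` and `ImageSets/Main/` folders, to mimic the VOC layout.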
The second approach is to write a wrapper that generates the imdb structure for your new dataset. This means writing your own version of the imdb setup scripts, using the original as a starting point.
The first approach is easier (although it is a bit of a hack). The second is a little more work, but can be quite useful as a learning process (once you've written one of these it's much easier to write others). In both cases, when using fast-rcnn you will also need to generate the selective search proposals for your images.
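As a rough illustration of what such a setup script produces, here is the kind of imdb structure involved, written as a Python dict purely for readability (the real one is a MATLAB struct, and the exact field names depend on the implementation, so treat every name below as an assumption):

```python
# Sketch of an imdb-like structure: per-image ground-truth boxes plus
# class labels. Field names ('images', 'boxes', 'labels', 'classes')
# and the train/val set convention are illustrative assumptions.
import numpy as np

imdb = {
    'images': {
        'name': ['000001.jpg', '000002.jpg'],  # image filenames
        'set':  np.array([1, 2]),              # e.g. 1 = train, 2 = val
    },
    # one [N x 4] array per image, rows are [xmin, ymin, xmax, ymax]
    'boxes': [
        np.array([[712, 143, 811, 308]], dtype=np.float32),
        np.array([[139, 200, 207, 301]], dtype=np.float32),
    ],
    # one class index per box (0-based index into 'classes')
    'labels': [np.array([0]), np.array([1])],
    'classes': ['pedestrian', 'train'],
}
```

The selective search proposals would typically be stored separately in the same per-image order, so the training code can pair each image with its boxes and proposals by index.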
The XML format of the Pascal VOC dataset is shown below. I have a dataset where some frames contain no objects at all. Would I still need to fill in all the sections, and if not, which ones?
<annotation>
    <folder>train</folder>
    <filename>000001</filename>
    <source>
        <database>KITTI database</database>
    </source>
    <size>
        <width>1224</width>
        <height>370</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>Pedestrian</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>712.40</xmin>
            <ymin>143</ymin>
            <xmax>810.73</xmax>
            <ymax>307.92</ymax>
        </bndbox>
    </object>
</annotation>
<annotation>
    <folder>VOC2007</folder>
    <filename>000002.jpg</filename>
    <source>
        <database>The VOC2007 Database</database>
        <annotation>PASCAL VOC2007</annotation>
        <image>flickr</image>
        <flickrid>329145082</flickrid>
    </source>
    <owner>
        <flickrid>hiromori2</flickrid>
        <name>Hiroyuki Mori</name>
    </owner>
    <size>
        <width>335</width>
        <height>500</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>train</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>139</xmin>
            <ymin>200</ymin>
            <xmax>207</xmax>
            <ymax>301</ymax>
        </bndbox>
    </object>
</annotation>
@HaziqRazali for images with no annotation the format would be the same, but there would be no <object> ... </object> tags.
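For example, an empty frame in the KITTI-style layout above would reduce to something like this (field values are illustrative):

```xml
<annotation>
    <folder>train</folder>
    <filename>000003</filename>
    <source>
        <database>KITTI database</database>
    </source>
    <size>
        <width>1224</width>
        <height>370</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
</annotation>
```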
During training you have a couple of options for dealing with images without annotations. The simplest would be to add an extra flag in the imdb so that only images with annotations are used at training time (i.e. so that getBatch only loads images with annotations). An alternative approach, used in the caffe implementation of SSD, is to load images regardless of whether or not they have an annotation, but to have an ignore label which is processed by the network itself (so that it doesn't contribute to the loss).
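The first option amounts to filtering the imdb by a per-image flag before sampling batches. A minimal sketch of that filter in Python (the function name is hypothetical, and the real getBatch interface is MATLAB):

```python
# Sketch: keep only images that have at least one ground-truth box,
# so the batch sampler never sees un-annotated frames.
import numpy as np

def annotated_indices(boxes_per_image):
    """Return indices of images with at least one ground-truth box."""
    return [i for i, b in enumerate(boxes_per_image) if len(b) > 0]

boxes = [np.zeros((0, 4)),                   # frame with no objects
         np.array([[139, 200, 207, 301]])]   # frame with one object

keep = annotated_indices(boxes)  # only the second image survives
```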
@albanie Thanks. I've now prepared my dataset. Where should I place the selective search code?
I've successfully run the fast-rcnn demo with fast_rcnn_demo.m, but how can I train on my own dataset, given that my detection task involves other objects? Concretely, how should I modify the .m files in the fast_rcnn folder?