rbgirshick / py-faster-rcnn

Faster R-CNN (Python implementation) -- see https://github.com/ShaoqingRen/faster_rcnn for the official MATLAB version

Learning faster RCNN #274

Open wzleong opened 8 years ago

wzleong commented 8 years ago

Hi

I am new to Caffe. I am interested in applying Faster R-CNN to face detection. I would like to know how the bbox (bounding box) coordinates are used to train the Faster R-CNN model. For example, the Pascal VOC dataset is in XML format and the bbox coordinates are supplied. But how are these coordinates fed into the network and used in training?

Thanks WZ

GBJim commented 8 years ago

Hi @wzleong

Faster R-CNN uses the imdb class to parse the annotations and read the images. Based on this imdb class, you can write custom scripts to handle different kinds of data.

You can refer to the many related issues for more details, for example #243.

EloiZ commented 8 years ago

Be careful about whether your coordinates start at 0 or 1. The standard in computer vision is to specify a box by its top-left corner and its bottom-right corner. The coordinates are parsed by the dataset class, for example in coco.py by the function _load_coco_annotation (see https://github.com/rbgirshick/py-faster-rcnn/blob/master/lib/datasets/coco.py#L228).
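To make the 0-versus-1 point concrete, here is a minimal sketch of reading Pascal VOC style boxes and shifting them to 0-based pixel coordinates. It uses only the standard library; the function name `load_voc_boxes`, the `face` class, and the sample XML are made up for illustration, and this is not the repo's actual parser.

```python
import xml.etree.ElementTree as ET


def load_voc_boxes(xml_text):
    """Return [(class_name, [x1, y1, x2, y2]), ...] with 0-based coordinates."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.findall('object'):
        bbox = obj.find('bndbox')
        # VOC annotations use 1-based pixel indices, so subtract 1
        # from each corner to get 0-based coordinates.
        coords = [float(bbox.find(tag).text) - 1
                  for tag in ('xmin', 'ymin', 'xmax', 'ymax')]
        boxes.append((obj.find('name').text.strip(), coords))
    return boxes


# Hypothetical annotation with a single face box (top-left, bottom-right).
SAMPLE = """<annotation>
  <object>
    <name>face</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
</annotation>"""

print(load_voc_boxes(SAMPLE))  # [('face', [47.0, 239.0, 194.0, 370.0])]
```

If your own annotations are already 0-based, drop the `- 1`; otherwise a box touching the image border can end up with a negative coordinate.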

wzleong commented 8 years ago

Hi @GBJim @EloiZ Thanks for your advice, I will take note of it. But I am still a bit lost: the examples in the Caffe repo advise converting the dataset to lmdb format. Are lmdb and imdb the same thing? From my understanding, we usually train in batches, so the training images are the input to the network and the annotations are the ground truth used to calculate the loss for back-propagation. But do we input the images as JPEG files with imdb? I presume there is some processing of the images? I believe we need factory.py and imdb.py to prepare the database, right?

Thanks Wei Zhen

GBJim commented 8 years ago

@wzleong The imdb class and the lmdb database are TOTALLY DIFFERENT things.

lmdb stands for "Lightning Memory-Mapped Database". In Faster R-CNN, the imdb class is used together with a customized Python layer in Caffe. You can write whatever you need to handle the data in your own imdb class and feed the data into Faster R-CNN.
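As a rough illustration of what "your own imdb class" could look like, the sketch below builds a list of per-image ground-truth records from annotations held in memory. The images stay as ordinary files on disk, and no lmdb conversion is involved. `FaceDataset` is a hypothetical stand-in that does not subclass the repo's imdb; the method names `image_path_at` and `gt_roidb` and the record keys follow the repo's conventions, but the real records use NumPy arrays and carry more fields.

```python
class FaceDataset:
    """Hypothetical, simplified stand-in for a custom imdb subclass."""

    def __init__(self, annotations, classes=('__background__', 'face')):
        # annotations: {image_path: [(class_name, [x1, y1, x2, y2]), ...]}
        self._annotations = annotations
        self._image_index = sorted(annotations)
        # Class index 0 is reserved for the background class.
        self._class_to_ind = {name: i for i, name in enumerate(classes)}

    def image_path_at(self, i):
        """Path of the i-th image; the file itself is read at training time."""
        return self._image_index[i]

    def gt_roidb(self):
        """One ground-truth record per image, in image-index order."""
        roidb = []
        for path in self._image_index:
            objs = self._annotations[path]
            roidb.append({
                'boxes': [box for _, box in objs],
                'gt_classes': [self._class_to_ind[name] for name, _ in objs],
                'flipped': False,
            })
        return roidb


# Made-up example data: one image with one 0-based face box.
dataset = FaceDataset({'img_0001.jpg': [('face', [47, 239, 194, 370])]})
roidb = dataset.gt_roidb()
print(dataset.image_path_at(0), roidb[0])
```

In the actual code base, a class like this would be registered in factory.py so that the training scripts can look it up by name.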