Open wzleong opened 8 years ago
Check whether your coordinates start at 0 or 1.
The usual convention in computer vision is to specify a box by its top-left corner and its bottom-right corner.
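To make the 0-versus-1 point concrete, here is a minimal sketch of shifting a 1-based, inclusive box to 0-based pixel indices and sanity-checking the result. The `Box` type and both helper names are made up for illustration; they are not part of the Faster R-CNN code.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned box given as top-left (x1, y1) and bottom-right (x2, y2)."""
    x1: float
    y1: float
    x2: float
    y2: float


def to_zero_based(box: Box) -> Box:
    """Shift a 1-based, inclusive box so that the first pixel has index 0."""
    return Box(box.x1 - 1, box.y1 - 1, box.x2 - 1, box.y2 - 1)


def is_valid(box: Box, width: int, height: int) -> bool:
    """Check a 0-based box: top-left not past bottom-right, and inside the image."""
    return (0 <= box.x1 <= box.x2 <= width - 1
            and 0 <= box.y1 <= box.y2 <= height - 1)


one_based = Box(1, 1, 10, 20)
zero_based = to_zero_based(one_based)
print(zero_based, is_valid(zero_based, width=10, height=20))
```

If the annotations were already 0-based, applying the shift would turn a coordinate of 0 into -1, which is why it is worth running a check like `is_valid` over the whole dataset before training.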
The coordinates are parsed by
Hi @GBJim @EloiZ, thanks for your advice. I will take note of it. But I am still a bit lost: in the Caffe repo example, they advise converting the dataset to lmdb format. Are lmdb and imdb the same thing? From my understanding, we usually train in batches, so the training images are the input to the network and the annotations are the ground truth used to calculate the loss for back-propagation. But do we input the images as JPEG for imdb? I presume there is some processing on the images? I believe we need factory.py and imdb.py to prepare the database, right?
Thanks, Wei Zhen
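@wzleong The imdb class and the lmdb database are TOTALLY DIFFERENT things.
lmdb stands for "Lightning Memory-Mapped Database", a key-value store that Caffe can read training data from. imdb is a Python class in the Faster R-CNN code that describes a dataset. In Faster R-CNN, the data from the imdb class is fed to the network through a customized Python layer in Caffe, not through an lmdb. You can write whatever you need to handle your data in your own imdb class and feed that data into Faster R-CNN.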
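As a rough illustration of what "your own imdb class" might hold, here is a standalone sketch. `ToyImdb` is a hypothetical stand-in that does not subclass the real imdb base class, and the method and key names (`image_path_at`, `gt_roidb`, `boxes`, `gt_classes`) are assumptions chosen for this example; check imdb.py in the repository for the interface it actually expects.

```python
class ToyImdb:
    """Hypothetical dataset wrapper: image paths plus ground-truth boxes."""

    def __init__(self, name, classes, annotations):
        self.name = name
        # Index 0 is reserved for the background class.
        self.classes = ("__background__",) + tuple(classes)
        # annotations: {image_path: [(class_name, [x1, y1, x2, y2]), ...]}
        self._annotations = annotations
        self.image_index = sorted(annotations)

    def image_path_at(self, i):
        """Return the path of the i-th image; pixels are loaded later, at batch time."""
        return self.image_index[i]

    def gt_roidb(self):
        """Return one ground-truth record per image."""
        class_to_ind = {c: i for i, c in enumerate(self.classes)}
        roidb = []
        for path in self.image_index:
            objs = self._annotations[path]
            roidb.append({
                "image": path,
                "boxes": [list(box) for _, box in objs],
                "gt_classes": [class_to_ind[cls] for cls, _ in objs],
            })
        return roidb


db = ToyImdb("faces", ["face"], {"a.jpg": [("face", [0, 0, 9, 19])]})
print(db.gt_roidb())
```

The point of the sketch is that the class only stores paths and annotations in ordinary Python structures; nothing is converted into an lmdb file.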
Hi
I am new to Caffe. I am interested in applying Faster R-CNN to face detection. I am curious to know how the bbox (bounding box) coordinates are applied and used to train the Faster R-CNN model. For example, the Pascal VOC dataset is in XML format and the bbox coordinates are supplied. But how are these coordinates fed into the network and used in training?
Thanks, WZ
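For the XML part of the question, here is a minimal sketch of reading bounding boxes from a Pascal VOC style annotation with the Python standard library. The sample XML, its values, and the `load_boxes` helper are made up for illustration; the `one_based` flag is an assumption about how the annotations were written.

```python
import xml.etree.ElementTree as ET

# Made-up annotation in the Pascal VOC layout, for illustration only.
SAMPLE_XML = """
<annotation>
  <size><width>500</width><height>375</height></size>
  <object>
    <name>face</name>
    <bndbox>
      <xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax>
    </bndbox>
  </object>
</annotation>
"""


def load_boxes(xml_text, one_based=True):
    """Return [(class_name, [x1, y1, x2, y2]), ...] with 0-based coordinates."""
    root = ET.fromstring(xml_text.strip())
    offset = 1 if one_based else 0
    boxes = []
    for obj in root.findall("object"):
        bndbox = obj.find("bndbox")
        coords = [float(bndbox.find(tag).text) - offset
                  for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append((obj.find("name").text, coords))
    return boxes


print(load_boxes(SAMPLE_XML))
```

Boxes loaded this way are plain numbers attached to an image path; during training they act as the ground truth that the network's predicted boxes are compared against when the loss is computed.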