z-kun opened this issue 6 years ago
Hello jpeyre, I have read your paper and code, and importing spatial features into visual relation detection is a nice idea.

After reading, I am confused by this sentence in Section 3: "To detect and localize such triplets in test images, we assume that the candidate object detections for s and o are given by a detector trained with full supervision. Here we use the object detector Faster-RCNN [14] trained on the Visual Relationship Detection training set [31]." But in the section "Representing Appearance of Objects", you use Fast-RCNN with VGG16 pre-trained on ImageNet to extract the appearance features. Do you mean that you use the same CNN architecture (Fast-RCNN) trained on different datasets in these two steps?

I found "vgg16_fast_rcnn.caffemodel" in the code, but no model trained on the Visual Relationship Dataset, so I wonder if I have misunderstood the paper. Could you share some details about the model trained on VRD that is used for extracting the candidate pairs of objects? Thank you!
Hi z-kun, we use the same model both for extracting the candidate objects and for computing their appearance features. This model is indeed "vgg16_fast_rcnn.caffemodel": a VGG16 network pre-trained on ImageNet and fine-tuned on the VRD training set.
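For concreteness, here is a minimal pycaffe sketch of how one such network can serve both roles, scoring candidate boxes and yielding their fc7 appearance features in a single forward pass. The prototxt path and the example inputs are placeholder assumptions; the blob names ("data", "rois", "cls_prob", "fc7") follow the standard Fast-RCNN VGG16 test definition, not necessarily this repository's exact files.

```python
# Sketch only: one VGG16 Fast-RCNN network used both to score candidate
# objects and to extract their appearance features.
import numpy as np
import caffe

caffe.set_mode_cpu()
net = caffe.Net('vgg16_fast_rcnn_test.prototxt',  # network definition (placeholder path)
                'vgg16_fast_rcnn.caffemodel',     # weights fine-tuned on the VRD training set
                caffe.TEST)

# Preprocessed image in NCHW layout (mean subtraction etc. omitted here).
im = np.random.rand(1, 3, 600, 800).astype(np.float32)

# Candidate boxes as (batch_idx, x1, y1, x2, y2) rows, e.g. from a proposal method.
rois = np.array([[0,  50, 60, 200, 220],
                 [0, 120, 80, 300, 240]], dtype=np.float32)

net.blobs['data'].reshape(*im.shape)
net.blobs['rois'].reshape(rois.shape[0], 5)
net.forward(data=im, rois=rois)

scores = net.blobs['cls_prob'].data        # (num_rois, num_classes): object scores per box
appearance = net.blobs['fc7'].data.copy()  # (num_rois, 4096): appearance features per box
```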
Got it, thanks!