Closed yellowjs0304 closed 6 years ago
No, all these other visual features are not used in our final model. Our model only uses ImageNet-pretrained ResNet-152 features as in the original R2R dataset paper (despite the interface for other features in our codebase).
Thank you for your clear answer :)
Sorry, I have another question. Which images are used as the visual features for the models? The skybox images or the panorama images?
They are the same "skybox" image features as in the original R2R dataset repo:
https://storage.googleapis.com/bringmeaspoon/img_features/ResNet-152-imagenet.zip
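For reference, the precomputed features in that zip are stored as a TSV in the format used by the Matterport3DSimulator / R2R repo: one row per viewpoint, with a base64-encoded float32 array of 36 discretized views x 2048 ResNet-152 dimensions. A minimal loading sketch (the field names and shapes follow that repo's convention; the demo row below is synthetic, so no download is needed):

```python
import base64
import csv
import io

import numpy as np

# Field layout of the R2R precomputed-feature TSVs (Matterport3DSimulator repo
# convention): 36 discretized views, each a 2048-dim ResNet-152 feature.
TSV_FIELDS = ['scanId', 'viewpointId', 'image_w', 'image_h', 'vfov', 'features']
VIEWS, DIM = 36, 2048

# The base64 'features' field is far larger than csv's default field limit.
csv.field_size_limit(2 ** 30)

def load_features(tsv_file):
    """Return {'<scanId>_<viewpointId>': (36, 2048) float32 array}."""
    feats = {}
    reader = csv.DictReader(tsv_file, delimiter='\t', fieldnames=TSV_FIELDS)
    for row in reader:
        arr = np.frombuffer(base64.b64decode(row['features']),
                            dtype=np.float32).reshape(VIEWS, DIM)
        feats[row['scanId'] + '_' + row['viewpointId']] = arr
    return feats

# Demo with one synthetic row standing in for the real TSV.
fake = np.random.rand(VIEWS, DIM).astype(np.float32)
row = '\t'.join(['scan0', 'vp0', '640', '480', '60',
                 base64.b64encode(fake.tobytes()).decode('ascii')])
feats = load_features(io.StringIO(row))
print(feats['scan0_vp0'].shape)  # (36, 2048)
```

To use the actual features, open the extracted `.tsv` file and pass the file handle to `load_features`.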
Thank you for the reply.
Hi, I read your paper and code. I have a question about the image features. Is bottom-up attention applied in your image processing network? I assumed that your image processing included object detection and relationship detection via bottom-up attention trained on the Visual Genome dataset.
But I couldn't find any information about this in your paper. Could you explain?