ronghanghu / speaker_follower

Code release for Fried et al., "Speaker-Follower Models for Vision-and-Language Navigation," in NeurIPS, 2018.
http://ronghanghu.com/speaker_follower/

Wonder about image feature #1

Closed by yellowjs0304 6 years ago

yellowjs0304 commented 6 years ago

Hi, I read your paper and code, and I have a question about the image features. Is bottom-up attention applied in your image processing network? I assumed that your image processing includes object detection and relationship detection through bottom-up attention and the Visual Genome dataset.

However, I couldn't find any information about these in your paper. Could you explain this?

ronghanghu commented 6 years ago

No, none of these other visual features are used in our final model. Our model uses only ImageNet-pretrained ResNet-152 features, as in the original R2R dataset paper (although our codebase includes an interface for other features).

yellowjs0304 commented 6 years ago

Thank you for your clear answer :)

yellowjs0304 commented 6 years ago

Sorry, I have another question. Which images are used to compute the visual features for the models: skybox images or panorama images?

ronghanghu commented 6 years ago

They are the same "skybox" image features as in the original R2R dataset repo:

https://storage.googleapis.com/bringmeaspoon/img_features/ResNet-152-imagenet.zip
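For anyone who wants to inspect these features, here is a minimal sketch of a loader. It is not code from this repo, and it rests on several assumptions: that the archive unpacks to a headerless TSV file, that the columns are `scanId`, `viewpointId`, `image_w`, `image_h`, `vfov`, and `features`, and that `features` is a base64-encoded buffer of little-endian float32 values holding 36 views of 2048 dimensions each. The function name `read_features` is hypothetical.

```python
import base64
import csv
from array import array

# Assumed column order; the file is assumed to have no header row.
TSV_FIELDNAMES = ["scanId", "viewpointId", "image_w", "image_h", "vfov", "features"]
VIEWS = 36          # assumed number of discretized views per viewpoint
FEATURE_DIM = 2048  # size of the ResNet-152 pooled feature vector


def read_features(handle, views=VIEWS, dim=FEATURE_DIM):
    """Return {"<scanId>_<viewpointId>": list of `views` float arrays of length `dim`}."""
    # Each row holds a long base64 string, well above csv's default field limit.
    csv.field_size_limit(2**31 - 1)
    reader = csv.DictReader(handle, delimiter="\t", fieldnames=TSV_FIELDNAMES)
    features = {}
    for row in reader:
        flat = array("f")  # 32-bit floats in native byte order (little-endian assumed)
        flat.frombytes(base64.b64decode(row["features"]))
        if len(flat) != views * dim:
            raise ValueError(f"unexpected feature length {len(flat)}")
        key = f"{row['scanId']}_{row['viewpointId']}"
        # Split the flat buffer into one feature vector per view.
        features[key] = [flat[i * dim:(i + 1) * dim] for i in range(views)]
    return features
```

It could be called as `read_features(open(path, newline=""))`, where `path` points at the extracted TSV file. If the actual layout differs from the assumptions above, the length check raises an error instead of returning misaligned vectors.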

yellowjs0304 commented 6 years ago

Thank you for the reply.