JoonseoKang closed this issue 4 years ago
Currently we only support inference from image features. You can first use the bottom-up top-down approach to extract features and labels, and then use our pipeline to generate the captions.
Hey @xiyinmsu, could the following repo work for feature extraction? https://github.com/airsplay/py-bottom-up-attention
In the coco_caption dataset, the train.yaml file lists train.img.tsv as the image file, but I couldn't find train.img.tsv.
feature: train.feature.tsv
What I want to do is look at image-caption examples, like in Fig. 5 of your paper.
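For anyone hitting the same problem, here is a minimal sketch for checking which files named in train.yaml are actually present next to it. It assumes the YAML is a flat list of `key: filename` lines like the `feature: train.feature.tsv` entry quoted above; `parse_flat_yaml` and `missing_files` are hypothetical helper names, not part of the repository, and the parser is deliberately not a full YAML implementation.

```python
from pathlib import Path


def parse_flat_yaml(text):
    """Parse simple `key: value` lines (assumed layout; not a full YAML parser)."""
    entries = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)
        entries[key.strip()] = value.strip()
    return entries


def missing_files(yaml_text, data_dir):
    """Return {key: filename} for entries whose file does not exist in data_dir."""
    data_dir = Path(data_dir)
    return {
        key: name
        for key, name in parse_flat_yaml(yaml_text).items()
        if name and not (data_dir / name).is_file()
    }
```

Calling `missing_files(Path("train.yaml").read_text(), ".")` from inside the dataset directory would then list every entry that points at a file that is not there, which makes it easy to see whether the image TSV is the only one missing.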