Open ravissj4 opened 5 years ago
You need to extract the features as in the paper "Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering", which are based on Faster R-CNN and Visual Genome. Then you can write a similar file to generate captions for raw images outside the COCO dataset.
Can DataLoaderRaw be used for this purpose? Sorry if this is a very trivial question, I'm very new to this.
That part was written by Ruotian Luo; you can go to his GitHub to learn how to use it.
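To make the suggestion above concrete, here is a minimal sketch of a loader for precomputed bottom-up attention features. It assumes a hypothetical layout (one `.npz` file per image containing an `att_feats` array of shape `[num_boxes, feat_dim]`, as a Faster R-CNN extractor might produce); the class name `RawFeatureLoader` and the file layout are illustrative, not the repo's actual API.

```python
import os
import tempfile
import numpy as np

class RawFeatureLoader:
    """Sketch of a loader for precomputed region features.

    Hypothetical layout: one .npz per image with an 'att_feats'
    array of shape [num_boxes, feat_dim].
    """
    def __init__(self, feat_dir):
        self.feat_dir = feat_dir
        # index every feature file in the directory
        self.ids = sorted(f[:-4] for f in os.listdir(feat_dir)
                          if f.endswith(".npz"))

    def __len__(self):
        return len(self.ids)

    def __getitem__(self, i):
        data = np.load(os.path.join(self.feat_dir, self.ids[i] + ".npz"))
        att = data["att_feats"]        # per-region attention features
        fc = att.mean(axis=0)          # pooled global feature
        return {"id": self.ids[i], "att_feats": att, "fc_feats": fc}

# demo: synthetic features stand in for real extractor output
tmp = tempfile.mkdtemp()
for name in ["img1", "img2"]:
    np.savez(os.path.join(tmp, name + ".npz"),
             att_feats=np.random.rand(36, 2048).astype(np.float32))

loader = RawFeatureLoader(tmp)
batch = loader[0]
print(len(loader), batch["att_feats"].shape, batch["fc_feats"].shape)
```

A captioning script would then feed `att_feats` and `fc_feats` into the model in place of features loaded from the COCO preprocessing pipeline.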