jcjohnson / densecap

Dense image captioning in Torch

How can I use natural language queries to retrieve the source image? #19


664852049 commented 8 years ago

In your paper, your dense captioning model supports image retrieval using natural language queries, and can localize those queries in the retrieved images. How can I do this retrieval?

jcjohnson commented 8 years ago

We don't have code for that in this repo, but it's relatively simple.

First use the extractFeatures method to get boxes and features for the database images:

https://github.com/jcjohnson/densecap/blob/master/densecap/DenseCapModel.lua#L285
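In code, that first step might look roughly like this (a sketch only: `model` is assumed to be a trained DenseCapModel in evaluate mode, and each entry of `images` a preprocessed `1 x 3 x H x W` image tensor; the return values follow the line linked above):

```lua
require 'torch'
require 'nn'

-- Sketch: build a database of boxes and per-box features.
-- `model` and `images` are assumptions, not repo code.
local database = {}
for i, img in ipairs(images) do
  -- extractFeatures returns region boxes and one feature vector per box
  local boxes, feats = model:extractFeatures(img)
  database[i] = {boxes = boxes, feats = feats}
end
```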

Next run the LanguageModel (https://github.com/jcjohnson/densecap/blob/master/densecap/DenseCapModel.lua#L285) and the LanguageModelCriterion (https://github.com/jcjohnson/densecap/blob/master/densecap/DenseCapModel.lua#L120) forward, using the extracted features together with the query, to compute the log-likelihood of the query for each box.
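A sketch of that scoring step, assuming the training-time calling convention visible in DenseCapModel.lua (the language model's forward takes {image vectors, gt labels}, and getTarget builds the criterion target from the labels); the per-box loop is simply the easiest way to get one likelihood per box out of a criterion that averages over its batch:

```lua
-- Sketch: negative criterion loss as a per-box log-likelihood score.
-- `query_seq` is the query encoded as a 1 x T LongTensor of vocab
-- indices (see the encoding sketch further down); assumed, not repo code.
local function score_boxes(model, feats, query_seq)
  local lm = model.nets.language_model
  local crit = model.crits.lm_crit
  local target = lm:getTarget(query_seq)  -- same target as in training
  local scores = torch.zeros(feats:size(1))
  for b = 1, feats:size(1) do
    local feat_b = feats[{{b}}]           -- 1 x D feature for one box
    local lm_out = lm:forward{feat_b, query_seq}
    -- The criterion returns a (mean) negative log-likelihood, so negate
    -- it so that a higher score means a better match.
    scores[b] = -crit:forward(lm_out, target)
  end
  return scores
end
```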

Finally, use these log-likelihoods to sort all boxes across all images; the top-ranked boxes give both the retrieved images and the localization of the query within them.
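Putting the pieces together, the ranking could be as simple as a flat list sorted by score (again just a sketch built on the placeholders above):

```lua
-- Sketch: rank every box of every database image against one query.
local results = {}
for i, entry in ipairs(database) do
  local scores = score_boxes(model, entry.feats, query_seq)
  for b = 1, scores:nElement() do
    table.insert(results, {image = i, box = entry.boxes[b], score = scores[b]})
  end
end
-- Highest log-likelihood first: the top entries are the retrieved images,
-- and their boxes localize the query within those images.
table.sort(results, function(a, b) return a.score > b.score end)
```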

MohitShridhar commented 8 years ago

@jcjohnson can open-world object detection be done in a similar way? Also, how do you use the boxes and features from extractFeatures in self.nets.language_model:forward() (the function expects image vectors and gt labels) and self.crits.lm_crit:forward()?
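Regarding the gt-labels argument: the query string has to be turned into the same token-index encoding the model was trained with. A sketch, assuming a token-to-index table `vocab` loaded from the checkpoint (the exact field name in the checkpoint is an assumption here):

```lua
-- Sketch: encode a query string as a 1 x T LongTensor of vocab indices.
-- `vocab` (token -> index) is assumed to come from the model checkpoint.
local function encode_query(vocab, query)
  local idx = {}
  for word in query:lower():gmatch('%S+') do
    -- Out-of-vocabulary words need some policy; skipping is the simplest.
    if vocab[word] then table.insert(idx, vocab[word]) end
  end
  assert(#idx > 0, 'no query words are in the vocabulary')
  return torch.LongTensor(idx):view(1, -1)
end

local query_seq = encode_query(vocab, 'a man riding a horse')
```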

brannondorsey commented 7 years ago

Has anyone got a working demo of this by chance?