We don't have code for that in this repo, but it's relatively simple.
First use the extractFeatures method to get boxes and features for the database images:
https://github.com/jcjohnson/densecap/blob/master/densecap/DenseCapModel.lua#L285
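Roughly, that first step might look like this. This is only a minimal sketch, not tested against the repo: `load_and_preprocess` is a hypothetical helper that loads an image and applies the same resizing / mean subtraction as `run_model.lua`, the checkpoint path is just the usual pretrained model as an example, and I'm assuming `extractFeatures` returns region boxes and their feature codes.

```lua
require 'torch'
require 'nn'
require 'densecap.DenseCapModel'

-- load a trained checkpoint (path is only an example)
local checkpoint = torch.load('data/models/densecap/densecap-pretrained-vgg16.t7')
local model = checkpoint.model
model:evaluate()

local image_paths = {'db/img1.jpg', 'db/img2.jpg'}  -- your database images
local db = {}
for i, path in ipairs(image_paths) do
  -- load_and_preprocess is a hypothetical helper returning a 1 x 3 x H x W tensor
  local img = load_and_preprocess(path)
  -- assuming extractFeatures returns region boxes (B x 4) and region features (B x D)
  local boxes, feats = model:extractFeatures(img)
  db[i] = {path = path, boxes = boxes:clone(), feats = feats:clone()}
end
```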
Next run the LanguageModel (https://github.com/jcjohnson/densecap/blob/master/densecap/DenseCapModel.lua#L285) and LanguageModelCriterion (https://github.com/jcjohnson/densecap/blob/master/densecap/DenseCapModel.lua#L120) forward, using the features and the query, to compute the log-likelihood of the query for each extracted box.
Finally, use these log-likelihoods to rank all boxes across all images.
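Continuing from the sketch above, the scoring and ranking steps might look something like the following. Again this is only a sketch under assumptions: `encode_query` is a hypothetical helper that maps the query string to a 1 x T LongTensor of token indices using the model's vocabulary, I'm assuming the language model accepts `{region features, gt sequence}` and that the criterion returns the mean negative log-likelihood of that sequence (the training path in DenseCapModel.lua may convert the sequence into shifted targets first, so mirror whatever it does there).

```lua
local lm = model.nets.language_model
local crit = model.crits.lm_crit

-- encode_query is a hypothetical helper: query string -> 1 x T LongTensor of token indices
local seq = encode_query('a red car parked on the street')

local scored = {}
for i, entry in ipairs(db) do
  for b = 1, entry.feats:size(1) do
    local feat = entry.feats[{{b, b}}]   -- 1 x D features for a single box
    -- assuming the language model takes {region features, gt sequence} ...
    local lm_out = lm:forward{feat, seq}
    -- ... and that the criterion returns the negative log-likelihood of seq given the box
    local nll = crit:forward(lm_out, seq)
    table.insert(scored, {image = i, box = entry.boxes[b], score = -nll})
  end
end

-- highest log-likelihood first: the top entries are the best-matching boxes and images
table.sort(scored, function(a, b) return a.score > b.score end)
```

Looping over boxes one at a time is slow but keeps the shapes simple; in practice you would batch all boxes of an image through the language model at once.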
@jcjohnson can open-world object detection be done in a similar way?
Also, how do you use the boxes and features from extractFeatures in self.nets.language_model:forward() (the function expects image vectors and gt labels) and in self.crits.lm_crit:forward()?
Has anyone got a working demo of this by chance?
In your paper, the dense captioning model supports image retrieval using natural language queries and can localize these queries in the retrieved images. How can I implement this retrieval?