parkan closed this 8 years ago
Right, the region predictions can be disregarded. The idea here is to use a good image-to-text model to generate additional descriptive text that the original metadata may have left out.
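A minimal sketch of that idea, assuming we only keep a whole-image caption (no region labels) and append it to each item's searchable text. `augment_metadata` and `caption_fn` are hypothetical names; `caption_fn` stands in for whatever image-to-text model is actually used and is not a real library API.

```python
from typing import Callable, Dict


def augment_metadata(
    metadata: Dict[str, str],
    images: Dict[str, bytes],
    caption_fn: Callable[[bytes], str],
) -> Dict[str, str]:
    """Append a generated caption to each item's original metadata text.

    caption_fn is a hypothetical placeholder for an image-to-text model.
    Items without an image keep their original text unchanged.
    """
    augmented: Dict[str, str] = {}
    for item_id, text in metadata.items():
        image = images.get(item_id)
        # Only caption items we actually have image bytes for.
        generated = caption_fn(image) if image is not None else ""
        # strip() drops the trailing space when no caption was generated.
        augmented[item_id] = f"{text} {generated}".strip()
    return augmented
```

With a stub model, an item whose metadata is only a filename (e.g. `IMG_0042`) becomes findable by words like "dog" once its caption is appended.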
Another approach to boost recall is to (1) run the text-based search, then (2) find additional matches by doing content-based matching between the hits from step 1 and the rest of the DB. This would be particularly useful when some datasets have more metadata than others.
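The two-step process can be sketched as follows, assuming each item already has a content embedding (a plain vector) and that "content-based matching" means cosine similarity above a fixed threshold. The function names, the keyword matcher, and the `0.9` threshold are all illustrative assumptions, not an existing implementation.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def text_search(query: str, metadata: Dict[str, str]) -> List[str]:
    """Step 1 (hypothetical): items whose metadata contains every query term."""
    terms = query.lower().split()
    return [
        item_id
        for item_id, text in metadata.items()
        if all(t in text.lower().split() for t in terms)
    ]


def expand_by_content(
    hits: List[str], embeddings: Dict[str, Sequence[float]], threshold: float = 0.9
) -> List[str]:
    """Step 2: add non-hit items that are close in content to any text hit."""
    expanded = list(hits)
    hit_set = set(hits)
    for item_id, vec in embeddings.items():
        if item_id in hit_set:
            continue
        # Best similarity to any step-1 hit; 0.0 when there were no hits.
        best = max((cosine(vec, embeddings[h]) for h in hits), default=0.0)
        if best >= threshold:
            expanded.append(item_id)
    return expanded


def two_stage_search(
    query: str,
    metadata: Dict[str, str],
    embeddings: Dict[str, Sequence[float]],
    threshold: float = 0.9,
) -> List[str]:
    hits = text_search(query, metadata)
    return expand_by_content(hits, embeddings, threshold)
```

An item with poor metadata (say, titled "untitled") can still be returned for "barn" if its embedding is close to an item whose metadata does mention a barn. The threshold trades recall for precision and would need tuning per dataset.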
Is this still relevant?
OK, this is covered by Clarifai for the moment.
For example: http://cs.stanford.edu/people/karpathy/densecap/
[Note from @parkan: it seems like dense captioning isn't QUITE the problem we want, because we don't really need labels applied to image regions?]