salesforce / LAVIS

LAVIS - A One-stop Library for Language-Vision Intelligence
BSD 3-Clause "New" or "Revised" License

BLIP2 feature extractor/retrieval #98

Closed jn2clark closed 1 year ago

jn2clark commented 1 year ago

Hi! A couple of questions: (1) What is the best way to use blip2 as a feature extractor for image-text retrieval? I did not see the same interface for blip2 here as the original blip. (2) Are there any metrics for single stage retrieval (text-image) for blip2 without using fusion encoder reranking?

Thanks!

LiJunnan1992 commented 1 year ago

Hi @jn2clark, thanks for your questions.

  1. We will be implementing a feature extraction interface for blip2.
  2. We haven't evaluated single-stage retrieval based on contrastive similarity alone; you are more than welcome to give it a go. Just modify this function to use sims_matrix directly as the score: https://github.com/salesforce/LAVIS/blob/5ddd9b4e5149dbc514e81110e03d28458a754c5d/lavis/models/blip2_models/blip2.py#L98
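
For readers trying the second suggestion, here is a minimal sketch of what scoring retrieval from a similarity matrix alone (no reranking) could look like. It is not LAVIS code: `recall_at_k`, `transpose`, the toy matrix, and the one-ground-truth-per-query setup are all assumptions made for illustration. In practice the matrix would come from the model's contrastive similarities and would more likely be a tensor than nested lists.

```python
def recall_at_k(sims, gt_index, ks=(1, 5, 10)):
    """Recall@K from raw similarity scores (hypothetical helper, not part of LAVIS).

    sims[i][j] is the similarity of query i to candidate j.
    gt_index[i] is the index of the single correct candidate for query i
    (an assumption; real datasets may have several captions per image).
    """
    hits = {k: 0 for k in ks}
    for scores, gt in zip(sims, gt_index):
        # Zero-based rank of the ground truth: how many candidates score strictly higher.
        rank = sum(1 for s in scores if s > scores[gt])
        for k in ks:
            if rank < k:
                hits[k] += 1
    n = len(sims)
    return {k: hits[k] / n for k in ks}


def transpose(matrix):
    """Swap query and candidate axes so the same scorer covers the reverse direction."""
    return [list(col) for col in zip(*matrix)]


# Toy image-to-text similarity matrix: 3 images (rows) x 3 captions (columns).
sims_i2t = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.1, 0.2, 0.7],
]
gt = [0, 1, 2]  # caption i is assumed to belong to image i

i2t = recall_at_k(sims_i2t, gt, ks=(1, 2))             # image -> text
t2i = recall_at_k(transpose(sims_i2t), gt, ks=(1, 2))  # text -> image
print("image-to-text:", i2t)
print("text-to-image:", t2i)
```

Because the ranking uses only the similarity scores, this corresponds to the single-stage setting asked about in the question. Any reranking step would replace the scores of the top candidates before ranks are computed.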