salesforce / LAVIS

LAVIS - A One-stop Library for Language-Vision Intelligence

How to do few-shot in-context learning with BLIP2/InstructBLIP? #526

Open ys-zong opened 1 year ago

ys-zong commented 1 year ago

Hi, thanks for the great work!

I wonder how I can prompt BLIP2 and InstructBLIP to do few-shot in-context learning, e.g. few-shot VQA. Specifically, I want the input to look like [Img] [QA1] [Img] [QA2] ... [Img] [Qn] --> Answer.

I saw issue #433 about how to prompt with <Img, Q1, A1, Q2, ?>, so the remaining question here is how to input multiple images. Many thanks!
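For reference, LAVIS does not expose an interleaved multi-image interface for BLIP2, but the internals of its `generate()` can be recombined by hand: encode each image through the ViT and Q-Former, project the query tokens into the LM embedding space, and interleave them with text embeddings before calling the LM's `generate()` with `inputs_embeds`. Below is a minimal sketch along those lines, not an official API: the attribute names (`visual_encoder`, `ln_vision`, `Qformer`, `query_tokens`, `opt_proj`, `opt_model`, `opt_tokenizer`, `maybe_autocast`) come from LAVIS's `blip2_opt` implementation and may shift between versions, and the prompt template, BOS handling, and generation settings are illustrative choices, not the repo's method.

```python
import torch
from lavis.models import load_model_and_preprocess

device = "cuda" if torch.cuda.is_available() else "cpu"
# BLIP2 with a decoder-only LM (OPT); standard LAVIS model identifiers.
model, vis_processors, _ = load_model_and_preprocess(
    name="blip2_opt", model_type="pretrain_opt2.7b", is_eval=True, device=device
)

@torch.no_grad()
def embed_image(img):
    """ViT + Q-Former -> 32 query tokens projected into the LM embedding
    space, mirroring what BLIP2's generate() does for a single image."""
    image = vis_processors["eval"](img).unsqueeze(0).to(device)
    with model.maybe_autocast():
        image_embeds = model.ln_vision(model.visual_encoder(image))
        image_atts = torch.ones(image_embeds.size()[:-1], dtype=torch.long, device=device)
        query_tokens = model.query_tokens.expand(image_embeds.shape[0], -1, -1)
        query_output = model.Qformer.bert(
            query_embeds=query_tokens,
            encoder_hidden_states=image_embeds,
            encoder_attention_mask=image_atts,
            return_dict=True,
        )
        return model.opt_proj(query_output.last_hidden_state)  # (1, 32, d_lm)

@torch.no_grad()
def embed_text(text):
    """Embed raw text with the LM's input embedding table (no BOS added)."""
    ids = model.opt_tokenizer(
        text, return_tensors="pt", add_special_tokens=False
    ).input_ids.to(device)
    return model.opt_model.get_input_embeddings()(ids)

@torch.no_grad()
def few_shot_vqa(shots, query_image, query_question, max_new_tokens=10):
    """shots: list of (PIL.Image, question, answer) demonstrations."""
    # Single BOS at the start of the whole interleaved sequence.
    bos = torch.tensor([[model.opt_tokenizer.bos_token_id]], device=device)
    segments = [model.opt_model.get_input_embeddings()(bos)]
    for img, q, a in shots:  # builds [Img][QA1] [Img][QA2] ...
        segments += [embed_image(img), embed_text(f"Question: {q} Answer: {a}\n")]
    segments += [embed_image(query_image), embed_text(f"Question: {query_question} Answer:")]
    inputs_embeds = torch.cat(segments, dim=1)
    attn = torch.ones(inputs_embeds.shape[:-1], dtype=torch.long, device=device)
    with model.maybe_autocast():
        out = model.opt_model.generate(
            inputs_embeds=inputs_embeds, attention_mask=attn, max_new_tokens=max_new_tokens
        )
    return model.opt_tokenizer.batch_decode(out, skip_special_tokens=True)[0].strip()
```

Note that `generate(inputs_embeds=...)` for decoder-only models needs a reasonably recent transformers release. Also, since BLIP2 was pretrained on single image-text pairs rather than interleaved data, gains from in-context examples may be limited compared with models trained for interleaved inputs.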

jeeyung commented 1 year ago

I'm also wondering if we can feed in multiple images.

Fym68 commented 7 months ago

Have you found a method? I also want to run a few-shot experiment.

ys-zong commented 7 months ago

FYI, I didn't find a neat way to do few-shot prompting with BLIP2, but I implemented few-shot inference for many other V-L models here: https://github.com/ys-zong/VL-ICL