ys-zong opened this issue 1 year ago
Hi, thanks for the great work!

I wonder how I can prompt BLIP-2 and InstructBLIP to do few-shot in-context learning, e.g. few-shot VQA. Specifically, I want the input to look like

[Img] [QA1] [Img] [QA2] ... [Img] [Qn] --> Answer

I saw issue #433 about how to prompt with <Img, Q1, A1, Q2, ?>, so the difference here is how to input multiple images. Many thanks!
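One workaround I can think of is to stitch the shot images into a single composite image and keep the QA pairs in the text prompt, since the Hugging Face BLIP-2 interface only takes one image per prompt. A minimal sketch of that idea; the checkpoint, image paths, and prompt template below are just illustrative assumptions, not a supported interleaved-image API:

```python
# Sketch of a few-shot VQA workaround for BLIP-2: paste the demonstration
# images and the query image side by side into one composite image, and put
# the question/answer pairs into the text prompt. All file names and the
# demonstration triples are hypothetical placeholders.
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=torch.float16
).to("cuda")

def stitch(images, height=224):
    """Resize images to a common height and paste them side by side."""
    resized = [im.resize((int(im.width * height / im.height), height)) for im in images]
    canvas = Image.new("RGB", (sum(im.width for im in resized), height))
    x = 0
    for im in resized:
        canvas.paste(im, (x, 0))
        x += im.width
    return canvas

# Hypothetical few-shot demonstrations: (image, question, answer) triples.
shots = [
    (Image.open("demo1.jpg"), "What animal is in image 1?", "a dog"),
    (Image.open("demo2.jpg"), "What animal is in image 2?", "a cat"),
]
query_image = Image.open("query.jpg")
query_question = "What animal is in image 3?"

composite = stitch([im for im, _, _ in shots] + [query_image])
prompt = "".join(f"Question: {q} Answer: {a}\n" for _, q, a in shots)
prompt += f"Question: {query_question} Answer:"

inputs = processor(images=composite, text=prompt, return_tensors="pt").to("cuda", torch.float16)
out = model.generate(**inputs, max_new_tokens=10)
print(processor.decode(out[0], skip_special_tokens=True).strip())
```

Note that the processor resizes the whole composite down to the vision encoder's fixed input resolution, so each shot image ends up at a fraction of its usual detail, which is part of why this isn't a neat solution.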
I'm also wondering if we can feed multiple images.

Have you found a method? I also want to do a few-shot experiment.

FYI, I didn't find a neat way to do few-shot prompting with BLIP, but I implemented few-shot inference for many other V-L models here: https://github.com/ys-zong/VL-ICL