ys-zong / VL-ICL

Code for paper: VL-ICL Bench: The Devil in the Details of Benchmarking Multimodal In-Context Learning
https://ys-zong.github.io/VL-ICL/

about open_mi #3

Closed whitesockcat closed 3 months ago

whitesockcat commented 7 months ago

Here is my prompt

Induce the concept from the in-context examples. Answer the question with a single word or phrase.
We name this is a slation
We name this is a dax
Based on the in-context examples, we name this is a

gt: blicket

model: School bus

But if I change the last part of the question to "Can you name this is a blicket or a dax?", the model will say "blicket".

I want to know whether this prompt is okay for the VL-ICL benchmark.

whitesockcat commented 7 months ago

There is natural knowledge in MLLMs, so maybe different MLLMs have different instruction-following abilities. As I mentioned above, my model is more stubborn. But I can't say it can't do in-context learning, because it can answer correctly with the explicit prompt.

I want to know your opinion about the prompt.

ys-zong commented 7 months ago

Hi, I wonder which model you are using? Is it instruction-following fine-tuned?

The two prompts you used are "Based on the in-context examples, we name this is a X" and "Can you name this is a blicket or a dax?". The first prompt is more difficult for the model to answer because it is open-set, whereas in the second prompt you have explicitly given the two options. So you may consider them as two variants (difficulty levels) of open_mi, open-set and closed-set, and your model can't do the open-set one but can do the closed-set one. Both are a type of ICL, but maybe the model needs to be stronger before it can do the open-set variant.
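For concreteness, here is a minimal sketch of how the two variants could be assembled. The helper name, the `<image>` placeholder, and the argument names are illustrative assumptions, not the benchmark's actual code:

```python
# Sketch only: contrasts the open-set and closed-set open_mi query styles.
# "<image>" stands in for however your model expects interleaved image tokens.

def build_open_mi_prompt(support, novel_name, closed_set=False, distractor_name=None):
    """Assemble an open_mi-style prompt.

    support: list of (image_placeholder, fake_name) pairs for the in-context examples.
    novel_name: the held-out fake name (e.g. "blicket"), only used in the closed-set query.
    closed_set: if True, end with an explicit two-way choice instead of an open question.
    """
    lines = ["Induce the concept from the in-context examples. "
             "Answer the question with a single word or phrase."]
    for image_token, name in support:
        lines.append(f"{image_token} We name this is a {name}")
    if closed_set:
        # Closed-set variant: listing the options makes the task much easier.
        lines.append(f"<image> Can you name this is a {novel_name} or a {distractor_name}?")
    else:
        # Open-set variant: the model must produce the induced name itself.
        lines.append("<image> Based on the in-context examples, we name this is a")
    return "\n".join(lines)


support = [("<image>", "slation"), ("<image>", "dax")]
print(build_open_mi_prompt(support, "blicket"))
print(build_open_mi_prompt(support, "blicket", closed_set=True, distractor_name="dax"))
```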

whitesockcat commented 7 months ago

Thanks for your reply. Our model's instruction-following ability is weak. We should do something about this.