open-compass / opencompass

OpenCompass is an LLM evaluation platform supporting a wide range of models (Llama 3, Mistral, InternLM2, GPT-4, LLaMA 2, Qwen, GLM, Claude, etc.) over 100+ datasets.
https://opencompass.org.cn/
Apache License 2.0

What prompt is given to MLLMs for the MMBench evaluation set? #164

Closed WanJJJh closed 1 year ago

WanJJJh commented 1 year ago

Describe the bug

Is the prompt taken from the MiniGPT-4 and InstructBLIP examples in multimodal/models, or is there a specially designed prompt? Using the simple example prompt, my reproduction of mPLUG falls far short of the 49% on the validation set reported in the paper.

```python
img_prompt = '###Human: <Img><ImageHere></Img> '
if 'context' in samples:
    context_prompt = samples['context'][0]
question = samples['question']
options = samples['options']
if 'context' in samples:
    prompt = img_prompt + ' ' + context_prompt + ' ' + question + ' ' + options  # noqa
else:
    prompt = img_prompt + ' ' + question + ' ' + options

# prompt = self.sys_prompt + prompt
prompt = prompt + '###Assistant:'
```

Environment

python

Other information

No response

YuanLiuuuuuu commented 1 year ago

Thank you for your interest in MMBench. In our demo, we only provide a minimal version of the prompt for running inference on MMBench. For a specific model, you should refer to the prompt it uses in its official repo. For example, mPLUG-Owl uses `The following is a conversation between a curious human and AI assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. Human: <image> Human: {text_input} AI:`. The `text_input` should be `<your question>` + `There are several options: A. B. C. D.`
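As a concrete illustration, here is a minimal sketch of assembling the mPLUG-Owl conversation template quoted above. The helper name and the question/options values are hypothetical, not OpenCompass or mPLUG-Owl API:

```python
def build_mplug_owl_prompt(question, options):
    # Hypothetical helper: wraps an MMBench question and its options
    # in the mPLUG-Owl conversation template quoted in this thread.
    text_input = question + ' There are several options: ' + ' '.join(options)
    return (
        "The following is a conversation between a curious human and AI assistant. "
        "The assistant gives helpful, detailed, and polite answers to the user's "
        "questions. Human: <image> Human: " + text_input + " AI:"
    )

prompt = build_mplug_owl_prompt(
    'What is the shape of the sign?',
    ['A. circle', 'B. square', 'C. triangle', 'D. star'],
)
```

The `<image>` placeholder is where the model's image tokens are injected; the prompt ends with `AI:` so the model continues as the assistant.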

YuanLiuuuuuu commented 1 year ago

Currently, we do not impose any constraints on the prompt a model may use.

WanJJJh commented 1 year ago

Got it, thank you for your reply.

WanJJJh commented 1 year ago


Sorry to bother you again. After reproducing mPLUG with the new prompt, the gap from the leaderboard is still large, and I noticed the prompt here. Should the prompt include the context (the hint in the dataset)? The MiniGPT and BLIP-2 example code uses context + question.

YuanLiuuuuuu commented 1 year ago

Yes, you should include the hint in your prompt.
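Following the MiniGPT-4/BLIP-2 examples (context + question), a sketch of folding the dataset's hint into the text input could look like the following. The field names mirror the snippet earlier in this thread, but the exact sample layout is an assumption:

```python
def build_text_input(sample):
    # 'context' (the hint), 'question', and 'options' follow the field
    # names from the snippet earlier in this thread (assumed layout).
    parts = []
    if sample.get('context'):          # prepend the hint when present
        parts.append(sample['context'])
    parts.append(sample['question'])
    parts.append('There are several options: ' + ' '.join(sample['options']))
    return ' '.join(parts)

text_input = build_text_input({
    'context': 'The sign is red.',
    'question': 'What does the sign mean?',
    'options': ['A. stop', 'B. go', 'C. yield', 'D. merge'],
})
```

The hint goes first so the question reads as a follow-up to the context, matching the context + question order of the MiniGPT-4 and BLIP-2 examples.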

MAGAer13 commented 1 year ago



We use the latest version of mPLUG-Owl, which incorporates some new pre-training tasks; it will be released in the next few days. Stay tuned.

WanJJJh commented 1 year ago

42.6


Do you mean that the results in the MMBench paper and the current official dev leaderboard are both from the latest version? Since I only have the ground truth for the dev split, I can only reproduce the dev results.

YuanLiuuuuuu commented 1 year ago


  1. The result in the paper was reported using the weights of the previous version, while the leaderboard uses the latest weights.
  2. You can use the evaluation server here to report the accuracy on the test split.
WanJJJh commented 1 year ago

Sorry, one more question: which version of mPLUG did you evaluate, the official [mplug-owl-llama-7b] on Hugging Face or mplug-owl-bloomz-7b-multilingual? Thanks.

YuanLiuuuuuu commented 1 year ago

The results in the paper are reported using mplug-owl-llama-7b.

WanJJJh commented 1 year ago

Understood, thanks for the explanation.