microsoft / promptbench

A unified evaluation framework for large language models
http://aka.ms/promptbench
MIT License
2.45k stars 182 forks

Access to per-sample evaluation results #64

Closed adhirajghosh closed 4 months ago

adhirajghosh commented 6 months ago

Hi, thanks for the great work! For my current project, I would like to use the sample-wise evaluation results of the VLM experiments you have conducted.

If you could provide the sample-wise evaluation logs on the multimodal datasets mentioned (VQAv2, NoCaps, MMMU, MathVista, AI2D, ChartQA, ScienceQA) for the models evaluated (BLIP2, LLaVA, Qwen-VL, Qwen-VL-Chat, InternLM-XComposer2-VL, GPT-4V, Gemini Pro Vision, Qwen-VL-Max, Qwen-VL-Plus), I would greatly appreciate it. If I have missed a dataset or model, please feel free to include it as well.

MingxuanXia commented 6 months ago

Hi, I'm sorry, but we cannot provide the sample-wise evaluation logs.

github-actions[bot] commented 4 months ago

Stale issue message