open-compass / VLMEvalKit

Open-source evaluation toolkit for large vision-language models (LVLMs), supporting 160+ VLMs and 50+ benchmarks
https://huggingface.co/spaces/opencompass/open_vlm_leaderboard
Apache License 2.0

What is the estimated runtime across benchmarks, and the OpenAI API cost? #33

Closed findalexli closed 9 months ago

kennymckormick commented 10 months ago

Hi, @findalexli ,

  1. The runtime depends on the architecture of your VLM and may vary across different VLMs. As a data point, evaluating llava-v1.5-7b on MMBench_DEV_EN takes about 15 minutes on a single A100.
  2. The OpenAI API cost depends on the nature of the task and the instruction-following capability of your model (after all, if a VLM follows the instruction perfectly, no GPT post-processing is required). As some data points: evaluating different VLMs on MMBench_EN (dev + test) costs < $1.5 on average, and for models like llava-v1.5-7b and XComposer, no GPT cost is incurred at all during evaluation.
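The second point above can be sketched with a rough back-of-envelope estimator. This is not VLMEvalKit code — the function name, the token count per call, and the price per 1k tokens are all hypothetical placeholders; it only illustrates why a perfectly instruction-following model incurs zero GPT cost, since only unparseable answers would be sent to GPT for extraction.

```python
# Hypothetical sketch: estimate GPT post-processing cost for answer
# extraction. Assumes only answers that rule-based parsing fails on
# are sent to GPT; token counts and prices are illustrative guesses.

def estimate_gpt_cost(num_questions, parse_fail_rate,
                      tokens_per_call=300, price_per_1k_tokens=0.0015):
    """Return a rough USD estimate for GPT-based answer extraction."""
    gpt_calls = num_questions * parse_fail_rate
    return gpt_calls * tokens_per_call / 1000 * price_per_1k_tokens

# A model that follows the instruction perfectly never triggers GPT,
# so its extraction cost is exactly zero.
print(estimate_gpt_cost(6600, 0.0))  # 0.0
# A model whose answers fail parsing half the time costs noticeably more.
print(round(estimate_gpt_cost(6600, 0.5), 3))
```

The actual cost per benchmark depends on the real failure rate and the GPT model used, which is why the numbers in the reply are averages rather than fixed figures.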