[Feature] Loading the same model multiple times in opencompass for evaluation on MMLU dataset

open-compass / opencompass

OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.

Apache License 2.0

Describe the feature

I am currently using LLaMa to evaluate PPL-based accuracy on the MMLU dataset, and I have a question about the OpenCompass toolkit. When OpenCompass divides the evaluation into 40 sub-tasks, does the toolkit load the same model 40 times? From my observation, loading the model takes about as long as the inference itself, so I am concerned that repeated loading will make the evaluation take a long time.

Additionally, is it possible to configure OpenCompass to support batch inference? This could improve the efficiency of the evaluation process.

I would greatly appreciate it if you could provide detailed guidance on these issues.

Will you implement it?

[ ] I would like to implement this feature and create a PR!

Good catch. OpenCompass's task division system is designed for cluster management systems like Slurm, which dispatch tasks to different nodes for parallel evaluation. On a single node, however, it can slow the evaluation down, since each task requires a complete reload of the weights. The simplest fix is to increase the task size to reduce the number of tasks; alternatively, you can dive deeper into the docs on Partitioner and switch the strategy to NaivePartitioner (https://opencompass.readthedocs.io/en/latest/user_guides/evaluation.html#task-partition-partitioner).
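
To make the trade-off concrete, here is a rough back-of-the-envelope sketch of why fewer, larger tasks help on a single node. It is not OpenCompass code: `estimate_wall_time` is a hypothetical helper, the numbers are purely illustrative, and it assumes tasks run one after another and that each task reloads the weights once.

```python
import math


def estimate_wall_time(num_samples, task_size, load_seconds, seconds_per_sample):
    """Estimate sequential wall time (in seconds) on a single node.

    Assumption: every task reloads the model once, then runs inference
    on its share of the samples. All inputs are illustrative.
    """
    num_tasks = math.ceil(num_samples / task_size)
    return num_tasks * load_seconds + num_samples * seconds_per_sample


# Illustrative numbers only: 4000 samples, 100 s to load the model,
# 1 s of inference per sample.
many_small_tasks = estimate_wall_time(4000, task_size=100, load_seconds=100, seconds_per_sample=1)
one_large_task = estimate_wall_time(4000, task_size=4000, load_seconds=100, seconds_per_sample=1)

print(many_small_tasks)  # 40 tasks: 40 * 100 + 4000 = 8000
print(one_large_task)    # 1 task:   1 * 100 + 4000 = 4100
```

With 40 tasks, loading costs as much as inference, which matches the observation in the question; with a single task the load cost is paid once. On a cluster the picture is different, since the tasks run in parallel on separate nodes.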

Batch inference is natively supported: you can specify batch_size in the model's config, as mentioned in Getting Started.
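
For intuition about what a batch size controls, here is a minimal, self-contained sketch of grouping prompts into batches. It does not reproduce OpenCompass internals: `iter_batches` is a hypothetical helper, and the only name taken from the reply above is `batch_size`.

```python
from typing import Iterator, List, Sequence


def iter_batches(prompts: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `batch_size` prompts."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(prompts), batch_size):
        yield list(prompts[start:start + batch_size])


# Illustrative input: 10 placeholder prompts.
prompts = [f"question {i}" for i in range(10)]
batches = list(iter_batches(prompts, batch_size=4))

# Each batch would go through the model in one forward pass,
# so 10 prompts need 3 passes instead of 10.
print([len(batch) for batch in batches])  # [4, 4, 2]
```

A larger batch size means fewer forward passes for the same number of prompts, at the cost of more GPU memory per pass.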

[Feature] Loading the same model multiple times in opencompass for evaluation on MMLU dataset #117
