open-compass / opencompass

OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.
https://opencompass.org.cn/
Apache License 2.0
3.7k stars 395 forks

[Feature] Difficulty in Evaluating Custom Models with OpenCompass #1239

Open jiangjiadi opened 2 months ago

jiangjiadi commented 2 months ago

Describe the feature

Dear OpenCompass Team,

I've encountered a challenge with OpenCompass when trying to evaluate a custom model that I developed. Currently, it seems that anything I want to do with OpenCompass must be run from inside the OpenCompass repository itself, which makes it hard to use with code that lives elsewhere.

My main concern is that OpenCompass is not designed to facilitate executions in external repositories. Typically, model development is done within personal repositories, and it would be impractical to rewrite the model in the OpenCompass repository.

I would like to suggest considering an approach similar to that of fairseq, which uses a registration mechanism to manage models and datasets. This could make it easier for users to work with their custom models. In addition, instead of running an evaluation command inside OpenCompass like

python run.py --datasets ... --models ...

it would be more user-friendly to execute a command within the user's own repository, such as

opencompass --datasets ... --models ...

This change could significantly improve the user experience by streamlining the process of model evaluation.
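
To make the suggestion concrete, a registration mechanism of this kind could look roughly like the sketch below. It is only an illustration under stated assumptions, not the actual API of OpenCompass or fairseq: `MODEL_REGISTRY`, `register_model`, `build_model`, `MyCustomModel`, and the `evaluate` program name are all hypothetical, and the "model" just upper-cases its prompt.

```python
import argparse
from typing import Callable, Dict, List, Optional

# Hypothetical registry: maps a model name to the class implementing it.
MODEL_REGISTRY: Dict[str, type] = {}


def register_model(name: str) -> Callable[[type], type]:
    """Class decorator that records a model class under `name`."""
    def decorator(cls: type) -> type:
        if name in MODEL_REGISTRY:
            raise ValueError(f"model {name!r} is already registered")
        MODEL_REGISTRY[name] = cls
        return cls
    return decorator


# This class would live in the user's own repository, not in the
# evaluation framework. Importing the module is enough to register it.
@register_model("my_custom_model")
class MyCustomModel:
    def generate(self, prompt: str) -> str:
        # Placeholder inference: a real model would run a forward pass here.
        return prompt.upper()


def build_model(name: str):
    """Instantiate a registered model by name."""
    if name not in MODEL_REGISTRY:
        known = sorted(MODEL_REGISTRY)
        raise KeyError(f"unknown model {name!r}; registered: {known}")
    return MODEL_REGISTRY[name]()


def main(argv: Optional[List[str]] = None) -> List[str]:
    """Hypothetical CLI entry point that resolves models via the registry."""
    parser = argparse.ArgumentParser(prog="evaluate")
    parser.add_argument("--models", nargs="+", required=True)
    args = parser.parse_args(argv)
    # A real tool would run datasets here; this sketch uses one fixed prompt.
    return [build_model(name).generate("hello") for name in args.models]
```

With a design like this, the evaluation tool only needs to know model names, so a function such as `main` could be exposed as an installed command and run from any working directory.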

Thank you for considering this suggestion. I am looking forward to your response and any potential updates regarding this matter.

Best regards, Jiadi Jiang

Will you implement it?

tonysy commented 2 months ago

Thanks for the suggestion. We will provide a CLI command in the next version.