OpenCompass is an LLM evaluation platform supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMA 2, Qwen, GLM, Claude, etc.) across 100+ datasets.
Describe the feature

Dear OpenCompass Team,

I've encountered a challenge when trying to evaluate a custom model I developed. Currently, it seems that any action performed with OpenCompass must be executed from within the OpenCompass repository itself, which is quite inconvenient.
My main concern is that OpenCompass is not designed to be run from external repositories. Model development typically happens in a user's own repository, and rewriting the model inside the OpenCompass repository is impractical.
I would like to suggest an approach similar to fairseq's, which uses a registration mechanism to manage models and datasets. This would make it much easier for users to plug in their custom models. In addition, instead of running an evaluation command inside the OpenCompass repository, such as
```
python run.py --dataset ... --models ...
```
it would be more user-friendly to execute a command within the user's own repository, such as
```
opencompass --dataset ... --models ...
```
This change could significantly improve the user experience by streamlining the process of model evaluation.
Thank you for considering this suggestion. I look forward to your response and any updates on this matter.
Best regards,
Jiadi Jiang
Will you implement it?
[ ] I would like to implement this feature and create a PR!