lmarena / arena-hard-auto

Arena-Hard-Auto: An automatic LLM benchmark.
Apache License 2.0

[Feature] support arena-hard in opencompass #13

Closed: bittersweet1999 closed this issue 2 months ago

bittersweet1999 commented 5 months ago

Hi, thanks for this robust work! We now support the Arena-Hard dataset in OpenCompass. OpenCompass is an evaluation platform that can partition tasks and supports different model inference backends, which accelerates the model evaluation process. By combining the strengths of your dataset with OpenCompass, you can now select a model and run inference and evaluation in a single step. OpenCompass also supports changing the judge model or configuring multiple judge models. The demo config in OpenCompass is here: https://github.com/open-compass/opencompass/blob/main/configs/eval_subjective_arena_hard.py

You are welcome to try it in OpenCompass, and we can collaborate further to strengthen LLM evaluation work in the open-source community.
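For readers unfamiliar with OpenCompass, here is a minimal sketch of what such a config can look like, loosely modeled on the linked `eval_subjective_arena_hard.py`. The dataset import path, the model entries, and the `judge_models` field below are illustrative assumptions; treat the demo config above as the source of truth for current field names and paths.

```python
# Sketch of an OpenCompass-style config for Arena-Hard evaluation.
# Import paths and field values here are illustrative, not definitive.
from mmengine.config import read_base
from opencompass.models import HuggingFaceCausalLM, OpenAI

with read_base():
    # Hypothetical base-config import; the real path lives under configs/datasets/.
    from .datasets.subjective.arena_hard.arena_hard_compare import (
        arenahard_datasets,
    )

datasets = [*arenahard_datasets]

# The model under evaluation (any supported inference backend works).
models = [
    dict(
        type=HuggingFaceCausalLM,
        abbr='my-chat-model',               # hypothetical name
        path='internlm/internlm2-chat-7b',  # any HF chat model
        max_out_len=2048,
        batch_size=8,
        run_cfg=dict(num_gpus=1),
    ),
]

# One or several judge models; OpenCompass lets you swap the judge or
# register multiple judges for the same run.
judge_models = [
    dict(
        type=OpenAI,
        abbr='gpt4-judge',
        path='gpt-4-1106-preview',
        max_out_len=4096,
        batch_size=8,
    ),
]
```

With such a config saved under `configs/`, the run is typically launched via OpenCompass's entry point, e.g. `python run.py configs/eval_subjective_arena_hard.py`; task partitioning and backend selection are then handled by the platform.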

CodingWithTim commented 5 months ago

Hi OpenCompass team,

You guys are also great contributors to the community! What are some things you are looking for from us?

bittersweet1999 commented 5 months ago

Thank you for your attention. Would you like to update the README with this news? We hope OpenCompass can help improve the visibility of this great work. By the way, we are very interested in your progress on model evaluation. In addition to Arena-Hard, we have also added support for MT-Bench, and we will continue to follow your new datasets and evaluation methods and support them in OpenCompass.