mtbench101 / mt-bench-101

[ACL 2024] MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues
Apache License 2.0

Question about the evaluation code. #9

Open HYeCao opened 2 weeks ago

HYeCao commented 2 weeks ago

Could you provide the evaluation Python files, like "run.py"?

shamy1997 commented 1 week ago

same question~

sefira commented 1 week ago

@HYeCao @shamy1997

Hi,

We have integrated our MT-Bench-101 benchmark into our forked OpenCompass. The evaluation Python code can be found in opencompass/run.py.

You can follow the Installation section of the README.md.
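A minimal sketch of that install-and-run flow, assuming the forked OpenCompass is used as the entry point (the clone URL and the config file name below are assumptions for illustration; follow the repo's README for the exact steps):

```shell
# Clone the fork that bundles MT-Bench-101 (URL is an assumption;
# use the fork linked from the MT-Bench-101 README).
git clone https://github.com/mtbench101/mt-bench-101.git
cd mt-bench-101

# Install OpenCompass and its dependencies in editable mode.
pip install -e .

# Launch the evaluation via OpenCompass's run.py entry point.
# The config path here is hypothetical; pick the MT-Bench-101
# config shipped in the fork's configs/ directory.
python run.py configs/eval_mtbench101.py
```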

Thank you.

HYeCao commented 5 days ago

[screenshot of error messages] I have followed the instructions, but I meet these errors.

sefira commented 5 days ago

> [screenshot of error messages] I have followed the instructions, but I meet these errors.

It seems you have encountered some bugs with your environment and OpenCompass. Please refer to the official OpenCompass issue tracker: https://github.com/open-compass/opencompass/issues/1040 https://github.com/open-compass/opencompass/issues/1168

You can also do some searching and debugging on your own. Good luck!

Best,