mtbench101 / mt-bench-101

[ACL 2024] MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues
Apache License 2.0

Question about the evaluation code. #9

Open HYeCao opened 2 weeks ago

HYeCao commented 2 weeks ago

Could you provide the evaluation Python files, like "run.py"?

shamy1997 commented 1 week ago

same question~

sefira commented 1 week ago

@HYeCao @shamy1997

Hi,

We have integrated our MT-Bench-101 benchmark into our forked OpenCompass. The evaluation Python code can be found in opencompass/run.py.

You can follow the Installation section of the README.md.
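A minimal sketch of that install-and-run flow, assuming the forked OpenCompass is used as the entry point (the clone URL and the config file name below are assumptions for illustration; follow the repo's README for the exact steps):

```shell
# Clone the fork that bundles MT-Bench-101 (URL is an assumption;
# use the fork linked from the MT-Bench-101 README).
git clone https://github.com/mtbench101/mt-bench-101.git
cd mt-bench-101

# Install OpenCompass and its dependencies in editable mode.
pip install -e .

# Launch the evaluation via OpenCompass's run.py entry point.
# The config path here is hypothetical; pick the MT-Bench-101
# config shipped in the fork's configs/ directory.
python run.py configs/eval_mtbench101.py
```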

Thank you.

HYeCao commented 5 days ago

[screenshot of error messages] I have followed the instructions, but I meet these errors.

sefira commented 5 days ago

> [screenshot of error messages] I have followed the instructions, but I meet these errors.

It seems you have encountered some bugs with your environment and OpenCompass. Please refer to the official OpenCompass issue tracker: https://github.com/open-compass/opencompass/issues/1040 https://github.com/open-compass/opencompass/issues/1168

You can also do some searching and debugging on your own. Good luck!

Best,