Suggestion for LLM Evaluation Tutorials with Evalverse

jihoo-kim opened 5 months ago
Hey, thanks for the suggestion, this is quite exciting. I've been looking for something like this for a while.
I tried it yesterday and ran into some issues (quoted in the reply below).
In general, I would really appreciate it if we could have an example with Llama 3.
@mlabonne Thanks for accepting my suggestion and for trying it out.
> Results couldn't be written to disk (at least for MT-Bench and EQ-Bench), which means I lost my MT-Bench results.
Could you tell me which script you ran? If you specify the `output_path` argument, the results will be saved to disk. The default value of `output_path` is the directory where evalverse is located. Please try again with your own `output_path`.
CLI:

```bash
python3 evaluator.py \
  --ckpt_path {your_model} \
  --mt_bench \
  --num_gpus_total 8 \
  --parallel_api 4 \
  --output_path {your_path}
```
Library:

```python
import evalverse as ev

evaluator = ev.Evaluator()
evaluator.run(
    model={your_model},
    benchmark="mt_bench",
    num_gpus_total=8,
    parallel_api=4,
    output_path={your_path}
)
```
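As a quick sanity check that results actually landed on disk, you can list whatever the run wrote under the path you passed. A minimal sketch using only the Python standard library (the `./results` path is just an illustrative placeholder, not Evalverse's default):

```python
from pathlib import Path

# Assumed placeholder: the same directory passed as output_path above.
output_path = Path("./results")

# Recursively list every file the evaluation run wrote, if any.
for f in sorted(output_path.rglob("*")):
    if f.is_file():
        print(f.relative_to(output_path))
```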
> I couldn't use EQ-Bench without a default chat template (ChatML seems to be selected by default), which meant I couldn't use Llama 3's chat template.
I will fix it as soon as possible and let you know. Thank you!
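For reference on the chat template point: Llama 3's template ships with its tokenizer on the Hugging Face Hub, so a harness can pick it up via transformers' `apply_chat_template` instead of defaulting to ChatML. A minimal sketch, assuming access to the gated `meta-llama/Meta-Llama-3-8B-Instruct` checkpoint (the messages are illustrative, and this is not Evalverse's internal API):

```python
from transformers import AutoTokenizer

# Gated model: requires accepting the license and a Hugging Face access token.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write a haiku about evaluation."},
]

# Renders the conversation with Llama 3's own template
# (<|start_header_id|> ... <|eot_id|> markers) rather than ChatML.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```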