mlabonne / llm-course

Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
https://mlabonne.github.io/blog/
Apache License 2.0
39.18k stars 4.14k forks source link

LLM Evaluation Tutorials with Evalverse #76

Open jihoo-kim opened 5 months ago

jihoo-kim commented 5 months ago

Suggest for LLM Evaluation Tutorials with Evalverse

image

mlabonne commented 5 months ago

Hey thanks for the suggestion, this is quite exciting. I've been looking for something like this for a while.

I've tried it yesterday and I ran into some issues:

In general, I would really appreciate if we could have an example with Llama 3.

jihoo-kim commented 5 months ago

Thanks for accepting my suggestion and trying it. @mlabonne

Issue 1

Results couldn't be written on disk (at least for MT-Bench and EQ-Bench), which means I lost my MT-Bench's results

Could you tell me what script you ran it with? If you specify the output_path argument, the results would be saved on the disk. The default values of output_path is the directory where evalverse is placed.

Please try again with your own output_path.

CLI

python3 evaluator.py \
    --ckpt_path {your_model} \
    --mt_bench \
    --num_gpus_total 8 \
    --parallel_api 4 \
    --output_path {your_path}

Library

import evalverse as ev

evaluator = ev.Evaluator()
evaluator.run(
    model={your_model},
    benchmark="mt_bench",
    num_gpus_total=8,
    parallel_api=4,
    output_path={your_path}
)

Issue 2

I couldn't use EQ-Bench without a default chat template (ChatML seems to be selected by default), which meant I couldn't use Llama 3's chat template

I will fix it as soon as possible and let you know again. Thank you!