LightEval is a lightweight LLM evaluation suite that Hugging Face has been using internally, alongside the recently released LLM data processing library datatrove and the LLM training library nanotron.
What this PR does:
- Uses custom metrics and tasks to add LLM-as-a-judge evaluation
- Adds multi-turn generation
- Adds the MT-Bench metric
This implementation uses the MT-Bench prompts from InflectionAI. The code is inspired by the original MT-Bench implementation, with notable differences:
- MT-Bench uses a custom-made chat templating system; we use the tokenizer's chat template instead (see the first sketch below).
- MT-Bench uses an old version of the OpenAI API; we use the newest one, with much simpler logic for formatting the chat prompts sent to the judge, which makes it easy to add more models as judges (see the second sketch below).
- We do not vary the temperature depending on the sample being evaluated: all samples are generated with `do_sample=False` and temperature set to `0.0`.
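
As an illustration, here is a minimal sketch of what the multi-turn, deterministic generation looks like when driven by a tokenizer chat template; the model name, prompts, and generation length below are assumptions for the example, not LightEval's actual interface:

```python
# A rough sketch of multi-turn greedy generation via a tokenizer chat template.
# Model name, prompts, and max_new_tokens are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "HuggingFaceH4/zephyr-7b-beta"  # any model that ships a chat template
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

# An MT-Bench-style sample has two user turns; the second builds on the first answer.
turns = [
    "Compose an engaging travel blog post about a recent trip to Hawaii.",
    "Rewrite your previous response. Start every sentence with the letter A.",
]

conversation = []
for turn in turns:
    conversation.append({"role": "user", "content": turn})
    # The tokenizer's chat template replaces MT-Bench's custom templating system.
    prompt = tokenizer.apply_chat_template(
        conversation, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # Greedy decoding for every sample (do_sample=False, i.e. temperature 0.0).
    output = model.generate(**inputs, max_new_tokens=512, do_sample=False)
    answer = tokenizer.decode(
        output[0][inputs["input_ids"].shape[1] :], skip_special_tokens=True
    )
    conversation.append({"role": "assistant", "content": answer})
```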
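And a minimal sketch of scoring an answer with the current OpenAI client (`openai>=1.0`); the judge prompt here is a simplified placeholder for the real MT-Bench judge prompt, and swapping `judge_model` is all it takes to use a different judge:

```python
# A simplified sketch of LLM-as-a-judge scoring with the current OpenAI client.
# JUDGE_TEMPLATE is a placeholder, not the actual MT-Bench judge prompt.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

JUDGE_TEMPLATE = (
    "[Question]\n{question}\n\n[Assistant's Answer]\n{answer}\n\n"
    'Rate the answer on a scale of 1 to 10 and reply with "Rating: [[score]]".'
)

def judge_answer(question: str, answer: str, judge_model: str = "gpt-4") -> str:
    """Ask the judge model to score a single answer; returns its raw verdict."""
    response = client.chat.completions.create(
        model=judge_model,  # adding another judge is just a different model name
        messages=[
            {"role": "system", "content": "You are an impartial judge."},
            {
                "role": "user",
                "content": JUDGE_TEMPLATE.format(question=question, answer=answer),
            },
        ],
        temperature=0.0,
        max_tokens=256,
    )
    return response.choices[0].message.content
```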