Refactor judge and add MixEval

huggingface / lighteval

Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends

MIT License


Closed: NathanHB closed this 1 month ago

NathanHB commented 1 month ago

What this PR does:

[x] Adds the MixEval task
  - adds a MixEval judge as a sample metric, using the new LLMJudge metric
[x] Refactors the judge metric
  - makes it easier to define judges for custom tasks
[x] Batches model results per task, then per metric type, so metrics are computed in batch (this changes nothing for tasks other than LLM-as-judge, which is now much faster)
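The batching change in the last item can be illustrated with a minimal sketch: results are grouped by task, then by metric type, and each metric is called once per group with all of its samples. This is not lighteval's implementation. `SampleResult`, `group_results`, `compute_all` and `FakeJudge` are hypothetical names chosen for illustration, and the "judge" is a string comparison standing in for an LLM call. The sketch only shows why an LLM judge benefits: it receives one batched call per group instead of one call per sample.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass
class SampleResult:
    # Hypothetical per-sample record; field names are illustrative only.
    task: str          # e.g. "mixeval"
    metric_type: str   # e.g. "llm_judge" or "exact_match"
    prediction: str
    reference: str


# A batched metric scores every sample of one (task, metric type) group at once.
BatchedMetric = Callable[[list[SampleResult]], list[float]]


def group_results(results: list[SampleResult]) -> dict:
    """Group model results per task, then per metric type (insertion order kept)."""
    grouped = defaultdict(lambda: defaultdict(list))
    for r in results:
        grouped[r.task][r.metric_type].append(r)
    return grouped


def compute_all(results: list[SampleResult], metrics: dict[str, BatchedMetric]) -> dict:
    """Call each metric once per (task, metric type) group."""
    scores: dict[str, dict[str, list[float]]] = {}
    for task, by_type in group_results(results).items():
        scores[task] = {}
        for metric_type, samples in by_type.items():
            # One call per group: for an LLM judge this would be one batched
            # request rather than one request per sample.
            scores[task][metric_type] = metrics[metric_type](samples)
    return scores


class FakeJudge:
    """Stand-in for an LLM judge that counts how many batched calls it gets."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, samples: list[SampleResult]) -> list[float]:
        self.calls += 1  # a real judge would send all prompts here together
        return [
            float(s.prediction.strip().lower() == s.reference.strip().lower())
            for s in samples
        ]


def exact_match(samples: list[SampleResult]) -> list[float]:
    return [float(s.prediction == s.reference) for s in samples]


judge = FakeJudge()
results = [
    SampleResult("mixeval", "llm_judge", "Paris", "Paris"),
    SampleResult("mixeval", "llm_judge", "Lyon", "Paris"),
    SampleResult("gsm8k", "exact_match", "42", "42"),
]
scores = compute_all(results, {"llm_judge": judge, "exact_match": exact_match})
print(scores)       # {'mixeval': {'llm_judge': [1.0, 0.0]}, 'gsm8k': {'exact_match': [1.0]}}
print(judge.calls)  # 1
```

Both MixEval samples reach the judge in a single call, while metrics for other tasks run exactly as before, one group at a time.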