huggingface / lighteval

LightEval is a lightweight LLM evaluation suite that Hugging Face has been using internally with the recently released LLM data processing library datatrove and LLM training library nanotron.
MIT License

Enable majority voting for GSM8k / MATH #62

Closed lewtun closed 2 months ago

lewtun commented 4 months ago

Many papers nowadays report the maj@k metric for math benchmarks like GSM8k and MATH, where the model generates k candidate solutions to a problem and the most common final answer is chosen as the solution (see source paper for details).

It would be nice to support maj@k as a metric for these benchmarks, potentially with the option to use CoT prompts, as is also common practice.
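To make the request concrete, here is a rough sketch of how maj@k could be computed. It is not lighteval's API: the function names `majority_vote` and `maj_at_k` are hypothetical, and it assumes the final answer has already been extracted from each generation as a string, with ties going to the answer seen first.

```python
from collections import Counter


def majority_vote(candidates):
    """Return the most common answer among the k sampled candidates.

    Hypothetical helper: assumes each candidate is an already-extracted
    final answer string. Ties are resolved in favor of the answer that
    appears first in the list.
    """
    if not candidates:
        raise ValueError("need at least one candidate answer")
    normalized = [c.strip() for c in candidates]
    return Counter(normalized).most_common(1)[0][0]


def maj_at_k(predictions, references):
    """Fraction of problems whose majority-voted answer matches the gold answer.

    predictions: one list of k answer strings per problem.
    references: one gold answer string per problem.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    correct = sum(
        majority_vote(cands) == ref.strip()
        for cands, ref in zip(predictions, references)
    )
    return correct / len(references)


# Two problems, k = 3 samples each.
score = maj_at_k([["42", "41", "42"], ["7", "8", "8"]], ["42", "7"])
```

In this example the first problem is counted as correct (the vote picks "42") and the second as wrong (the vote picks "8"), so `score` is 0.5. Exact string matching is the simplest choice here; a real math benchmark would likely need answer normalization before voting.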

clefourrier commented 4 months ago

CoT prompts are on the roadmap :)

clefourrier commented 2 months ago

It's been done and merged in #158