Add HumanEval and HumanEval+

huggingface / lighteval

LightEval is a lightweight LLM evaluation suite that Hugging Face has been using internally with the recently released LLM data processing library datatrove and LLM training library nanotron.

MIT License


Add HumanEval and HumanEval+ #63

Open lewtun opened 4 months ago

lewtun commented 4 months ago

The HumanEval and HumanEval+ benchmarks are staples for benchmarking the code capabilities of base LLMs. It would be nice to include them in lighteval so one doesn't have to switch to another framework like BigCode's evaluation harness.

References:

HumanEval: https://github.com/openai/human-eval
HumanEval+: https://arxiv.org/abs/2305.01210
Implementation: https://github.com/evalplus/evalplus?tab=readme-ov-file
BigCode eval harness: https://github.com/bigcode-project/bigcode-evaluation-harness/tree/main
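
For context, both benchmarks are usually reported as pass@k. Below is a minimal sketch of the unbiased estimator described in the HumanEval paper, assuming n completions are sampled per problem and c of them pass the unit tests. The function name and signature are illustrative only; this is not lighteval's or EvalPlus's API.

```python
import math


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: 1 - C(n - c, k) / C(n, k).

    n: completions sampled for the problem (assumed name)
    c: completions that passed all unit tests (assumed name)
    k: sample budget being scored
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k failing samples: every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average the per-problem estimates over (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

The benchmark score would then be the mean over all problems, e.g. `mean_pass_at_k([(10, 3), (10, 0)], k=1)`. HumanEval+ keeps the same problems but adds more tests, so only the pass counts c would change.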

0-hero commented 3 months ago

+1, would be nice to have