huggingface / lighteval

Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
MIT License
769 stars, 88 forks

What is `qem` for gsm8k evaluation? #238

Closed: shizhediao closed this issue 3 months ago

shizhediao commented 3 months ago

As titled.

Thank you!

clefourrier commented 3 months ago

Hi! `qem` stands for quasi exact match: the fraction of instances where the normalized prediction matches the normalized gold (normalization is done on whitespace, articles, capitalization, ...).
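To make the definition concrete, here is a minimal sketch of a quasi-exact-match score. It is not lighteval's actual implementation: the function names `normalize` and `quasi_exact_match` are hypothetical, and the normalization covers only the three steps named above (whitespace, articles, capitalization), so the real metric may normalize more.

```python
import re

# Assumption: "articles" means the English articles a / an / the.
_ARTICLES = re.compile(r"\b(a|an|the)\b")


def normalize(text: str) -> str:
    """Lowercase, drop articles, and collapse runs of whitespace."""
    text = text.lower()
    text = _ARTICLES.sub(" ", text)
    return " ".join(text.split())


def quasi_exact_match(predictions: list[str], golds: list[str]) -> float:
    """Fraction of instances whose normalized prediction equals the normalized gold."""
    if len(predictions) != len(golds):
        raise ValueError("predictions and golds must have the same length")
    if not predictions:
        return 0.0
    hits = sum(normalize(p) == normalize(g) for p, g in zip(predictions, golds))
    return hits / len(predictions)


# Illustrative data, not taken from gsm8k.
preds = ["The answer is 42 ", "18"]
golds = ["answer is 42", "17"]

# Plain string comparison, with no normalization, for contrast.
strict = sum(p == g for p, g in zip(preds, golds)) / len(preds)

print(quasi_exact_match(preds, golds))  # 0.5: the first pair matches after normalization
print(strict)                           # 0.0: neither raw string is identical to its gold
```

The first pair differs only in capitalization, a leading article, and trailing whitespace, so it counts as a match under this sketch but not under a plain string comparison.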

shizhediao commented 3 months ago

Thank you very much for your prompt reply!

shizhediao commented 2 months ago

Hi,

A quick question: what is the difference between `exact_match` and quasi exact match? Does that mean `qem` applies an additional normalization step?

Thanks!

clefourrier commented 1 month ago

Hi! You'll find explanations of all the metrics in the readme here. Indeed, `qem` applies post-processing to the results, whereas exact match does not.

shizhediao commented 1 month ago

Thank you!