tjunlp-lab / Awesome-LLMs-Evaluation-Papers

The papers are organized according to our survey: Evaluating Large Language Models: A Comprehensive Survey.

Which metrics are chosen in the leaderboard? #21

Open zhimin-z opened 7 months ago

zhimin-z commented 7 months ago

http://openeval.org.cn/doc?id=2 I cannot find any further explanation of the chosen metrics.

zhimin-z commented 7 months ago

In particular, which BLEU is used: 1, 2, 3, or 4? Which ROUGE is used: L, W, S, 1, or SU? @allen3ai @john-b-yang @ikergarcia1996 @eltociear @BinWang28 Thanks in advance!
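To illustrate why the variant matters, here is a minimal sketch of cumulative BLEU-1 through BLEU-4 on a toy sentence pair. It assumes a single reference, whitespace tokenization, uniform n-gram weights, and no smoothing. The helper names (`ngrams`, `modified_precision`, `cumulative_bleu`) are made up for this example, and this is not the leaderboard's actual scoring code.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision of the candidate against one reference."""
    cand = ngrams(candidate, n)
    ref = ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram is credited at most as often as it occurs in the reference.
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    return clipped / total


def cumulative_bleu(candidate, reference, max_n):
    """Geometric mean of the 1..max_n precisions times the brevity penalty (unsmoothed)."""
    precisions = [modified_precision(candidate, reference, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # without smoothing, any zero precision zeroes the score
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_mean)


if __name__ == "__main__":
    candidate = "the cat sat on the mat".split()
    reference = "the cat is on the mat".split()
    for n in range(1, 5):
        print(f"BLEU-{n}: {cumulative_bleu(candidate, reference, n):.3f}")
```

On this pair the score falls from about 0.833 (BLEU-1) to 0.707 (BLEU-2), 0.500 (BLEU-3), and 0.000 (BLEU-4), so a leaderboard number labeled only "BLEU" cannot be interpreted without knowing the n-gram order. The same ambiguity applies to the ROUGE variants.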