tjunlp-lab / Awesome-LLMs-Evaluation-Papers

The papers are organized according to our survey: Evaluating Large Language Models: A Comprehensive Survey.

635 stars, 41 forks

issues

Add License

#29 haesleinhuepf opened 1 week ago
0
Add format-following evaluation paper

#28 OlivierBinette opened 1 month ago
0
The leaderboard is missing from the page...

#27 zhimin-z opened 4 months ago
0
Any paper or report for http://openeval.org.cn?

#26 zhimin-z opened 4 months ago
0
What is the provenance of WGlaw dataset?

#25 zhimin-z opened 6 months ago
0
The GitHub linkage of OpenEval is missing.

#24 zhimin-z opened 6 months ago
0
How many shots are used to evaluate the benchmarks in OpenEval?

#23 zhimin-z opened 6 months ago
0
Added 'Critical Thinking for Language Models' 2020

#22 ggbetz opened 7 months ago
0
Which metrics is chosen in the leaderboard?

#21 zhimin-z opened 7 months ago
1
add openeval

#20 zhimin-z opened 7 months ago
2
add InstructEval

#19 zhimin-z closed 7 months ago
0
Could you add PandaLM to your survey?

#18 qianlanwyd opened 7 months ago
1
remove duplicate leaderboards

#17 zhimin-z closed 7 months ago
1
Update README.md

#16 john-b-yang closed 7 months ago
4
Update README.md

#15 BinWang28 closed 7 months ago
0
Why we list inaccessible benchmark?

#14 zhimin-z closed 6 months ago
3
Add more leaderboards

#13 zhimin-z closed 7 months ago
0
Update README.md

#12 eltociear closed 7 months ago
1
Code-Related Benchmarks

#11 john-b-yang closed 7 months ago
2
Can you add our recent work to your survey?

#10 grayground opened 8 months ago
1
Add SpyGame

#9 Skytliang opened 8 months ago
1
Add AI Liar paper

#8 LoryPack closed 8 months ago
1
SeaEval: Multilingual LLM Evaluation

#7 BinWang28 opened 8 months ago
7
Add MINT-Bench

#6 xingyaoww closed 8 months ago
1
Add related work

#5 ChanLiang closed 8 months ago
1
Negation Datasets

#4 ikergarcia1996 closed 8 months ago
1
Add "How Important are Good Method Names in Neural Code Generation? A Model Robustness Perspective." in Robustness Evaluation

#3 NTDXYG closed 8 months ago
1
Update README.md

#2 terryyz closed 8 months ago
1
RAGAS: Automated Evaluation of Retrieval Augmented Generation

#1 gdelpuente opened 8 months ago
2