tjunlp-lab/Awesome-LLMs-Evaluation-Papers
The papers are organized according to our survey: Evaluating Large Language Models: A Comprehensive Survey.
635 stars
41 forks
issues
#29 | Add License | haesleinhuepf | opened | 1 week ago | 0
#28 | Add format-following evaluation paper | OlivierBinette | opened | 1 month ago | 0
#27 | The leaderboard is missing from the page... | zhimin-z | opened | 4 months ago | 0
#26 | Any paper or report for http://openeval.org.cn? | zhimin-z | opened | 4 months ago | 0
#25 | What is the provenance of WGlaw dataset? | zhimin-z | opened | 6 months ago | 0
#24 | The GitHub linkage of OpenEval is missing. | zhimin-z | opened | 6 months ago | 0
#23 | How many shots are used to evaluate the benchmarks in OpenEval? | zhimin-z | opened | 6 months ago | 0
#22 | Added 'Critical Thinking for Language Models' 2020 | ggbetz | opened | 7 months ago | 0
#21 | Which metrics is chosen in the leaderboard? | zhimin-z | opened | 7 months ago | 1
#20 | add openeval | zhimin-z | opened | 7 months ago | 2
#19 | add InstructEval | zhimin-z | closed | 7 months ago | 0
#18 | Could you add PandaLM to your survey? | qianlanwyd | opened | 7 months ago | 1
#17 | remove duplicate leaderboards | zhimin-z | closed | 7 months ago | 1
#16 | Update README.md | john-b-yang | closed | 7 months ago | 4
#15 | Update README.md | BinWang28 | closed | 7 months ago | 0
#14 | Why we list inaccessible benchmark? | zhimin-z | closed | 6 months ago | 3
#13 | Add more leaderboards | zhimin-z | closed | 7 months ago | 0
#12 | Update README.md | eltociear | closed | 7 months ago | 1
#11 | Code-Related Benchmarks | john-b-yang | closed | 7 months ago | 2
#10 | Can you add our recent work to your survey? | grayground | opened | 8 months ago | 1
#9 | Add SpyGame | Skytliang | opened | 8 months ago | 1
#8 | Add AI Liar paper | LoryPack | closed | 8 months ago | 1
#7 | SeaEval: Multilingual LLM Evaluation | BinWang28 | opened | 8 months ago | 7
#6 | Add MINT-Bench | xingyaoww | closed | 8 months ago | 1
#5 | Add related work | ChanLiang | closed | 8 months ago | 1
#4 | Negation Datasets | ikergarcia1996 | closed | 8 months ago | 1
#3 | Add "How Important are Good Method Names in Neural Code Generation? A Model Robustness Perspective." in Robustness Evaluation | NTDXYG | closed | 8 months ago | 1
#2 | Update README.md | terryyz | closed | 8 months ago | 1
#1 | RAGAS: Automated Evaluation of Retrieval Augmented Generation | gdelpuente | opened | 8 months ago | 2