princeton-nlp/HELMET
The HELMET Benchmark
https://arxiv.org/abs/2410.02694
MIT License
75 stars
9 forks
issues
#10 | How to evaluate the base models? | canghongjian | closed | 1 week ago | 2
#9 | Evaluation with larger batch size | prakamya-mishra | closed | 2 weeks ago | 1
#8 | Reproducing results on Llama-3.1-8B-Inst | chtmp223 | opened | 1 month ago | 3
#7 | ALCE citation evaluation | carriex | opened | 1 month ago | 3
#6 | Discrepancy in gpt4o-mini Results on MSMarco Compared to Reported Results | 8188zq | opened | 1 month ago | 9
#5 | fix: add trust_remote_code for banking77 | Wangmerlyn | closed | 1 month ago | 1
#4 | Is the code of eval_gpt4_longqa.sh is correct? | enze5088 | closed | 1 month ago | 1
#3 | Making the full eval sheet a read-only excel | LeoXinhaoLee | closed | 1 month ago | 1
#2 | Download data link | maxjeblick | closed | 1 month ago | 1
#1 | Missing requirements | maxjeblick | closed | 1 month ago | 1