princeton-nlp/HELMET
The HELMET Benchmark
https://arxiv.org/abs/2410.02694
MIT License
51 stars, 7 forks
issues
Reproducing results on Llama-3.1-8B-Inst
#8
chtmp223
opened
3 days ago
3
ALCE citation evaluation
#7
carriex
opened
3 days ago
3
Discrepancy in gpt4o-mini Results on MSMarco Compared to Reported Results
#6
8188zq
opened
3 days ago
7
fix: add trust_remote_code for banking77
#5
Wangmerlyn
closed
1 week ago
1
Is the code of eval_gpt4_longqa.sh correct?
#4
enze5088
closed
1 week ago
1
Making the full eval sheet a read-only excel
#3
LeoXinhaoLee
closed
1 week ago
1
Download data link
#2
maxjeblick
closed
2 weeks ago
1
Missing requirements
#1
maxjeblick
closed
2 weeks ago
1